CMSC 421 Operating Systems
Lecture Notes (c) 1993, 1996 Howard E. Motteler

Distributed Systems (Chapter 9)
================================

Hardware (9.2)
---------------

There are a number of ways of classifying systems with more than one
processor; for example

  - shared vs distributed memory
  - tightly vs loosely coupled systems (for distributed memory)
  - SIMD vs MIMD (for "parallel processors")

Shared Memory
==============

Examples: multi-processor Cray & SGI machines

Features of a shared memory system:

  - single common address space (all processors share the same memory)
  - usually bus based and synchronous (has a global clock)
  - all inter-process communication is through memory
  - semaphores or some similar mechanism is needed for synchronization

Shared memory advantages:

  - very cost-effective for moderate numbers of processors (up to 10
    or 20), as it's easy to place several processors on a common bus

  - familiar programming paradigm--threads.  (Project 1 used threads,
    which in general are just a group of processes working together
    and communicating through shared memory)

Shared memory disadvantages:

  - does not scale well, as the bus bandwidth becomes a bottleneck
    for large numbers (more than 20 or 30) of processors

  - shared data can be a bottleneck, as a single memory address can
    be accessed by only one processor at a time (note that critical
    sections are always a bottleneck)

Shared memory systems normally use a memory cache to improve
performance.  This speeds access time and allows a single address to
be read by several processors at the same time.

Problem: cache coherence

  - suppose processors P1 and P2 both have variable V in their cache
  - P1 modifies V
  - P2 won't see the updated value until P1's cache is written back
    to main memory

          P1                  P2
          |                   |
     +----------+        +----------+
     | P1 cache |        | P2 cache |
     |  V == V' |        |  V == V  |
     +----------+        +----------+
          |                   |
     ---------------------------------
              Main Memory

One solution:

  - use a write-through cache (writes are copied back to main
    memory), and

  - snoop: when a cache sees a write to an address it has, it
    either updates the value or drops it from the cache

Distributed Memory
===================

In a distributed memory system

  - each processor has its own memory

  - communication can either be bus-based or through some kind of
    message-passing network (your text calls this a `switched' system)

Examples: MasPar, Connection Machines, Intel Paragon; potentially,
any network of workstations

Some distributed memory machines (e.g., the KSR-1 and Cray T3D)
*simulate* shared memory with distributed memory and message passing

Loose vs tight coupling
------------------------

Distributed memory systems can be classified by whether they are
"loosely" or "tightly" coupled.  (Shared-memory systems are always
tightly coupled.)  Very generally,

  - tightly coupled = high bandwidth between processors
  - loosely coupled = low bandwidth between processors

(bandwidth = bytes/second transfer rate)

In practice, latency--the time it takes the first part of a message
to arrive--is just as important as bandwidth.

Tightly coupled systems can be used for a broad range of single
applications that need to run fast, such as

  - graphics
  - linear algebra
  - neural networks
  - finite element (and finite grid) models

Loosely coupled systems have more limited performance, but are fine
for many applications, such as

  - the Network File System (NFS)
  - the Internet
  - "embarrassingly parallel" applications (applications that don't
    do much communication), such as computing images of the
    Mandelbrot set

Network Geometry
-----------------

Message-passing distributed memory systems can also be classified by
their connection geometry:

  - bus-based (workstations on ethernet)
  - grids (MasPar X-grid)
  - hypercubes (CM-2, N-cube)
  - omega-net (MasPar omega-net, IBM SP-2)
  - hybrid (e.g., MasPar uses both an X-grid and an `omega-net')

Important issues concerning the geometry of a system are

  - Valence (the number of connections to each processor; for
    example, grids have fixed valence, hypercubes do not)

  - What is the longest path?
    (this gives the maximum message latency)

  - Hidden and virtual geometries: should we hide the underlying
    geometry from the user?  Can we provide "virtual geometries"
    (e.g., use a small grid to simulate a larger one)?

Classification by control stream
---------------------------------

Tightly coupled systems (with either distributed or shared memory)
can be further classified by their control stream as SIMD or MIMD.

SIMD (Single Instruction Multiple Data)

  Examples: MasPar, CM2, various array processors

  - Also called "data parallel" machines
  - One control stream applies to all processors
  - Local data can vary
  - Instructions can specify communication with specific neighbors,
    e.g., `send x to north neighbor' for a grid
  - Different geometries have different neighbor sets
  - The canonical programming construct is "plural"

The following example, in MPL (MasPar Language), rotates all N rows
of an N by N PE array at the same time.  Each row makes N comparisons
in 1 step, and so does N^2 comparisons in N steps.

    plural int a, b;        /* declares N^2 instances of a and b */

    for (i = 0; i < N; i++) {
        b = xnetE[1].b;     /* rotate: each PE fetches b from its */
                            /* east neighbor (edges wrap around)  */
        if (b > a)          /* one comparison per PE per step     */
            a = b;
    }
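To see what all those simultaneous comparisons buy, here is a
sequential C sketch of the same rotate-and-compare computation on a
conventional machine.  It assumes (an illustrative reading, not part
of MPL) that the comparisons keep the larger value, so that after N
steps each PE's copy of `a' holds the maximum in its row; the arrays
`a[N][N]' and `b[N][N]' stand in for the N^2 plural instances.

    #include <stdio.h>

    #define N 4

    int main(void)
    {
        int a[N][N], b[N][N];
        int r, c, i;

        /* illustrative data: each "PE" starts with a == b */
        for (r = 0; r < N; r++)
            for (c = 0; c < N; c++)
                a[r][c] = b[r][c] = (r * N + c * 7) % 10;

        for (i = 0; i < N; i++) {
            /* "rotate": every row of b shifts one PE, with wraparound */
            for (r = 0; r < N; r++) {
                int last = b[r][N - 1];
                for (c = N - 1; c > 0; c--)
                    b[r][c] = b[r][c - 1];
                b[r][0] = last;
            }
            /* "compare": all N^2 PEs act in the same step; on a real
             * SIMD machine this inner double loop is 1 time step */
            for (r = 0; r < N; r++)
                for (c = 0; c < N; c++)
                    if (b[r][c] > a[r][c])
                        a[r][c] = b[r][c];
        }

        /* after N steps every PE in row r holds row r's maximum */
        for (r = 0; r < N; r++)
            printf("row %d max = %d\n", r, a[r][0]);

        return 0;
    }

Note the cost difference: the sequential version does N^2 comparisons
per step (N^3 total), while the SIMD machine does the same work in N
steps of one comparison per PE.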