CMSC 421 Operating Systems Lecture Notes
(c) 1993, 1996 Howard E. Motteler


Distributed Systems (Chapter 9)
================================

Hardware (9.2)
---------------

There are a number of ways of classifying systems with more than
one processor; for example

  - shared vs distributed memory
  - tightly vs loosely coupled systems (for distributed memory)
  - SIMD vs MIMD (for "parallel processors")


Shared Memory 
==============

Examples: multi-processor Cray & SGI machines

Features of a shared memory system:

   - single common address space
     (all processors share the same memory)

   - usually bus based and synchronous (has a global clock)

   - all inter-process communication is through memory

   - semaphores or some similar mechanism is needed for
     synchronization

Shared memory advantages:

   - very cost-effective for moderate numbers of processors
     (up to 10 or 20) as it's easy to place several processors 
     on a common bus

   - familiar programming paradigm--threads.  (Project 1 used 
     threads, which in general are just a group of processes 
     working together and communicating through shared memory)

Shared memory disadvantages:

   - does not scale well, as the bus bandwidth becomes a bottleneck
     for large numbers (more than 20 or 30) of processors

   - shared data can be a bottleneck, as a single memory
     address can be accessed by only one processor at a time
     (note that critical sections are always a bottleneck)

Shared memory systems normally use a memory cache to improve
performance 

This speeds access time and allows a single address to be read
by several processors at the same time	

Problem: cache coherence

   - suppose processors P1 and P2 both have variable V in their
     cache

   - P1 modifies V

   - P2 won't see the updated value until P1's cache is
     written back to main memory

           P1              P2
           |               | 
      P1   == V'      P2   == V
     cache ==        cache ==
           |               |          Main
           -------------------------- Memory

One solution:

   - use write-through cache, and
     (writes are copied back to main memory)

   - snoop: when a cache sees a write to an address it has, 
     it either updates the value or drops it from the cache


Distributed Memory 
===================

In a distributed memory system

   - each processor has its own memory

   - communication can either be bus-based or
     through some kind of message-passing network  
     (your text calls this a `switched' system)

Examples: MasPar, Connection Machines, Intel Paragon;
potentially, any network of workstations

Some distributed memory machines (e.g., the KSR-1 and Cray T3D)
*simulate* shared memory with distributed memory and message
passing


Loose vs tight coupling
------------------------

Distributed memory systems can be classified by whether they are
"loosely" or "tightly coupled"

(Shared-memory systems are always tightly coupled)

Very generally, 

   - tightly coupled = high bandwidth between processors
   - loosely coupled  = low bandwidth between processors

(bandwidth = bytes/second transfer rate)

In practice, latency--the time it takes the first part of a
message to arrive--is just as important as bandwidth

Tightly coupled systems can be used for a broad range of
single applications that need to run fast, such as

  - graphics
  - linear algebra
  - neural networks
  - finite element (and finite grid) models

Loosely coupled systems have more limited performance but are
fine for many applications, such as

  - Network File System (NFS)
  - the Internet
  - "embarassingly parallel" applications (application that
    don't do much communication), such as computing images of 
    the Mandelbrot set


Network Geometry
-----------------

Message passing distributed memory systems can also be 
classified by their connection geometry:

   - bus-based   (workstations on ethernet)
   - grids       (MasPar X-grid)
   - hypercubes  (CM-2, N-cube)
   - omega-net   (MasPar omega-net, IBM SP-2)
   - hybrid      (e.g., MasPar uses X-grid and `omega-net')

Important issues concerning the geometry of a system are

   - Valence (number of connections to each processor;
     for example grids have fixed valence, hypercubes do not)

   - What is the longest path? (this gives max message latency)

   - Hidden and virtual geometries:

        should we hide the underlying geometry from the user?
     
        can we provide "virtual geometries" (e.g., use a small 
        grid to simulate a larger one)?


Classification by control stream
---------------------------------

Tightly coupled systems (with either distributed or shared
memory) can be further classified by their control stream as
SIMD or MIMD

SIMD  (Single Instruction Multiple Data)

Examples: MasPar, CM2, various array processors

   - Also called "data parallel" machines

   - One control stream applies to all processors

   - Local data can vary

   - Instructions can specify communication with specific 
     neighbors, e.g., `send x to north neighbor' for a grid

   - Different geometries have different neighbor sets

   - Canonical programming construct is "plural" 


The following example, in MPL (MasPar Language), rotates all N
rows of an N by N PE array at the same time.  Each row makes N
comparisons in 1 step, and so does N^2 comparisons in N steps.

plural int a, b;    /* declares N^2 instances of a and b */

for (i=0; i<N; i++) {

  b = xnetE(1).b;   /* replace b with the value 1 step east  */
  compare(a,b);     /* a is the "local" (non-rotating) value */
  }


MIMD  (Multiple Instruction Multiple Data)

Examples: Intel Paragon, Cray T3D, SGI Challenge

   - Each processor has its own control stream

   - More general, possibly harder to program than SIMD

   - Canonical programming construct: PAR, or fork, or some 
     way to start a set of communicating processes or threads

   - CSP and Dataflow are MIMD parallel programming paradigms

   - Dataflow machines are a specific subclass of MIMD machines


Your dfl project implemented a simple MIMD dataflow language


Software (9.3)
===============

Operating Systems
------------------

Distributed operating systems can be roughly classified by their
intended application (networking vs. parallel processing) as

   - loosely coupled 
     (e.g., Unix with NFS)

   - tightly coupled
     (e.g., SGI Challenge, MasPar, Cray T3D)

Tightly coupled shared memory multi-processor such as the SGI
Challenge are sometimes called "symmetric multiprocessors"

The OS for such systems can be relatively conventional:

 - System data structures are kept in shared memory

 - The ready list feeds several CPUs, rather than just one

 - Cache coherence is mainly a problem for system variables;
   individual process caches are flushed on context switch

Issues for such systems

 - A uniform "threads" paradigm simplifies design

 - Critical sections become bottlenecks

 - We can avoid critical sections with message passing

 - We can resolve the issue of which processor responds to 
   an interrupt by treating interrupts as messages that unblock 
   a waiting driver, 

 - The driver thread can then run on the next free processor


[ NFS (Network File System) is an example of a loosely coupled
  OS, and is discussed in the next note set. ]