Project 1

Project 1: A Simple Dataflow Interpreter

(c) 1996, Howard E. Motteler

Assigned Tue Sept 24; Due Tue Oct 15; 100 pts

Project Goals

The purpose of this project is to gain hands-on experience with concurrency, interprocess communication, data-flow, semaphores, and shared memory.

The Project

You are to implement an interpreter, "dfl," for a simple dataflow language. The syntax of the language is defined as follows. There are NBUFS distinct variables, named c0, c1, c2, etc. Typical statements are

                           c1 = c2 + c3
                           c0 = c1 * c4

In fact, all statements have the form <var> = <var> <op> <var>, and there are no parenthesis, making the language very easy to parse. The full set of operators is {+, -, *, =, |}, so all statements have one of the following forms

                           ci = cj + ck
                           ci = cj - ck
                           ci = cj * ck
                           ci = cj = ck
                           ci = cj | ck

where i, j, and k are numbers between 0 and NBUFS-1.

The semantics of this language are only slightly more complex than the syntax. The dfl interpreter reads a single statement from the command line, and then repeatedly executes that statement. The statements from the first example above would be invoked from the shell as

                         dfl c1 = c2 + c3 &
                         dfl c0 = c1 \* c4 &

where the "&" character puts the process in the background, and the "\" character is needed so the shell passes the "*" to the dfl program, rather than expanding it as a wildcard. (You can also enclose "*" in quotes to prevent wildcard expansion.)

Multiple invocations of the dfl interpreter communicate via named buffers--the "variables" c0, c1, etc., are actually bounded buffers of integer values. There is exactly one buffer per variable name, and buffers act as named channels over which the dfl processes communicate. The buffers are implemented with shared memory and semaphores, as discussed below. A variable reference on the left-hand side of an assignment means the indicated buffer is written to, while a variable reference on the right-hand side means the indicated buffer is read from. Thus the statement c1 = c2 + c3 is executed by repeatedly reading values from buffers c2 and c3, adding the values, and writing the resulting value to buffer c1. The process terminates when a special END value is received; more on this shortly. The "-" and "*" operators work similarly to the "+".

The "=" and "|" operators work somewhat differently. The command ci = cj = ck is a double assignment or `copy' command, and has the effect of reading a value from buffer ck and writing it to buffers ci and cj. The command ci = cj | ck is an `OR' command, for which zero is false and any non-zero value true. For full credit, your `OR' should be an "eager OR" which works as follows. Suppose dfl is invoked with the arguments

                         dfl c3 = c1 "|" c2

Data can arrive at either input c1 or c2, at any time. The ``eager OR'' should produce a value T at its output c3 as soon as possible after seeing a T at either input, for any possible sequences of values arriving at c1 and c2. For example, if the process has just started and a sequence TT arrives at input c1, the OR will produce a sequence TT at c3, without waiting for any values from c2. If the sequence TFTF then arrives at c2, the first two values are ignored, and a third T is produced; if the values FF next arrive at c1, the first F is ignored and the second matched with the 4th value from c1 (an F), and an F is produced at c3. The ``eager OR'' can only be eager in producing T values; producing an F requires checking corresponding values from both inputs.

The dfl interpreters should continue to run, performing their specified commands or waiting for input, until a special END value, defined in dftest.h, is received. (All relevant files are available as a single tar file, described below.) When an END is received as either one or both inputs of an arithmetic command, {+, -, *}, the process first sends an END to its output buffer, and then terminates. Similarly, the copy command sends an END to both its output buffers when an END is received, and then terminates. The `eager OR' should stop receiving values on a buffer when an END is received on that buffer, but it does not send an END and terminate until it has seen an END on both its inputs.

A special program dftest is provided to test your dfl program. dftest reads and prints values from buffer 0, and uses a simple command script to write sequences of values to any of the buffers. The only way your dfl interpreter should communicate is via the buffer variables, and the final version of your program shouldn't do any printing. (At least not to stdout; you can print error messages to stderr, if you want.) For this project, you don't have to worry about ill-formed or meaningless input. And as with most programming languages, not all syntactically correct statements or combinations of statements make sense or will do anything useful, and you do not have to check for such problems.

Buffers (or Channels)

You are to implement the bounded buffers used by the dfl and dftest programs. Buffers and their associated pointers should be implemented in shared memory, using semaphores for mutual exclusion. The file bufskel.c (you will want to rename this "buf.c," as discussed below) contains a partial implementation of bounded buffers, and the file dftest.h defined a structure `bufrec' used by the routines in buf.c. The partial implementation in bufskel.c gives you the read but not the corresponding write procedure, and has no mutual exclusion. You can modify the sample readbuf() procedure to work with your own write procedure, if you prefer; the only real restrictions are that you use shared memory (rather than, say, opening a unix pipe) for the buffers, that your procedures be called "readbuf()" and "writebuf()" so that dftest can be linked with them, and that your shared data structures (buffers, pointers, etc.) should be protected as critical sections.

The behavior of your dfl program when two or more processes try to read the same buffer or channel is "undefined," and we will simply assume that no well-formed set of statements will do this. It should be possible for several dfl processes to write to a single buffer, however.

Semaphores and Shared Memory

The file sem.c contains routines that are linked with your program to provide NSEMS semaphores and a block of shared memory. The signal and wait procedures defined in sem.c are called ssig() and swait(). Semaphore `names' are simply integers in the range 0 to NSEMS-1. The routine sbase() returns a pointer to the start of the shared memory block. See the file sem.c for further information.

The dftest program initializes semaphores and shared memory. Dftest sets semaphores 0 through NSEM-2 to the value 1, and writes zeros to the shared memory, so that all buffers start out initially as empty. A special semaphore `READY' is defined as semaphore number NSEMS-1. The dftest program sends NBUFS-3 signals on this READY semaphore, after all initialization is complete. Your dfl program should wait on READY after calling sinit() and before performing any operations with buffers or semaphores. You can assume that no more than NBUFS-3 instances of dfl will be started at once, so the NBUFS-3 signals will start them all.

An Example Run

Here is an example from a simple test, where the dfl processes are used to add up the number in buffers 1-3 and sending the result to buffer 0.

        dfl c0 = c3 + c4 &
        dfl c4 = c1 + c2 &

        dftest -o c1=1 c2=1 c3=1 c1=2 c2=2 c3=2 c1=end c2=end c3=end
        dftest: sending c1=1
        dftest: sending c2=1
        dftest: sending c3=1
        dftest: sending c1=2
        dftest: received 3
        dftest: sending c2=2
        dftest: sending c3=2
        dftest: sending c1=end
        dftest: received 6
        dftest: sending c2=end
        dftest: sending c3=end
        dftest: received end
        dftest: received 2 values
        dftest: clearing semaphores

In this example all of the sends occurred before any of the receives. With longer tests, it is not unusual to see sends and receives mixed together.

Test Data

Sample test scripts will be provided. Your project will be graded using scripts that are similar but may not be exactly the same as the sample test data.

Submitting Programs

You are to submit two source files: your program dfl.c, and your implementation of bounded buffers, buf.c. These programs should be submitted with the UCS submit program. For section 0101, the command to submit both your "dfl" and "buf" routines in one step is

              submit cs421_0101 proj1 dfl.c buf.c

Submit has a number of other options; see the man page for details. The submit program copies your program into a protected directory in my account, which has the same name as your user id. Do not submit from more than one account, or by email; and note that submit only works from within the "gl" domain. Do not ask for acknowledgment of your submission by email; if "submitls" (see the man page) shows your files, then I have received them.

Grading

You should read the general information on programming projects; as mentioned there, your name, SSN, and "project 1" should appear at the top of your project. Approximately 20% of your grade is for documentation, including at least a half-page description, in comments at the beginning of the program, explaining how your dfl program works. Another 20% percent is for design, and the remainder of your grade is based on how well things work. You can get partial credit even if you can't run all of the test data successfully.

Projects that do not compile get 0 points. You program should work with the routines dftest.c, dftest.h, sem.c, and the makefile as provided; when I compile and test you program for grading, I will be using the original versions. The makefile will compile your dfl program, as well as dftest and the example programs. If you really want to modify the makefile, check with me first, and submit your new makefile in addition to your other files.

You must do the project described here. Doing some other similar or dissimilar project, no matter how difficult or clever, may be worth 0 points. This is not a group project. Please do your own work, and be careful about sharing your code.

Getting Started

A tar file is available, containing all the relevant files. Rename the skeleton files dflskel.c to dfl.c, and bufskel.c to buf.c; these give you an outline to to get you started. The bufskel.c contains the read but not the write procedures, and lacks mutual exclusion; there is alas very little in dflskel.c except for include files and semaphore initialization. You should read the comments in sem.c and dftest.c. You may want to play with the signal and wait examples in sig.c and wai.c. The program show.c, which prints the values of all the semaphores, can be useful both to see how semaphores work, and as an aid in debugging your code.

Relevant Files

The following files are in the tar file mentioned above.

  dftest.c    tests a dfl program; includes documentation

  dflskel.c   a `skeletal' version of your program dfl.c
              to help get you started

  bufskel.c   a partially correct implementation of
              bounded buffer procedures for buf.c

  dftest.h    definitions for dfl.c, buf.c and dftest.c

  sem.c       semaphore package; includes documentation

  sig.c       example using ssig()
  wai.c       example using swai()
  sclear.c    release semaphores
  show.c      display semaphores and shared memory

  Makefile    compile and link everything

  reset       shell script to release semaphores and to kill
              off any lingering dfl processes

  test0-4     sample test scripts

Hints and Tips

Read this handout and the related material carefully, at least a couple of times, before starting in on the project.
It is not necessary that you master all the internal details of the provided procedures sem.c and dftest.c, but you should read the introductory documentation in both of these files.
Get your bounded buffer procedures readbuf() and writebuf() to work without mutual exclusion, first. It's a good idea to write a simple program to test your buffer procedures, before using them elsewhere.
Next, get the "+" operator to work. If you can do the simple example above successfully, even without code for mutual exclusion, you are well on your way to getting things working.
In testing by hand, don't forget to quote the "*" and "|" characters, which otherwise have a special meaning to the shell.
Because it is somewhat trickier than the other operators, a fully correct "eager or" is worth about 20% of your grade. Correct mutual exclusion is worth about 15%.

If you get error messages at the end of an otherwise successful run, e.g., something like

           <...>
           dftest: received 20 values
           dftest: clearing semaphores
           Process 5236: semop() error on swait(7)
           Process 5233: semop() error on swait(4)
           Process 5234: semop() error on swait(5)
           Process 5235: semop() error on swait(6)

it means that some of your dfl processes did not terminate after dftest received everything it was expecting. Correct termination is worth about 5% of your grade.

You should not call sclear() from within your dfl program; the dftest program calls sclear() when it has received the last value it was expecting. (It is dftest's call of sclear that kills off the dfl processes that are still waiting on semaphores, in the above example.)
Although the specifications and some of the provided code are a bit lengthy, the project itself, including documentation, need not be--dfl.c can be done very well in three or four pages pages, and buf.c in less than a page. If you are getting way beyond that, to more than six or seven pages of code, you may want to re-think your overall approach.
Don't modify the supplied header file dftest.h. If you really want to, you can use your own header file, e.g., dfl.h; if you do this, then don't forget to submit it along with your files dfl.c and buf.c.