Building Coroutines

Suppose we have three functions A(), B() and C(), where A() calls B() to do something and then calls C() to do some other thing. This calling activity is shown in the following figure. More precisely, when A() calls B(), A() waits until B() returns. When B() is called, it always starts its execution with its first statement. The similar holds true for the call to C().

However, this type of call/return is not always the best way of using routines. Suppose a system runs three threads A, B and C with the round-robin scheduling policy. In other words, the threads run in the order of A, B, C, A, B, C, A, B, C and so on. The following figure shows the scheduling of these threads. The red arrows are thread executions, doted arrows indicate context switching, and different rounds use different colors.

Thus, A runs for a while, and the CPU is switched to B. Then, B runs for a while, and the CPU switched to C. When C finishes its time quantum, the control switches back to A; however, the restarting point is where A was suspended previously. When B and C restart, the restarting points are where B and C were suspended. If we consider A, B and C as three functions calling each other (simulated by switching CPU), the call brings the control to the statement where the caller was suspended. Routines organized in this way are called coroutines. Coroutines have been used to construct finite automata, parsers and, of course, multithreaded systems!

To show you a simple example, let us only consider two functions working as two coroutines. Click here to download a copy of this program.

#include  
#include  

int       max_iteration, iter;

jmp_buf   Main, PointA, PointB;

void      Ping(void);
void      Pong(void);

void  main(int  argc, char* argv[])
{
     max_iteration = abs(atoi(argv[1]));
     iter = 1;
     if (setjmp(Main) == 0)
          Ping();
     if (setjmp(Main) == 0)
          Pong();
     longjmp(PointA, 1);      
}

void  Ping(void)
{
     if (setjmp(PointA) == 0)
          longjmp(Main, 1);
     while (1) {
          printf("%3d : Ping-", iter);
          if (setjmp(PointA) == 0)
               longjmp(PointB, 1);
     }
}

void  Pong(void)
{
     if (setjmp(PointB) == 0)
          longjmp(Main, 1);
     while (1) {
          printf("Pong\n");
          iter++;
          if (iter > max_iteration)
               exit(0);
          if (setjmp(PointB) == 0)
               longjmp(PointA, 1);
     }
}

The logic of this program requires some explanation.

In a multithreaded system, we may have many functions running as threads, each of which has an environment in the runtime stack. The thread scheduler just uses a technique similar to the above program to schedule those threads that are in the ready queue.

But, ....

It seems that we have a working multithreaded system. Really? No, things are not that simple. If you take a look at functions Ping() and Pong(), you will see that both only use global variables. If the program is changed to use local variables, it is likely that you will obtain incorrect result. The following is a modified version. Click here to download a copy.
#include  
#include  

int       max_iteration;      

jmp_buf   Main;        
jmp_buf   PointPing;
jmp_buf   PointPong;    

void      Ping(void);
void      Pong(void);

void  main(int  argc, char* argv[])
{
     if (argc != 2) {             
          printf("Use %s max-#-of-lines\n", argv[0]);
          exit(1);
     }
     max_iteration = abs(atoi(argv[1]));
     if (setjmp(Main) == 0)
          Ping();
     if (setjmp(Main) == 0)
          Pong();
     longjmp(PointPing, 1);
}

void  Ping(void)
{
     int  i;

     if (setjmp(PointPing) == 0) {
          i = 1;
          longjmp(Main, 1);
     }
     while (1) {
          printf("Ping (%2d) - ", i);
          i++;
          if (setjmp(PointPing) == 0)
               longjmp(PointPong, 1);
     }
}

void  Pong(void)
{
     int  j;

     if (setjmp(PointPong) == 0) {
          j = 1;
          longjmp(Main, 1);
     }
     while (1) {
          printf("Pong (%2d)\n", j);
          j++;
          if (j > max_iteration)
               exit(0);
          if (setjmp(PointPong) == 0)
               longjmp(PointPing, 1);
     }
}

Local variables i and j are counters and are initialized when Ping() and Pong() are called. The output of this program with max_iteration set to 4 may look like the following:

Ping ( 1) - Pong ( 2)
Ping ( 3) - Pong ( 4)

This output is definitively incorrect, because the while loop in function Pong() should iterate four times, printing out Pong ( 1), Pong ( 2), Pong ( 3), and Pong ( 4). But, we have have Pong ( 2) and Pong ( 4) in the above output. (WARNING: your system may generate a different output.) Moreover, the output from Ping() is also wrong. So, what is the problem?

The problem is in stack memory allocation as we have discussed a couple of times. When the main starts execution and before reaches the first setjmp(), the stack has only one stack frame as shown in Figure (b) below. Then, Ping() is called, and the stack has two frames as shown in Figure (a). Function Ping() uses a longjmp() to bring the control back to the main program, effectively restoring the stack back to the point of the first setjmp() being called (Figure (a)). When the main program calls setjmp() to mark a second return point and then calls Pong(), the stack frame for function Pong() will likely occupy the same location which was allocated for Ping()! Because both functions Ping() and Pong() have only one local variable, they will likely share the same location on the stack. More precisely, it is highly likely that i of Ping() and j of Pong() share the same address. In fact, you can print their addresses to verify this. Therefore, when Ping() prints and increases the value of i, Pong() loses its original value of j and prints the next one!

(a) (b) (c)

To overcome this problem, we need to allocate separate stack space for each function that is running as a thread. Please refer to our mini-project to learn how this can be done easily. Unfortunately, this is usually system-dependent. Check our mini-project for a simple implementation of multiple stacks, on for each thread, on Sun Solaris.