Processes

  1. Overview
  2. Process Scheduling
  3. Process Creation
  4. Forking and Buffering
  5. The exec Function
  6. Process Termination
  7. Interprocess Communication
  8. Using popen and Pipes
  9. Win32 Process Creation

Overview

Terminology
Process States

A process can be in one of several states. The states are mutually exclusive, so a process is in exactly one of them at any given time.

State Transitions

  1. (Admitted) The process has been created and is now being put into the ready/runnable queue.
  2. (Dispatched) The OS (scheduler) has selected a process to run and is executing it on the CPU.
  3. (Timeout) The process has used up its allotted time slice and is put back into the ready queue for later execution.
  4. (Need I/O or event to happen) The process has requested I/O or has requested to wait until a future event.
  5. (I/O done or event occurred) The I/O has completed or event has occurred that the process was waiting on. The process gets put back in the ready queue.
  6. (Ending) The process has completed its task or the system has terminated the process.
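
The six transitions above can be sketched as a small lookup function. This is only an illustration: the state and event names are hypothetical, and the sketch assumes the usual five-state textbook model (new, ready, running, waiting, terminated), with the Ending transition taken from the running state.

```c
#include <assert.h>

/* Hypothetical names for the states and for the six transitions listed above. */
typedef enum { ST_NEW, ST_READY, ST_RUNNING, ST_WAITING, ST_TERMINATED } proc_state;
typedef enum { EV_ADMITTED, EV_DISPATCHED, EV_TIMEOUT,
               EV_WAIT, EV_WAIT_DONE, EV_ENDING } proc_event;

/* Returns the state that follows `from` when `ev` happens,
   or -1 if that transition is not in the list. */
int next_state(proc_state from, proc_event ev)
{
  switch (ev)
  {
    case EV_ADMITTED:   return from == ST_NEW     ? ST_READY      : -1;
    case EV_DISPATCHED: return from == ST_READY   ? ST_RUNNING    : -1;
    case EV_TIMEOUT:    return from == ST_RUNNING ? ST_READY      : -1;
    case EV_WAIT:       return from == ST_RUNNING ? ST_WAITING    : -1;
    case EV_WAIT_DONE:  return from == ST_WAITING ? ST_READY      : -1;
    case EV_ENDING:     return from == ST_RUNNING ? ST_TERMINATED : -1;
  }
  return -1;
}

/* Applies ev to *st. The transition must be a legal one. */
void apply_event(proc_state *st, proc_event ev)
{
  int next = next_state(*st, ev);
  assert(next != -1);
  *st = (proc_state)next;
}
```

Note that a waiting process never goes straight back to running in this model: next_state(ST_WAITING, EV_DISPATCHED) returns -1, because the process must pass through the ready queue first.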

Utilities like ps, pstree, top, and htop (on Linux, along with the GUI tools ksysguard and gnome-system-monitor), Task Manager and Process Explorer (on Windows), and Activity Monitor (on macOS) can give you lots of detailed information about all of the processes on a computer.

From the ps man page on Linux:

PROCESS STATE CODES

 Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will 
 display to describe the state of a process.
       D    Uninterruptible sleep (usually IO)
       R    Running or runnable (on run queue)
       S    Interruptible sleep (waiting for an event to complete)
       T    Stopped, either by a job control signal or because it is being traced.
       W    paging (not valid since the 2.6.xx kernel)
       X    dead (should never be seen)
       Z    Defunct ("zombie") process, terminated but not reaped by its parent.

       For BSD formats and when the stat keyword is used, additional characters may be displayed:
       <    high-priority (not nice to other users)
       N    low-priority (nice to other users)
       L    has pages locked into memory (for real-time and custom IO)
       s    is a session leader
       l    is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
       +    is in the foreground process group
Sample output running: ps aux in a virtual machine.


Process Control Block (PCB)

Each process has a block of memory (typically a C struct) that contains all of the relevant information about a process. The PCB can contain a massive amount of information. Some of the info includes:

Linux task_struct
Linux task_struct (newer) (from kernel 5.4.0 sched.h)

The PCBs are kept on linked lists, where each list represents the queue for a particular resource (CPU, disk, etc.)
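
As a rough sketch of that idea, here is a toy PCB and a FIFO queue built from it. All of the names and fields are hypothetical and heavily simplified; a real PCB such as the Linux task_struct holds far more information.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified PCB. */
struct pcb
{
  int pid;           /* process id                 */
  int state;         /* current process state      */
  struct pcb *next;  /* next PCB in the same queue */
};

/* A FIFO queue of PCBs, e.g. the ready queue or the queue for one device. */
struct pcb_queue
{
  struct pcb *head;
  struct pcb *tail;
};

/* Appends p to the back of q. */
void pcb_enqueue(struct pcb_queue *q, struct pcb *p)
{
  assert(q && p);
  p->next = NULL;
  if (q->tail)
    q->tail->next = p;
  else
    q->head = p;
  q->tail = p;
}

/* Removes and returns the PCB at the front of q, or NULL if q is empty. */
struct pcb *pcb_dequeue(struct pcb_queue *q)
{
  struct pcb *p;

  assert(q);
  p = q->head;
  if (p)
  {
    q->head = p->next;
    if (!q->head)
      q->tail = NULL;
    p->next = NULL;
  }
  return p;
}
```

Moving a process from one queue to another (say, from a disk queue back to the ready queue) is then just a pcb_dequeue from one list followed by a pcb_enqueue on the other; the PCB itself is never copied.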

Operating System Concepts - 8th Edition Silberschatz, Galvin, Gagne ©2009  

Process Scheduling

Types of Scheduling Context Switching

Process Creation

Process Creation The POSIX fork Function

Forking, Buffering, and Redirection

You may recall that, for efficiency reasons, data written to stdout (e.g. via printf), is buffered. This means that instead of printf writing each byte individually to the output, entire lines are written at once. This can be a significant performance improvement.

There are two major types of buffering: line buffering and full buffering. When stdout is connected to a terminal, it is line-buffered, which means that when a newline is encountered, all of the bytes up to and including the newline are written to the terminal. One implication of this: if you are writing output using printf and your program crashes before printf encounters a newline, some bytes may never make it to the screen.

This behavior may lead to some surprises when a parent and child process are both writing buffered output to stdout. Some examples will demonstrate.

Here is a very simple example. (fork-redirect.c)

#include <stdio.h>    /* printf       */
#include <stdlib.h>   /* exit         */
#include <unistd.h>   /* fork, getpid */
#include <sys/wait.h> /* wait         */

int main(void) 
{
  int pid; 

    /* The text will be sent to the output when the newline is encountered. */
  printf("This should only print once. (%i)\n", getpid());

  pid = fork();
  if(pid == 0)
  {
    printf("This is the child (pid: %i).\n", getpid());
    exit(123);
  }
  else if (pid > 0)
  { 
    wait(NULL);
    printf("This is the parent (pid: %i).\n", getpid());
  }
  else
    printf("Fork failed.\n");
  
  return 0;
}
This is the output (as expected):
This should only print once. (9206)
This is the child (pid: 9207).
This is the parent (pid: 9206).
Now, let's remove the newline from the first print statement and see what the output looks like. We're changing this:
  /* The text will be sent to the output when the newline is encountered. */
printf("This should only print once. (%i)\n", getpid());
to this:
  /* There is no newline, so the text won't be sent until the buffer is full. */
printf("This should only print once. (%i)", getpid());
Running the program now shows this output:
This should only print once. (9436)This is the child (pid: 9437).
This should only print once. (9436)This is the parent (pid: 9436).
The first thing to notice is that there is no newline after the first line prints. This is expected, of course, since we removed it. However, you'll notice that the first line is now printed twice. One line is printed in the parent and the other is printed in the child.

Why is that?

This is the result of line-buffered output:

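  1. The parent process sends the text to printf without a newline. (The text is buffered.)
  2. The text is NOT sent to stdout because a newline hasn't been seen (and the buffer is not full).
  3. The fork call is made, which duplicates the entire process, including the buffer that still holds the unsent text.
  4. The printf call is made in the child, which includes a newline, so the child's copy of the buffer (the first line plus the child's text) is written.
  5. The printf call is made in the parent, which includes a newline, so the parent's copy of the buffer is written as well.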

The result is that the first line is seen twice. Steps 4 and 5 could be swapped if you have a situation where the parent code happened to run before the child code did.

So, how do we "fix" that?

One solution is to make sure to flush the buffer before calling fork. This ensures that the contents are sent to the output before creating the child process. Place this code after the first printf and before the fork:

fflush(stdout);
Now, we see this as the output:
This should only print once. (9817)This is the child (pid: 9818).
This is the parent (pid: 9817).
There is still no newline after the first line (expected), but the line is only sent once to the output.

Another solution is to turn off buffering for stdout using setvbuf:

int setvbuf (FILE *stream, char *buf, int mode, size_t size);
Add this line of code before calling printf:
setvbuf(stdout, NULL, _IONBF, 0);
The third parameter is the interesting one and can be one of three values: _IOFBF (fully buffered), _IOLBF (line-buffered), or _IONBF (unbuffered). See the setvbuf man page for more details. Now, when we run the program, the line is only printed once. Keep in mind that disabling buffering may have a negative impact on performance. However, when printing to the screen, this is usually not a problem.

The example above shows how this "problem" is the result of not printing a newline. However, there is still a problem even when you do print the newline. It occurs when you redirect the output to a file.

Suppose we put the newline back into the printf statement:

  /* The text will be sent to the output when the newline is encountered. */
printf("This should only print once. (%i)\n", getpid());
We saw that, with the newline, the text was no longer printed twice (once in the parent and once in the child). However, if you run the program and redirect the output to a file:
./fork-redirect > out.txt
this is what you would see in the file:
This should only print once. (18897)
This is the child (pid: 18899).
This should only print once. (18897)
This is the parent (pid: 18897).
We're back to the same problem we had without the newline. So, why is this?

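It turns out that the C library chooses the buffering mode for stdout based on what stdout is connected to. When stdout is a terminal, it is line-buffered. When you redirect stdout to a file (or pipe it to another program, as you've been doing at the command line), it is no longer a terminal, so it becomes fully-buffered. This means that the output is only written when the buffer is full, when the buffer is explicitly flushed, or when the program exits. It doesn't matter if there are newlines in the text or not.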
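
If you want to see which case your program is in, one rough check is to ask whether the file descriptor refers to a terminal. The helper below is a hypothetical sketch (default_output_mode is not a library function); it assumes the common behavior, as in glibc, where an output stream is line-buffered on a terminal and fully-buffered otherwise. (stderr is the exception, since it is normally unbuffered.)

```c
#include <assert.h>
#include <stdio.h>   /* _IOLBF, _IOFBF */
#include <unistd.h>  /* isatty         */

/* Returns the buffering mode stdio would typically choose by default
   for an output stream on file descriptor fd (an assumption, see above). */
int default_output_mode(int fd)
{
  assert(fd >= 0);
  return isatty(fd) ? _IOLBF : _IOFBF;
}
```

Calling default_output_mode(STDOUT_FILENO) should give _IOLBF when the program is run normally in a terminal and _IOFBF when it is run with > out.txt or piped into another program.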

The solution is the same as before. Either you can flush the buffer before the fork, or you can set stdout back to line-buffered (or even no buffering). Either of these will do the trick:

setvbuf(stdout, NULL, _IOLBF, 0); /* line-buffered */
or
setvbuf(stdout, NULL, _IONBF, 0); /* no buffering */

The recommended approach is to just use fflush when you need to make sure the buffer is sent to the output. It's easier to use and understand (most students and beginners have never heard of setvbuf). It's also more efficient because it forces a write only at the point where you call it; the rest of the program's output is still buffered normally.
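
To watch the effect of fflush without involving the shell, here is a small self-checking sketch. The function copies_after_fork is hypothetical; it uses a temporary file (which stdio fully buffers, just like redirected stdout) and counts how many copies of one line end up in the file after a fork. It assumes text is short and contains no newline.

```c
#include <assert.h>
#include <stdio.h>    /* tmpfile, fprintf, fflush, rewind, fgets, fclose */
#include <stdlib.h>   /* exit */
#include <unistd.h>   /* fork */
#include <sys/wait.h> /* wait */

/* Writes text as one line to a fully-buffered temporary file, forks, and
   returns how many copies of the line end up in the file (-1 on error).
   If flush_before_fork is nonzero, the buffer is flushed before the fork. */
int copies_after_fork(const char *text, int flush_before_fork)
{
  char line[256];
  int copies = 0;
  pid_t pid;
  FILE *fp;

  assert(text != NULL);
  fp = tmpfile();            /* a regular file, so it is fully-buffered */
  if (!fp)
    return -1;

  fprintf(fp, "%s\n", text); /* stays in the buffer despite the newline */
  if (flush_before_fork)
    fflush(fp);              /* written now; the buffer is empty at the fork */

  pid = fork();
  if (pid < 0)
  {
    fclose(fp);
    return -1;
  }
  if (pid == 0)
    exit(0);                 /* exit flushes the child's copy of the buffer */

  wait(NULL);
  rewind(fp);                /* flushes the parent's copy, then seeks to the start */
  while (fgets(line, sizeof(line), fp))
    copies++;
  fclose(fp);
  return copies;
}
```

With these assumptions, copies_after_fork("once", 0) should return 2 (one copy from each process) and copies_after_fork("once", 1) should return 1.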

The exec Function

Process Termination

Voluntary exits: Involuntary exits: Miscellaneous:

Interprocess Communication (IPC)

We're going to look at 4 methods of interprocess communication:
  1. Shared memory
  2. Message passing
  3. POSIX pipes
  4. Sockets
Shared memory
Message queues


POSIX pipes


Self check: Programming Problem 3.18 from the suggested textbook.

"Design a program using ordinary pipes in which one process sends a string message to a second process, and the second process reverses the case of each character in the message and sends it back to the first process. For example, if the first process sends the message Hi There, the second process will return hI tHERE. This will require using two pipes, one for sending the original message from the first to the second process, and the other for sending the modified message from the second back to the first process."

#include <stdio.h>    /* printf, fgets            */
#include <stdlib.h>   /* exit                     */
#include <string.h>   /* strlen                   */
#include <ctype.h>    /* isupper, islower, tolower, toupper */
#include <unistd.h>   /* pipe, read, write, close */
#include <sys/wait.h> /* wait                     */

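/* Reverses the case of every letter in buffer (in place) */
void revcase(char *buffer)
{
  size_t i;
  size_t len = strlen(buffer);
  for (i = 0; i < len; i++)
  {
      /* the ctype functions expect unsigned char values */
    unsigned char c = (unsigned char)buffer[i];
    if (isupper(c))
      buffer[i] = (char)tolower(c);
    else if (islower(c))
      buffer[i] = (char)toupper(c);
  }
}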

int main(void) 
{
  int pid;

  /* setup stuff */
  
  pid = fork();
  
  if (pid == 0) /* child */
  {
  
    /* DO STUFF */  
  
    exit(0); 
  }
  else /* parent */
  {
    /* DO STUFF */
      
    wait(NULL);  
  }
  
  return 0;
}
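
The skeleton leaves the pipe setup as the exercise. As a hint only, here is a sketch of a single one-way pipe, where the child writes a message and the parent reads it. The function name receive_from_child is hypothetical, and the sketch assumes the message is short enough to be written with one call to write. The exercise itself needs two such pipes, one in each direction.

```c
#include <assert.h>
#include <string.h>   /* strlen */
#include <unistd.h>   /* pipe, fork, read, write, close, _exit */
#include <sys/wait.h> /* wait */

/* Forks a child that writes msg into a pipe. The parent reads it into buf
   (NUL-terminated, at most size - 1 bytes). Returns bytes read, -1 on error. */
ssize_t receive_from_child(const char *msg, char *buf, size_t size)
{
  int fds[2];  /* fds[0] is the read end, fds[1] is the write end */
  ssize_t n, total = 0;
  pid_t pid;

  assert(msg && buf && size > 0);
  if (pipe(fds) == -1)
    return -1;

  pid = fork();
  if (pid < 0)
  {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  if (pid == 0) /* child: the writer */
  {
    close(fds[0]);  /* the child doesn't read */
    if (write(fds[1], msg, strlen(msg)) < 0)
      _exit(1);
    close(fds[1]);
    _exit(0);       /* _exit skips flushing stdio buffers copied from the parent */
  }

  /* parent: the reader */
  close(fds[1]);    /* must close this, or read never sees end-of-file */
  while ((size_t)total < size - 1 &&
         (n = read(fds[0], buf + total, size - 1 - (size_t)total)) > 0)
    total += n;
  buf[total] = '\0';
  close(fds[0]);
  wait(NULL);
  return total;
}
```

The important detail is that each process closes the end of the pipe it does not use. If the parent kept fds[1] open, its read loop would block forever waiting for more data.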

Using popen for pipes

If you just need to set up a pipe between two processes, there is a function called popen that makes things easier. It performs the pipe, fork, and exec steps for you, running the command through the shell. Since many programs just need this kind of behavior, it can be a real convenience.

This sample code simply prints out the strings: one two three four five six seven to the screen (stdout): (popen0.c)

#include <stdio.h> /* printf */

int main(void)
{
  int i;
  char *array[] = {"one", "two", "three", "four", "five", "six", "seven"};
  int size = sizeof(array) / sizeof(*array);

    /* Print to stdout */
  for(i = 0; i < size; i++) 
    printf("%s\n", array[i]);

  return 0;
}
Output:
one
two
three
four
five
six
seven
If we wanted the output sorted, we would pipe the output to the standard sort program that is available on all POSIX systems using the pipe symbol:
./popen0 | sort
Output:
five
four
one
seven
six
three
two
But, suppose we wanted to sort the data within our program and not require the user to have to do the piping on the command line? That's where the popen function comes in handy.

This example shows how you can use the sort program to sort your data from within your program. So, instead of using printf to print to stdout, we're using fprintf to print to the sort process! (popen1.c)

/* compile with -D_BSD_SOURCE if using -ansi */

#include <stdio.h> /* fprintf, popen, pclose, perror */

int main(void)
{
  int i;
  FILE *pfp;
  char *array[] = {"one", "two", "three", "four", "five", "six", "seven"};
  int size = sizeof(array) / sizeof(*array);

    /* Create write-only pipe (i.e. open program for writing) */
  pfp = popen("sort", "w");
  if (!pfp)
  {
    perror("popen");
    return 1;
  }

    /* Print to pipe (write to sort process) */
  for(i = 0; i < size; i++) 
    fprintf(pfp, "%s\n", array[i]);

    /* Close the pipe */
  pclose(pfp);

  return 0;
}
Output:

five
four
one
seven
six
three
two
  • Diagram from The Linux Programming Interface book:

  • This example shows how you can set up a pipe within your C program just as if you were using the command line. (popen2.c) It is equivalent to the command line:    ls /usr/bin | sort -r

    /* compile with -D_BSD_SOURCE if using -ansi */
    
    #include <stdio.h> /* popen, perror, fprintf, pclose, fgets */
    
    #define BUFSIZE 100
    
    int main(void)
    {
      FILE *inpipe, *outpipe;
      char buffer[BUFSIZE];
    
        /* read pipe from ls (i.e. open ls program for reading) */
      inpipe = popen("ls /usr/bin", "r");
      if (!inpipe)
      {
        perror("popen read"); /* perror adds the colon itself */
        return 1;
      }
    
        /* write pipe to sort (i.e. open sort program for writing) */
      outpipe = popen("sort -r", "w");
      if (!outpipe)
      {
        perror("popen write"); /* perror adds the colon itself */
        pclose(inpipe);        /* don't leave the ls process behind */
        return 2;
      }
    
        /* read from ls and write to sort (reversed) */
        /* it's this: ls /usr/bin | sort -r          */
      while(fgets(buffer, BUFSIZE, inpipe))
        fprintf(outpipe, "%s", buffer);
    
        /* clean up */
      pclose(inpipe);
      pclose(outpipe);
    
      return 0;
    }
    
    Partial output:
    zxpdf
    zsoelim
    zsh
    zrun
    zlib-flate
    zjsdecode
    zipsplit
    zipnote
    zipinfo
    zipgrep
    zipdetails
    zipcloak
    zip
    zim
    .
    .
    .
    aainfo
    aaflip
    aafire
    a5toa4
    a5booklet
    a2ping
    a2p
    7zr
    7za
    7z
    2to3-3.4
    2to3-2.7
    2to3
    [
    
    On my system there are over 4,000 lines!

    Capturing output from the compiler: (popen3.c)

    /* compile with -D_BSD_SOURCE if using -ansi */
    
    #include <stdio.h> /* popen, perror, printf, pclose, fgets */
    
    #define BUFSIZE 100
    
    int main(void)
    {
      FILE *inpipe;
      char buffer[BUFSIZE];
    
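        /* read pipe from gcc; gcc writes its diagnostics to stderr, */
        /* so 2>&1 is needed to send them into the pipe as well      */
      inpipe = popen("gcc foo.c 2>&1", "r");
      if (!inpipe)
      {
        perror("popen read");
        return 1;
      }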
    
        /* Read from compiler and output to screen */
      while(fgets(buffer, BUFSIZE, inpipe))
        printf("%s", buffer);
    
        /* clean up */
      pclose(inpipe);
    
      return 0;
    }
    
    This is foo.c:
    int main(void)
    {
      return; /* Missing return value */
    }
    
    Output:
    foo.c: In function 'main':
    foo.c:3:3: warning: 'return' with no value, in function returning non-void
       return;
       ^
    
    With this knowledge, you can now start writing your own IDE (e.g. Visual Studio)!

    Create your own IDE on Windows with child processes and pipes.

    Win32 Process Creation

    CreateProcess Example (CreateProcess.cpp)
    #include <iostream>
    #include <windows.h>
    
    int main(void) 
    {
      STARTUPINFOA start_info;
      PROCESS_INFORMATION proc_info;
      
      DWORD pid = GetCurrentProcessId();
      std::cout << "parent pid = " << pid << std::endl;
    
        // set both structures to 0 and record the size of the start up info
      ZeroMemory(&start_info, sizeof(start_info));
      ZeroMemory(&proc_info, sizeof(proc_info));
      start_info.cb = sizeof(start_info);
      
      std::cout << "creating child process" << std::endl;
      const char *program = "c:\\windows\\system32\\notepad.exe";
        // the A (ANSI) version is used because program is a char string
      BOOL ok = CreateProcessA(program,     // program to run
                               0,           // command line
                               0,           // security attributes
                               0,           // thread attributes
                               FALSE,       // don't inherit handles
                               0,           // creation flags (none)
                               0,           // use parent's environment
                               0,           // use parent's directory
                               &start_info, // start up info
                               &proc_info   // process info
                              );
      
      if (!ok)
      {
        std::cout << "Error creating process: " << GetLastError() << std::endl;
        return -1;
      }
    
      std::cout << "waiting for child to terminate" << std::endl;
      WaitForSingleObject(proc_info.hProcess, INFINITE);
      std::cout << "parent terminating" << std::endl;
    
      CloseHandle(proc_info.hProcess);
      CloseHandle(proc_info.hThread);
    
      return 0;
    }
    
    Creating Multiple Processes Example
    #include <stdio.h>
    #include <windows.h>
    
    int main(void) 
    {
      const int COUNT = 2;
    
      HANDLE proc[COUNT], thread[COUNT];
      const char *programs[] = {"c:\\windows\\system32\\notepad.exe",
                                 "c:\\windows\\system32\\mspaint.exe",
                                };
    
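      for (int i = 0; i < COUNT; ++i) 
      {
        STARTUPINFOA si;
        PROCESS_INFORMATION pi;
    
        ZeroMemory(&si, sizeof(si));
        ZeroMemory(&pi, sizeof(pi));
        si.cb = sizeof(si);
    
          // the A (ANSI) version is used because the names are char strings
        if (!CreateProcessA(programs[i], 0, 0, 0, FALSE, 0, 0, 0, &si, &pi))
        {
          printf("Error creating process %i (error %lu)\n", i, GetLastError());
          return 1;
        }
    
        proc[i] = pi.hProcess;
        thread[i] = pi.hThread;
      }
    
        // wait until all of the child processes have ended
      WaitForMultipleObjects(COUNT, proc, TRUE, INFINITE);
    
      for (int i = 0; i < COUNT; ++i) 
      {
          // HANDLE is a pointer type, so it is printed with %p
        printf("Process: %p, Thread: %p ended.\n", proc[i], thread[i]);
        CloseHandle(proc[i]);
        CloseHandle(thread[i]);
      }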
      return 0;
    }