Linux Fu: Simple Pipes | Hackaday

It used to be that you had a computer and it did one thing at a time. Literally. You would load your cards or punched tape or whatever and press a button. The computer would read your program, run it, and spit out the results. Then it would go back to sleep until you gave it more work.

The problem is that computers – especially back then – were expensive. And for a typical program, the computer spent a lot of time waiting for the next punched card to appear or for the magnetic tape to wind to the correct position. In those cases, the computer was figuratively tapping its foot while waiting for the next event.

Someone clever figured out that the computer could work on something else while it waits, so you should run more than one program at a time. While program A is waiting for an I/O operation, program B can make progress. Of course, if program A never did any I/O, program B would starve, so we invented preemptive multitasking. In this scheme, program A runs until it can no longer run or until a predefined time limit expires, whichever comes first. When the time is up, the program is forced to yield so that program B (and other programs) get their turn. This is how virtually all modern computers work, outside of tiny embedded systems.

But there is a difference. Most computers now have multiple processors and special hardware to switch tasks quickly. The desktop I am writing this on has 12 cores, and each can act as two processors. So the computer can run up to 12 programs at a time and keep 12 more ready to replace any of the active 12 very quickly. Of course, the operating system can also swap programs in and out of this stack of 24, so you can run many more than that, but switching between the primary 12 and the standby 12 is extremely fast.

So the case is stronger than ever for writing your solution as more than one program. There are many advantages. For example, I once took over a program that did a lot of calculations and then spent hours printing out the results. I split the printing into separate tasks on different printers and cut the turnaround time – which was almost a day when I started – by about 80%. But even apart from performance, process isolation is like the ultimate encapsulation. The things you do in program A shouldn't be able to affect program B. Just as we isolate code in modules and objects, we can go further and isolate it in processes.

Double-edged sword

But that's also a problem. Presumably, if you want two programs to cooperate, they have to influence each other in some way. You could just use a file to talk to each other, but that's notoriously inefficient. So operating systems like Linux provide IPC – interprocess communication. Just as you make certain parts of an object public, you can expose certain things in your program to other programs.

The most basic way to do this is with the fork call. When you fork a new process, the new process is a full copy of its parent. You don't always realize this because the next thing you often do is call something like exec to load a new program, or you use a wrapper like system that calls fork and exec for you. But every time you run, say, ls at the command prompt, the ls program begins its life as a complete copy of the shell. That copy then loads the ls executable and runs it.

What if it didn't call exec? That's how my report writer worked. The large calculations, which took hours on a Sequent computer with lots of processors, happened in a single process. At print time I forked a bunch of subprocesses. Each started with a full copy of the data, which I then treated as read-only, and began printing. That copy-on-fork is itself a way of communicating between processes.

Another way is pipes. Imagine a command line like:

cat data.txt | sort | more

Here you create three processes. One dumps data from a text file. It sends that data to a pipe connected to the sort program. Sort's output goes to another pipe, which is connected to the more program.

One way

Pipes like this are one-way, but you can create named pipes and talk through them in both directions. You can do this from the shell – mknod creates a named pipe – but you can also do it from within a program. (popen is very easy to use for regular pipes, and there is a mknod API call as well.)

There are several other methods you can use to communicate between processes:

  • Message queues – A way to send messages to another process asynchronously
  • Semaphores – A way to share a counter with another program
  • Shared Memory – Share a block of memory
  • Signals – You can send signals to other processes, which can be used as a form of communication

You might wonder why you need anything beyond shared memory. Honestly, you don't, but in many cases it's easier to use a different method. The real problem is that you need a way to perform an atomic operation, and things like semaphores handle that for you. Imagine we had a variable in shared memory called busy. If busy is 1, then we know we shouldn't modify the data in our shared memory because someone else is using it.

We could write:

while (busy) ; // wait for busy==0
busy=1;
do_stuff();
busy=0;

Sounds awesome, right? No. Somewhere in the CPU, this while loop looks like this:

while_loop2384: TST busy ; set flags on busy
                JNZ while_loop2384 ; if no zero flag, jump
                MOV busy,#1 ; move 1 to busy

Most of the time this will work fine. Most of the time. But what if I execute the TST instruction and then get put to sleep so another program can run the same code? Or another processor executes the exact same code at the same time? It can happen. Both programs will now see that busy is zero. Then they will both set busy to 1 and continue. That's a failure.

Semaphores handle this with an atomic access mechanism that lets the program test and set the value in a single operation. There's more to worry about, like what happens if I wait for process B to release a semaphore while process B waits for me to release another. But that situation – deadlock – is a subject for the future, along with other pitfalls such as priority inversion.

In process

I have a little invented problem. On Linux, if I type df I can see all the mounted filesystems and their characteristics. But this list includes things like the root directory and the swap file. What if you just wanted to see the loop devices in the same output format? There are many ways to do this, of course. You could read the loop entries from /etc/mtab and then pull the other data from /sys or wherever it resides. Seems like a lot of work to me.

Of course, running df almost gets us there. In fact, I could just run a pipeline in the shell to get what I want, sort of:

df | grep '^/dev/loop'

It works, but the output is jumbled. On my system /dev/loop3 is first and /dev/loop0 is last, and there is no clear reason why number 4 is between 8 and 14. So I want to sort it. Piping through sort doesn't help much because it sorts alphabetically. You might think of the -n flag for sort, but that won't work because the number is at the end of the string. Of course, I could use some odd combination of cut or sed to maybe sort it all out, but it's getting too complicated. Let's just write some C code.
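For what it's worth, GNU sort can be told to start the numeric key at a character offset inside a field, which is one possible shell-only fix – though writing the C is the point of the exercise:

```shell
# start the numeric sort key at character 10 of field 1 -- just past "/dev/loop"
df | grep '^/dev/loop' | sort -k1.10 -n
```

The -k1.10 key skips the nine characters of "/dev/loop" so -n sees the device number first.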

The first step is to get df to print everything and capture the output. Since we want to process the output, we need to read from a pipe, and popen() is an easy way to set one up:

#include <stdio.h>

int main(int argc, char *argv[]) {
// This part reads the output of df and echoes it back out
   FILE *result = popen("df", "r");
   int c;
   if (!result) {
      perror("Can't open df");
      return 1;
   }
   while ((c = getc(result)) != EOF)
      putchar(c);
   pclose(result);
   return 0;
}

Half solved

That's half the problem solved. Once you have the characters, you can do all the sorting and filtering you want, but… wait a minute! I'm lazy, always. So let's ask the shell to help us. Here is my plan. I know I only want lines that start with /dev/loop, so let's do this:

  • Read an entire line at a time
  • If it’s not a /dev/loop line, discard it
  • If it is a /dev/loop line, save it in an array but cut the /dev/loop part
  • Once we have all the lines, tell the shell to sort, then add the /dev/loop after the sort

Easy enough:

#include <stdio.h>
#include <string.h>

char buffer[4097];
char *lines[512];
unsigned int maxline = 0;

int main(int argc, char *argv[]) {
// This part reads the output of df into the lines array (with some modifications)
   FILE *result = popen("df", "r"), *sort;
   unsigned int i;
   if (!result) {
      perror("Can't open df");
      return 1;
   }
   while (!feof(result)) {
// get a line from df
      char *rc = fgets(buffer, sizeof(buffer), result);
// only save lines that start with /dev/loop
      if (rc && !strncmp(buffer, "/dev/loop", 9)) {
// out of space
         if (maxline >= sizeof(lines) / sizeof(char *)) {
            fprintf(stderr, "Too many loops\n");
            return 2;
         }
         lines[maxline++] = strdup(buffer + 9); // copy just the number part and the rest of the line
         // should check lines[maxline-1] for NULL here
      }
   }
   pclose(result);

// Now we are going to print through sort
// The sed puts the /dev/loop back at the front of each line after sorting
   sort = popen("sort -n | sed 's@^@/dev/loop@'", "w");
   if (!sort) {
      perror("Can't open sort");
      return 3;
   }
// for each line, send it to the pipe (note the order didn't really matter here ;-)
   for (i = 0; i < maxline; i++)
      fputs(lines[i], sort);
   pclose(sort);
   return 0;
}

And there you go. Yes, you could do it with the shell alone, but it would be a lot harder unless you reached for another programming language like awk rather than just the shell. It also makes a nice example, and there are plenty of things like this that would otherwise be very difficult to do.

You might be wondering if you could create something like sort and both feed it input and read its output. The answer is yes, but not with popen(). The popen() call is just a convenient wrapper around pipe and fork. If you wanted to control both ends, you would have to use the pipe() call directly (twice) and then run sort or whatever. But that is a subject for the future.

There are also many other future topics in interprocess communication. But for now, try out pipes using popen. Critical sections also appear in shell scripts. And if you prefer to write your scripts in C, that is possible too.