CEG 7370

Programs and Processes

The word "program" is often used loosely, and interchangeably with the word "process", but we should be very careful in the use of this word. It is silly to say that "a program is running"; it should be "a process is running." This article and the associated lab experiment serve as a technical introduction at the freshman level to programs and processes. They introduce the control of processes (stopping, resuming, and changing priorities) and explain the resources, such as CPU time, that processes consume.

Programs

A program is a static (i.e., unchanging/passive) entity. It is a file whose content is rigidly formatted as required by the operating system. For each OS, there are several such rigid formats. In Linux, ELF is the most common, and there are other formats. In Windows, the EXE format is the most common, and the obsolete COM format is still in use.

Programs are divided into two classes for the purposes of this course: applications and systems programs. Programs such as word processors, email clients, and web browsers are applications. Programs such as init (which controls the sessions of an OS), the loader (which loads a program into memory as a required prelude to making the program into a process), and ifconfig/ipconfig (which set the parameters of network cards) are systems programs. Programs whose absence would make an OS incomplete or crippled are called systems programs. Programs that make a computer system useful in a particular way are applications. This definition has been evolving over the decades; e.g., compilers, linkers, and shells used to be considered systems programs.

Creation of Programs

We write the source code of programs. A program may also have help files, documentation, and other such files. These are not essential, in that their absence will not prevent the launching of a program. If such a file is missing when it is asked for, you get only a "missing file" error.

Compiling

The source code is a file of text that must abide by the syntax and semantics of some programming language. Some well-known programming languages are C++, Java, Perl, Python, and Assembly. For reasons of modularity and manageability, the source code is often split into multiple files.

Source code files are processed by programs called compilers, interpreters, and assemblers. After compilation of the source code, object code files are produced. The content of the object files is rigidly controlled. It is often the case that source code files written in different programming languages are compiled into object code files that can be linked together.

In Linux, object code files have .o extension; in Windows, the extension for object code files is .obj.

Java files typically get compiled into the byte code of the JVM, which is platform (i.e., CPU and OS) independent; the extension for these byte code files is .class. There are also compilers that compile Java directly into the machine code of a specific CPU.

Integrated development environments (IDEs) are the primary tools for developing programs. Behind the scenes, they compile, link, and manage the entire development activity. In this course, we are trying to understand these activities. In Linux, the command-line tools named gcc and g++ are driver programs that examine the given arguments in a sophisticated way and invoke the appropriate tools (such as compilers, assemblers, and linkers) based on them.

Linking

The object code files and methods/ procedures/ functions from pre-existing library files are linked into an executable file that is then qualified to be called a program. In Linux, programs (traditionally) do not have any extension. In Windows, program files have .exe extension. Files with .com extension are old format program files dating from MS DOS.

The structure and content of an object code file obey rigid rules. Conceptually, we can think of each file as beginning with a TOC (a table of contents, like in a book), followed by the executable machine code of the various methods. The TOC describes, among other things, imported and exported symbols (i.e., names of variables, methods, etc.). A given object file may use names that are defined elsewhere; these are imported symbols. A given object code file may define some symbols that may or may not be used within that file, but are intended to be of use elsewhere. These are exported symbols.

A linker (also called linkage editor) essentially "stitches" the object code files together replacing all references of imported symbols with their addresses defined in the exports list. This stitching succeeds only when all the imported symbols across all object code files, that make up one program, are found among the exported symbols (including those exported by various libraries).

In Linux, the linker is actually named ld for historical reasons. It has nothing to do with the loading activity described below.

Libraries

Certain methods are so common and so useful that over the decades the code for these has been developed carefully and optimized into collections known as libraries. A library can be viewed as a catenation of object code files with a TOC up front.

In Linux, shared library files have names ending with the extension .so (often followed by a version number, as in libc.so.6); in Windows, they end with .dll. These are essential, in that the absence of any such file will cause the launch of a program that needs it to fail.

Programs are typically dynamically linked with many widely used libraries. The command ldd displays the list of such libraries.

% ldd /bin/ls
 linux-vdso.so.1 =>  (0x00007fffcdfff000)
 libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007fdcbca3e000)
 librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fdcbc836000)
 libacl.so.1 => /lib/x86_64-linux-gnu/libacl.so.1 (0x00007fdcbc62d000)
 libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdcbc26e000)
 libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fdcbc06a000)
 /lib64/ld-linux-x86-64.so.2 (0x00007fdcbcc93000)
 libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fdcbbe4c000)
 libattr.so.1 => /lib/x86_64-linux-gnu/libattr.so.1 (0x00007fdcbbc47000)

The program file /bin/ls does not duplicate the code of the methods it uses from these libraries. These are linked at run time, as needed, into the image of the process. The content of /bin/ls remains as it was. It is possible to statically link everything by changing the relevant settings in the IDE or linker. Obviously, a statically linked program file will be much larger.

System Calls

Every OS includes methods that are intended to be called by processes. Calling one of these methods causes a change of mode: the process normally runs in the so-called "user" (unprivileged) mode, and the call of an OS-internal method puts the process into "kernel" (privileged) mode for the duration of that method. Such calls are known as system calls. To support this, all modern CPUs have special instructions, variously called INT (interrupt), trap, or SVC (supervisor call), that are distinct from the ordinary instruction, usually labeled CALL, that calls another method.

Typical examples of system calls are the following four on files: open a file, read the next several bytes of a file, write the next several bytes of a file, and close the file.

Linux and Windows each have some 300+ system calls. These can be invoked directly, but more commonly the system calls are wrapped inside more convenient library methods.

"Interoperability" of Programs

Consider the file /bin/ls. This is a program in the Linux OS. Suppose we copy this file over to Windows with no changes, a byte-for-byte exact copy. Windows refuses to recognize it as a program, because its file format (ELF) is not one that Windows accepts. Moreover, a program contains machine code specific to a CPU, so if we move a program from an Intel CPU machine to an ARM CPU machine, it cannot be invoked.

A program is a file with a rigorous structure at the level of byte sequences. Standards such as ELF and COFF describe this structure. Humans do not construct programs "by hand" any more; linkers do.

In recent years, virtual machines and emulators of immense capability have appeared. For example, it is possible to run a Linux program, as-is, on Windows, and vice versa. Examples of such software: virtualbox, wine, qemu, crossover.

Directories of Programs

What does installing a program do? Consider the install command and package files such as *.deb, *.rpm, *.msi, and installable *.exe files.

Programs are installed into specific directories. An installation package is a "bundle" of files. In Linux, a "tarball" (or a .deb or .rpm package) is used. In Windows, a zip, an .msi, or even a specially constructed .exe is used. Installation is performed by invoking a special program (for now, let us call it the installer). This involves unbundling the files and moving them into the destination directories. Before such an installation, the installer checks the integrity of the package. The details of installers will be discussed further under the heading of Sys Admin.

The standard directories are shown below. The rows are roughly aligned based on their functionality. The /usr/bin and C:\Program Files directories contain applications. The /sbin, /usr/sbin, and C:\Windows\system32 directories contain system programs.

On Linux      On Windows
/bin          C:\Windows
/usr/bin      C:\Program Files
/sbin         C:\Windows
/usr/sbin     C:\Windows\system32

On Windows, C: is used as an example only; run echo %SystemDrive% in cmd (or $env:SystemDrive in PowerShell) to see the actual drive name on the PC you are working on.

Utilities on Programs

The following are standard programs that you are expected to learn as part of this course in the context of programs. For further details on the commands, look them up in the textbook, the man/help pages, and the web.

Linux     Windows       Brief description
file                    Heuristically determine what kind of a file the given one is
size                    Display sizes of the code (text), data, and bss of a program
ldd       tasklist /m   Display the libraries needed to invoke a program
env       set           Display the environment of the invoking shell
install                 Install a program
nm                      Display the names of variables, methods, etc. defined in a program or object code file
strip                   Strip the above names, etc., from a program or object code file

In tables such as the above, some entries are (or will be) blank. This does not mean that the OS cannot do the equivalent; it simply means that the standard installation does not come with an equivalent program. There is a well-known subsystem, http://www.cygwin.com/, that runs under Windows and provides most Linux functionality, using the Linux program names. Cygwin is installed on our lab machines, but its programs are clearly not Windows-native.

Linux includes many utilities for extracting information about the contents of files. Two of the most important are file and size.

file FILENAME... will output the type of the given files, such as "ASCII text" or "MP3 file with ID3 version 2.3.0 tag". It does this by examining certain distinctive patterns of bytes within a file (called the type's magic number), and can often get quite detailed information.

size outputs information about object files or compiled executables, such as those produced by GCC. Specifically, it lists the sizes of the various sections of the object file. The "text" section is the code, "data" contains data which is initialized, and "bss" is the uninitialized part of the data segment. (Recall the difference between initializing a variable and merely declaring it from CS240.)

System Programs v. Applications

System Programs are programs that are essential to the OS. Their absence will cause significant loss of OS functionality. E.g., the "login" program/process takes the userid and password of a user and instantiates a working session for the user. The "loader" program/process loads the program file into memory in preparation to giving birth to the corresponding process. The "ifconfig" assigns an IP address to a/the network card. The "mount" makes file volumes accessible. In Linux, system programs are located in /sbin, and /usr/sbin. In Windows, system programs are located in C:\Windows, and C:\Windows\system32.

Unfortunately, some non-system programs slip into these directories. For example, whereas C:\Windows\regedit.exe is a system program, C:\Windows\write.exe and C:\Windows\System32\notepad.exe are rather trivial applications.

Over the decades, some programs that used to be considered system programs have come to be viewed as applications: shells (graphical or not, such as bash and explorer), compilers, and linkers.

Processes

A successful invocation of a program results in a process. The invocation is typically done either via a shell (cmd, PowerShell, or bash) or a menu system (which is a "graphical" shell). Internally, the shell makes a system call (built into the OS) that accomplishes this. More technically, in Linux the shell first creates a new process with the fork system call, and that new process then makes an exec system call; in response to exec, the OS loads the program into main memory and constructs certain OS-internal data structures. The resulting entity is called a process. A process is a dynamic (i.e., changing/active) entity.

The word "load" is a highly technical term and, at the level of CEG 2350, a difficult one to describe. Students often confuse linking with loading. Adding to this confusion are the terms static and dynamic prefixed to both. Static linking is a compile-time activity; per program, we need to do it only once. When a program is invoked, static loading brings the entire program into memory before the resulting process begins its execution; it is an activity that happens just before running. In dynamic loading, only portions are brought in as needed, and some portions of the program may never be brought in. Dynamic linking links all the object code files together into a program but postpones linking the methods of the libraries; such a program file is considerably smaller than an equivalent statically linked program. Dynamic linking and loading happen while the process is running.

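The loader program/process is often invisible to the users. In Linux, as part of the exec system call, the kernel maps the dynamic linker/loader (ld.so, named ld-linux on many systems) into the new process; it then loads and links the required shared libraries before the program's main method begins.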

The main(argc, argv, envp) method of a freshly created process is supplied three arguments by the invoking process. argv is a vector of pointers to strings (the command-line arguments), argc is a count of the items in argv[], and envp is a vector of pointers to strings of the form NAME=value, known as the environment. A shell (a CLI, or a GUI shell such as explorer) facilitates the construction of these arguments from keyboard/mouse/user-given input. The environment is a set of string variables that each process typically inherits, as a copy, from the process that created it. In Linux, the env command displays the environment, and in bash the export command adds a variable to it; in Windows cmd, the set command both displays and changes it.

Since every process can read its environment, the environment is frequently used as a way to supply options to commands without repeating them every time the command is invoked. (GNU ls, for example, reads LS_COLORS and TIME_STYLE.)

Other values commonly stored in the environment include PATH (the list of directories searched for programs) and, on Linux, HOME (the user's home directory).

It is a Linux/Windows convention that all global environment variable names be upper-case.

In bash, environment variables may be manipulated just like any other shell variable; e.g., PATH=$PATH:~/bin appends the user's own bin directory to the path. In PowerShell, they are reached through the env: drive, as in $env:PATH.

Process Management

A primary function of any OS is: given a program, create a process and run it. Both Linux and Windows run many processes "simultaneously" and strive to guarantee that no process interferes with another, that each is given a fair share of resources, and that, given the hardware, the overall performance is maximized.

Process States

Lecture outline: process states: ready-to-run, running, waiting for an event, swapped out. State transitions occur as a result of process scheduling by the OS. Preemption. Priorities. See the required reading at http://en.wikipedia.org/wiki/Process_states.

Resources Used by Processes

Every process consumes some resources. The most obvious ones are CPU time, memory, open files, and devices. No process is able to "get" them unless it requests them from the OS. The OS grants them to processes as requested, needed, and available.

Every process begins with an Open File Table containing three entries, at indices 0, 1, and 2. stdin and stdout are the normal text input and output of commands, i.e., what shows up in the terminal; C++ programmers can think of them as cin and cout. There is also stderr (cerr in C++) for the output of error messages. The shell usually refers to them by number: stdin == 0, stdout == 1, and stderr == 2. stdin is initially bound to the keyboard; stdout and stderr are initially bound to the screen. When additional files are opened, they are inserted into the Open File Table (in Linux, at the lowest unused index); as files get closed, their entries are vacated. So, at any given moment, the Open File Table may not be contiguously filled. There is a limit on the size of this table imposed by the sys admin of the system; on many Linux systems the default is 1024, and ulimit -n in bash shows the current value.

Standard Processes in Linux

The following list was generated by ps aux and then pruned to show only a few of the standard processes. This list does vary from PC to PC depending on the hardware installed and the OS configuration.

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1   2952  1852 ?        Ss   Nov23   0:01 /sbin/init
daemon    4118  0.0  0.0   1812   552 ?        Ss   Nov23   0:00 /sbin/portmap
statd     4137  0.0  0.0   1876   716 ?        Ss   Nov23   0:00 /sbin/rpc.statd
root      4407  0.0  0.0   1696   520 tty1     Ss+  Nov23   0:00 /sbin/getty 38400 tty1
root      4914  0.6  6.6  77020 69344 tty7     SLs+ Nov23  45:50 /usr/bin/X
root      4931  0.0  0.0   5280   992 ?        Ss   Nov23   0:00 /usr/sbin/sshd
root      5506  0.0  1.2  31692 13132 ?        S    Nov27   0:00 kded [kdeinit]            
pmateti   5828  0.0  1.2  31868 13024 ?        S    Nov23   1:57 kwin [kdeinit]
pmateti   5830  0.0  1.7  35256 17660 ?        S    Nov23   0:44 kdesktop [kdeinit]                                          
pmateti   5832  0.0  1.8  37200 19476 ?        S    Nov23   1:06 kicker [kdeinit]                                            
pmateti   5863  0.0  0.1   4712  2044 pts/0    Ss   Nov23   0:00 /bin/bash

Standard Processes in Windows

The following list was generated by tasklist and then pruned to show only a few of the standard processes. This list does vary from PC to PC depending on the hardware installed and the OS configuration.


Image Name                   PID Session Name     Session#    Mem Usage
========================= ====== ================ ======== ============
System Idle Process            0 Console                 0         28 K
System                         4 Console                 0        236 K
smss.exe                     812 Console                 0        380 K
csrss.exe                    876 Console                 0      3,460 K
winlogon.exe                 904 Console                 0      6,452 K
services.exe                 948 Console                 0      5,908 K
lsass.exe                    960 Console                 0      1,632 K
ati2evxx.exe                1120 Console                 0      3,344 K
svchost.exe                 1152 Console                 0      5,252 K
spoolsv.exe                 2012 Console                 0      5,356 K
avgamsvr.exe                 312 Console                 0        332 K
MDM.EXE                      436 Console                 0      2,852 K
explorer.exe                1312 Console                 0     29,260 K
alg.exe                     1332 Console                 0      3,552 K
wmiprvse.exe                6488 Console                 0      5,632 K

Process Utilities

The following are standard programs that you are expected to learn as part of this course in the context of processes. For further details on the commands, look them up in the textbook, the man/help pages, and the web.

Linux       Windows         Brief description/Limitations in the Learning Objective, if any
ksysguard   taskmgr         Continuously updated GUI view of processes
ps          tasklist        Display processes currently alive
top                         Continuously updated text view of processes
nice                        Invoke the rest of the command at a lower priority
time                        Invoke the rest of the command and time it
kill        taskkill /pid   Kill a process whose number is given
killall     taskkill /im    Kill a process whose name is given
bg                          Place the last suspended process in the background
fg                          Place the last suspended process in the foreground
            sc              Service Controller
ltrace                      Show library calls being made
strace                      Show system calls being made

Signals and the Kill Command

Syntax: kill -[SIGNAL] PID...

Despite its name, ending processes is only one function of the kill command. More generally, it sends signals to processes (signals are loosely analogous to exceptions). Programs can either catch these signals and handle them gracefully, or let the operating system's default action handle them.

The default signal sent by kill is SIGTERM. A different signal can be given before the PIDs, either by number (kill -1) or by name (with or without the "SIG", kill -HUP and kill -SIGHUP both work).

Signals are sent for other events besides the user running kill. Many of the most common signals are never sent directly by users except when testing. Bugs in a program may cause it to terminate with SIGSEGV, and pressing control-c usually sends SIGINT, for example.

Unfortunately, signal numbers vary between Unix flavors. The most common signals usually stay the same, but it is a good idea to check kill -l for the supported signals. Further, although many systems provide convenience utilities for common tasks, these sometimes have different effects on different systems. For example, killall, which on Linux kills all processes matching a given name, ends all running processes on Solaris!

Common Signals:

Number     Name      Meaning
1          SIGHUP    "Hang up"; causes programs to quit or reload their configuration.
2          SIGINT    "Interrupt", as sent by control-c in Bash.
4          SIGILL    "Illegal instruction": the CPU was asked to execute an invalid machine instruction.
9          SIGKILL   Cannot be caught, and thus causes any process to terminate immediately.
11         SIGSEGV   "Segmentation fault", a memory or pointer error.
13         SIGPIPE   Write to a pipe that has no reader ("broken pipe").
15         SIGTERM   Terminate the process, with whatever graceful shutdown it provides (the default).
(varies)   SIGSTOP   Suspends the process; cannot be caught. (19 on Linux x86 and ARM, 23 on Solaris.) Control-z in Bash sends the related, catchable SIGTSTP.
(varies)   SIGCONT   Continues a suspended process, as fg in Bash does. (18 on Linux x86 and ARM, 25 on Solaris.)

Windows PsTools

The tools needed for Linux are readily present in a typical Linux distribution, but the tools needed for Windows (known as PsTools) need to be downloaded from http://technet.microsoft.com/en-us/sysinternals/ None of the tools requires any special installation. The tools included in the PsTools suite are:

PsExec        execute processes remotely
PsFile        shows files opened remotely
PsGetSid      display the SID of a computer or a user
PsInfo        list information about a system
PsKill        kill processes by name or process ID
PsList        list detailed information about processes
PsLoggedOn    see who's logged on locally and via resource sharing
PsLogList     dump event log records
PsPasswd      changes account passwords
PsService     view and control services
PsShutdown    shuts down and optionally reboots a computer
PsSuspend     suspends processes

(The author of these tools alerts us that some anti-virus scanners may report that one or more of the tools are infected with a "remote admin" virus. None of the PsTools contain viruses, but they have been used by viruses, which is why they trigger virus notifications.)


Copyright © 2012 Prabhaker Mateti