The word "program" is often used loosely, and interchangeably with the word "process". But we should be very careful in the use of these words. It is silly to say that "a program is running"; it should be "a process is running." This article and the associated lab experiment serve as a technical introduction, at the freshman level, to programs and processes. It introduces the control of processes (stopping, resuming, and changing priorities) and explains the resources (such as CPU time and memory) that processes consume.
A program is a static (i.e., unchanging/passive) entity. It is a file whose content is rigidly formatted as required by the operating system. Each OS supports several such rigid formats. In Linux, ELF is the most common, and there are other formats. In Windows, the EXE format is the most common, and the obsolete COM format is still in use.
Programs are divided into two classes for the purposes of this course: Applications and Systems Programs. Programs such as word processors, email clients, and web browsers are applications. Programs such as init (which controls the sessions of an OS), the loader (which loads a program into memory as a required prelude to making the program into a process), and ifconfig/ipconfig (which set the parameters of network cards) are systems programs. Programs whose absence would make an OS incomplete or crippled are called systems programs. Programs that make a computer system useful in a particular way are applications. This definition has been evolving over the decades. E.g., compilers, linkers, and shells used to be considered systems programs.
We write the source code of programs. A program may also have help files, documentation, and other such files. These are not essential, in that their absence will not prevent the launching of the program. If such a file is asked for while it is missing, you will only get a "missing file" error.
The source code is a file of text that must abide by the syntax and semantics of some programming language. Some well known programming languages are C++, Java, Perl, Python, and Assembly. For reasons of modularity and manageability, the source code is often split into multiple files.
Source code files are processed by programs called compilers, interpreters, and assemblers. Compiling the source code produces object code files, whose content is rigidly structured. Source code files written in different programming languages can often be compiled into object code files that are then linked together.
In Linux, object code files have .o extension; in Windows, the extension for object code files is .obj.
Java files typically get compiled into the byte code of the JVM, which is platform (i.e., CPU and OS) independent; the extension for these byte code files is .class. There are also conventional compilers that compile Java straight into the machine code of a specific CPU.
Integrated development environments (IDEs) are the primary tools for developing programs. Behind the scenes, they compile, link, and manage the entire development activity. In this course, we are trying to understand these activities. In Linux, the command line tools with the names gcc or g++ are driver programs that examine the given arguments in a sophisticated way and invoke the appropriate tools (such as compilers, assemblers, and linkers) based on those arguments. The object code files and methods/procedures/functions from pre-existing library files are linked into an executable file that is then qualified to be called a program. In Linux, programs (traditionally) do not have any extension. In Windows, program files have the .exe extension. Files with the .com extension are old format program files dating from MS DOS.
The structure and content of an object code file obey rigid rules. Conceptually, we can think of each file as beginning with a TOC (table of contents, as in a book), followed by the executable machine code of the various methods. The TOC describes, among other things, imported and exported symbols (i.e., names of variables, methods, etc.). A given object file may use names that are defined elsewhere; these are imported symbols. A given object code file may define some symbols that may or may not be used within that file, but are intended to be of use elsewhere. These are exported symbols.
A linker (also called linkage editor) essentially "stitches" the object code files together replacing all references of imported symbols with their addresses defined in the exports list. This stitching succeeds only when all the imported symbols across all object code files, that make up one program, are found among the exported symbols (including those exported by various libraries).
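The success condition described above (every imported symbol must be found among the exported ones) can be sketched in a few lines of Python. This is only a toy model, not how ld works internally; the file names main.o and geometry.o and the symbol names are hypothetical, chosen for illustration.

```python
# Toy model of link-time symbol resolution (not a real linker).
# Each "object file" is a name plus the symbols it exports and imports.

def unresolved_symbols(object_files):
    """Return the sorted list of imported symbols that no file exports."""
    exported = set()
    imported = set()
    for obj in object_files:
        exported.update(obj["exports"])
        imported.update(obj["imports"])
    return sorted(imported - exported)

# Hypothetical inputs: main.o uses area() and printf(); geometry.o defines area().
main_o = {"name": "main.o", "exports": {"main"}, "imports": {"area", "printf"}}
geometry_o = {"name": "geometry.o", "exports": {"area"}, "imports": set()}
libc = {"name": "libc", "exports": {"printf", "malloc"}, "imports": set()}

missing_without_libc = unresolved_symbols([main_o, geometry_o])
missing_with_libc = unresolved_symbols([main_o, geometry_o, libc])

print(missing_without_libc)  # ['printf']
print(missing_with_libc)     # []
```

A real linker reports the first list as "undefined reference" errors and refuses to produce a program; only when the list is empty does the stitching succeed.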
In Linux, the linker is actually named ld for historical reasons. It has nothing to do with the loading activity described below.
Certain methods are so common and so useful that over the decades the code for these has been developed carefully and optimized into collections known as libraries. A library can be viewed as a catenation of object code files with a TOC up front.
In Linux, library files have names ending with the extension .so and in Windows .dll. These are essential in that the absence of any such file will cause the launch of a program to fail.
Programs are typically dynamically linked with the many widely-known libraries. The command ldd displays the list of such libraries.
% ldd /bin/ls
    linux-vdso.so.1 => (0x00007fffcdfff000)
    libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007fdcbca3e000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fdcbc836000)
    libacl.so.1 => /lib/x86_64-linux-gnu/libacl.so.1 (0x00007fdcbc62d000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdcbc26e000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fdcbc06a000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fdcbcc93000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fdcbbe4c000)
    libattr.so.1 => /lib/x86_64-linux-gnu/libattr.so.1 (0x00007fdcbbc47000)
The program file /bin/ls does not duplicate the code of the methods it uses from these libraries. These are linked at run-time, as needed, into the process image of the program. The content of /bin/ls remains as-was. It is possible to statically link everything by changing the relevant settings in the IDE or linker. Obviously, a statically linked program file will be much larger.
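As a small aid for reading ldd listings, the following Python sketch turns such output into a mapping from library name to file path. It assumes only the three line shapes seen in the listing above (name => path, name => address only, and a bare path for the loader); the function name parse_ldd is made up for this example, and other ldd messages are not handled.

```python
def parse_ldd(text):
    """Map each library name in ldd output to its resolved path (or None)."""
    libs = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "=>" in fields:
            arrow = fields.index("=>")
            target = fields[arrow + 1] if arrow + 1 < len(fields) else ""
            # An entry such as linux-vdso has an address but no file path.
            libs[fields[0]] = target if target.startswith("/") else None
        else:
            # The dynamic loader itself is listed by its full path.
            libs[fields[0]] = fields[0]
    return libs

# Three lines copied from the listing above.
sample = """\
linux-vdso.so.1 => (0x00007fffcdfff000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdcbc26e000)
/lib64/ld-linux-x86-64.so.2 (0x00007fdcbcc93000)
"""
libs = parse_ldd(sample)
for name, path in libs.items():
    print(name, "->", path)
```

The entry without a path (linux-vdso.so.1) is supplied by the kernel rather than read from a file, which is why the sketch records None for it.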
Typical examples of system calls are the following four on files: open a file, read the next several bytes of a file, write the next several bytes of a file, and close the file.
Both OSes have some 300+ system calls each. These can be invoked directly, but more commonly the system calls are wrapped inside more convenient library methods.
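The four file system calls can be tried from Python, whose os module offers thin wrappers with the same names (os.open, os.read, os.write, os.close). This is a sketch of library wrappers around system calls, not of the calls' kernel implementation; the file name demo.txt is arbitrary.

```python
import os
import tempfile

# A scratch file in a fresh temporary directory; the name is not significant.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# open (for writing), write, close
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
written = os.write(fd, b"hello, kernel\n")   # returns the number of bytes written
os.close(fd)

# open (for reading), read the next several bytes twice, close
fd = os.open(path, os.O_RDONLY)
first = os.read(fd, 5)     # the first 5 bytes
rest = os.read(fd, 100)    # continues where the previous read stopped
os.close(fd)

print(written, first, rest)
```

Note that read takes no position argument: the OS remembers, per open file, where the next read or write will take place.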
Consider the file /bin/ls. This is a program in the Linux OS. Suppose we copy this file over to Windows -- no changes, byte-for-byte an exact copy. Windows refuses to recognize this as a program. Obviously a program contains the machine code specific to a CPU. So if we move a program from an Intel CPU machine to an ARM CPU machine, it cannot be invoked.
A program is a file with a rigorous structure at the level of byte sequences. Standards such as ELF and COFF describe this structure. Humans do not construct programs "by hand" any more -- linkers do.
In recent years, virtual machines and emulators of immense capabilities have appeared. E.g., it is possible to run a Linux program, as-is, on Windows and vice-versa. Examples of such software: virtualbox, wine, qemu, crossover.
What does installing a program do? install, *.deb, *.rpm, *.msi, installable *.exe files.
Programs are installed into specific directories. An installation package is a "bundle" of files. In Linux, the "tar ball" is used. In Windows, a zip, an .msi, or even a specially constructed .exe is used. Installation is performed by invoking a special program (for now, let us call it the installer). This involves unbundling the files and moving them into the destination directories. Before such an installation, the installer verifies the integrity of the package. The details of the installer will be discussed further under the heading of Sys Admin.
The standard directories are shown below. The rows are roughly aligned based on their functionality. The /usr/bin and C:\Program Files directories contain applications. The /sbin, /usr/sbin, C:\Windows\system32 contain system programs.
On Linux | On Windows |
On Windows, C: is used as an example only; do echo %SystemDrive% in cmd to see the actual drive name on the PC you are working on.
The following are standard programs that you are expected to learn as part of this course in the context of programs. For further details on the commands, look them up in the textbook, man/help pages, and the web.
Linux | Windows | Brief description |
file | | Heuristically determine what kind of a file the given one is |
size | | Display sizes of code, bss, and data of a program |
ldd | tasklist /m | Display the libraries needed to invoke a program |
env | set | Display the Environment of the invoking shell |
install | | Install a program |
nm | | Display the names of variables, methods, etc. defined in a program or object code file |
strip | | Strip the above names etc. |
In tables such as the above, some entries are/will-be blank. This does not mean that the OS cannot do the equivalent. It simply means that the standard installation did/does not come with an equivalent program. There is a well-known subsystem, http://www.cygwin.com/, that runs under Windows and provides most of the Linux functionality, using the program names of Linux. On our lab machines, cygwin is installed, but its programs are clearly not Windows-native.
Linux includes many utilities for extracting information about the contents of files. Two of the most important are file and size.
file FILENAME... will output the type of the given files, such as "ASCII text" or "MP3 file with ID3 version 2.3.0 tag". It does this by examining certain distinctive patterns of bytes within a file (called the type's magic number), and can often get quite detailed information.
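The idea of a magic number can be illustrated with a much-reduced imitation of file. The sketch below checks only four well-known leading byte patterns; the real file command consults a large database and looks well beyond the first bytes. The names MAGIC, sniff, and sniff_file are invented for this example.

```python
# A few well-known leading byte patterns ("magic numbers").
MAGIC = [
    (b"\x7fELF", "ELF (Linux executable, object file, or shared library)"),
    (b"MZ", "MZ (DOS or Windows executable)"),
    # The same four bytes also begin some non-Java files; file looks further.
    (b"\xca\xfe\xba\xbe", "Java class file"),
    (b"#!", "script with an interpreter line"),
]

def sniff(data):
    """Guess a file type from its first few bytes."""
    for magic, description in MAGIC:
        if data.startswith(magic):
            return description
    return "unknown"

def sniff_file(path):
    """Read the first 8 bytes of the named file and classify them."""
    with open(path, "rb") as f:
        return sniff(f.read(8))

print(sniff(b"\x7fELF\x02\x01\x01\x00"))
print(sniff(b"MZ\x90\x00"))
print(sniff(b"plain old text"))
```

On a Linux machine, sniff_file("/bin/ls") should report ELF. The same check also explains why a byte-for-byte copy of /bin/ls is rejected by Windows: its leading bytes are not the ones Windows expects of a program.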
size outputs information about object files or compiled executables, such as those produced by GCC. Specifically, it lists the sizes of the various sections of the object file. The "text" section is the code, "data" contains data which is initialized, and "bss" is the uninitialized part of the data segment. (Recall the difference between initializing a variable and merely declaring it from CS240.)
System Programs are programs that are essential to the OS. Their absence will cause significant loss of OS functionality. E.g., the "login" program/process takes the userid and password of a user and instantiates a working session for the user. The "loader" program/process loads the program file into memory in preparation to giving birth to the corresponding process. The "ifconfig" assigns an IP address to a/the network card. The "mount" makes file volumes accessible. In Linux, system programs are located in /sbin, and /usr/sbin. In Windows, system programs are located in C:\Windows, and C:\Windows\system32.
Unfortunately, some non-systems programs slip into these directories. E.g., whereas C:\Windows\regedit.exe is a system program, C:\Windows\write.exe and C:\Windows\System32\notepad.exe are rather trivial applications.
Over the years, some programs that used to be considered system programs have come to be viewed as applications: shells (graphical or not, such as bash and explorer), compilers, and linkers.
A successful invocation of a program results in a process. The invocation is typically done either via a shell (cmd, PowerShell, or bash) or a menu system (which is a "graphical" shell). Internally, the shells make a system call (built into the OS) that accomplishes this. More technically, in response to an exec system call, the OS loads the program into main memory and constructs certain OS-internal data structures. The resulting entity is called a process. A process is a dynamic (i.e., changing/active) entity.
The word "load" is a highly technical term and, at the level of CEG 2350, a difficult one to describe. Students often confuse linking with loading. Adding to this confusion are the terms static and dynamic prefixed to both. Static linking is a build-time activity; per program, we need only do this once. Dynamic linking also links all the object code files together into a program but postpones linking the methods of the libraries until run time; such a program file is considerably smaller than an equivalent statically linked program. Static loading happens just before running: when a program is invoked, static loading brings the entire program into memory before the resulting process begins its execution. In dynamic loading, only portions are brought in as needed, and some portions of the program may never be brought in. Dynamic linking and loading happen while the process is running.
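The distinction between the program (a file) and the process (a running instance) can be observed by invoking a program from Python. In this sketch the program is the Python interpreter itself, whose file path is sys.executable; subprocess.run asks the OS to create a new process from that file, much as a shell would.

```python
import os
import subprocess
import sys

# Invoke a program (a second Python interpreter) as a new process.
# The child prints its own process ID and then terminates.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getpid())"],
    capture_output=True,
    text=True,
    check=True,
)
child_pid = int(child.stdout)
parent_pid = os.getpid()

print("program file:", sys.executable)
print("parent process:", parent_pid, "child process:", child_pid)
```

Both processes were made from the same program file, yet each has its own process ID: one program, two processes.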
The loader program/process is often invisible to the users. In Linux, the programs named ld.so and ld-linux accomplish the dynamic linking and loading as part of the exec system call.
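Run-time linking and loading can also be requested explicitly by a running process. The sketch below uses Python's ctypes module to ask the dynamic loader for the C library and then calls one of its methods, getpid. It assumes a Linux (or other POSIX) system; find_library may return None on some installations, in which case CDLL(None) falls back to the libraries already loaded into the process.

```python
import ctypes
import ctypes.util
import os

# find_library returns a name such as "libc.so.6", or None if it cannot tell.
libc_name = ctypes.util.find_library("c")

# Ask the dynamic loader to map the library into this process (POSIX only).
libc = ctypes.CDLL(libc_name)

# Look up the symbol getpid in the library and call it.
pid_from_libc = libc.getpid()
print(libc_name, pid_from_libc, os.getpid())
```

The symbol getpid was located only when the line libc.getpid() ran, not when this script was written, which is the essence of dynamic linking.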
The main(argc, argv, envp) method of a freshly created process is supplied three arguments by the invoker process. The argv is a vector of pointers to strings, argc a count of the items in argv[], and envp a pointer to an array of strings known as the environment. A shell (a CLI, or a GUI shell such as explorer) facilitates the construction of these arguments from keyboard/mouse/user-given input. The environment is a set of NAME=value string variables that each process receives from its invoker. In Linux, the env command displays the environment and the set command manipulates it.
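The hand-over of arguments and environment from invoker to new process can be watched from Python, where sys.argv and os.environ play the roles of argv and envp. In this sketch the parent passes two arguments and one extra environment variable to a child, and the child reports what it received. The variable name GREETING and the arguments alpha and beta are made up for the example.

```python
import json
import os
import subprocess
import sys

# Code run by the child: report the arguments and one environment variable.
child_code = (
    "import json, os, sys;"
    "print(json.dumps({'argv': sys.argv[1:],"
    " 'greeting': os.environ.get('GREETING')}))"
)

# The child's environment: a copy of ours plus one hypothetical variable.
env = dict(os.environ, GREETING="hello")

result = subprocess.run(
    [sys.executable, "-c", child_code, "alpha", "beta"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
received = json.loads(result.stdout)
print(received)
```

The parent's own environment is unchanged by this; GREETING exists only in the copy that was handed to the child.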
Since all programs can access the environment string, it's frequently used as a way to supply options to commands without repeating them every time the command is invoked. (ls reads LS_OPTIONS, for example).
Other examples of values commonly stored in the environment are:
It is a Linux/Windows convention that all global environment variable names be upper-case.
In bash and in powershell, environment variables may be manipulated just like any other shell variable. In bash, e.g., PATH=$PATH:~/bin appends the user's own bin directory to the path.
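For comparison, here is a sketch of the same PATH manipulation done on a plain string in Python. The helper name append_to_path is invented for this example; it also avoids adding a directory that is already listed, which the bash one-liner does not check.

```python
import os

def append_to_path(path_value, directory, sep=os.pathsep):
    """Return path_value with directory appended, unless already present."""
    parts = [p for p in path_value.split(sep) if p]
    if directory not in parts:
        parts.append(directory)
    return sep.join(parts)

user_bin = os.path.expanduser("~/bin")        # the user's own bin directory
new_path = append_to_path("/usr/bin:/bin", user_bin, sep=":")
print(new_path)
```

Assigning such a value to os.environ["PATH"] would affect only the current process and the processes it later creates, just as the bash assignment affects only that shell and its children.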
Lecture Outline: Process states: Ready-to-run, Running, Waiting for an event, Swapped out. State transitions occur as a result of process scheduling by the OS. Preemption. Priorities. See the Required Reading from http://en.wikipedia.org/wiki/Process_states
Every process consumes some resources. The most obvious ones are CPU time, memory, open files, and devices. A process cannot obtain any of them except by requesting them from the OS, which grants them as requested, needed, and available.
Every process begins with an Open File Table containing three entries at indices 0, 1, and 2. The "stdin" and "stdout" are the normal text input and output of commands, i.e., what is typed at and what shows up in the terminal. C++ programmers can think of them as "cin" and "cout". There is also a "stderr" for the output of error messages. The shell usually refers to them by number: stdin == 0, stdout == 1, and stderr == 2. The stdin is initially bound to the keyboard; the stdout and stderr are initially bound to the screen. When additional files are opened, they are inserted into the Open File Table; as files get closed, their entries are vacated. So, at any given moment, the Open File Table may not be contiguously filled. There is a limit on the size of this table, imposed by the sys admin of the system; it was around 30 on older systems, and on current Linux systems the default is commonly 1024 (ulimit -n in bash displays it).
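A process can ask the OS how much of some resources it has consumed so far. The sketch below uses Python's resource module (available on Linux and other POSIX systems, not on Windows) to read the CPU time and peak memory of the current process; the loop exists only to use up some CPU time.

```python
import resource  # POSIX only

def cpu_seconds():
    """CPU time consumed so far by this process (user + system)."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime + usage.ru_stime

before = cpu_seconds()
total = sum(i * i for i in range(2_000_000))   # burn some CPU time
after = cpu_seconds()

# Peak resident set size; reported in kilobytes on Linux.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

print(f"CPU seconds used by the loop: {after - before:.3f}")
print("peak resident set size:", peak)
```

These are the same figures that ps and top report for a process, here obtained by the process about itself.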
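The filling and vacating of Open File Table entries can be observed directly, because the number returned by an open call is the index of the entry used. The sketch below assumes a POSIX system, where the OS hands out the lowest vacant index; it opens the null device only because that file always exists.

```python
import os
import resource  # POSIX only

a = os.open(os.devnull, os.O_RDONLY)   # takes the lowest vacant entry
b = os.open(os.devnull, os.O_RDONLY)   # takes the next vacant entry
os.close(a)                            # vacate an entry below b
c = os.open(os.devnull, os.O_RDONLY)   # reuses the entry just vacated

# The limit on the number of open files for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

print("a, b, c =", a, b, c)
print("open file limit (soft, hard):", soft, hard)

os.close(b)
os.close(c)
```

Between the close of a and the open of c, the table had a vacant entry below an occupied one, which is the non-contiguous filling described above.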
The following list was generated by ps aux and then pruned to show only a few of the standard processes. This list does vary from PC to PC depending on the hardware installed and the OS configuration.
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1   2952  1852 ?        Ss   Nov23   0:01 /sbin/init
daemon    4118  0.0  0.0   1812   552 ?        Ss   Nov23   0:00 /sbin/portmap
statd     4137  0.0  0.0   1876   716 ?        Ss   Nov23   0:00 /sbin/rpc.statd
root      4407  0.0  0.0   1696   520 tty1     Ss+  Nov23   0:00 /sbin/getty 38400 tty1
root      4914  0.6  6.6  77020 69344 tty7     SLs+ Nov23  45:50 /usr/bin/X
root      4931  0.0  0.0   5280   992 ?        Ss   Nov23   0:00 /usr/sbin/sshd
root      5506  0.0  1.2  31692 13132 ?        S    Nov27   0:00 kded [kdeinit]
pmateti   5828  0.0  1.2  31868 13024 ?        S    Nov23   1:57 kwin [kdeinit]
pmateti   5830  0.0  1.7  35256 17660 ?        S    Nov23   0:44 kdesktop [kdeinit]
pmateti   5832  0.0  1.8  37200 19476 ?        S    Nov23   1:06 kicker [kdeinit]
pmateti   5863  0.0  0.1   4712  2044 pts/0    Ss   Nov23   0:00 /bin/bash
The following list was generated by tasklist and then pruned to show only a few of the standard processes. This list does vary from PC to PC depending on the hardware installed and the OS configuration.
Image Name                   PID Session Name     Session#    Mem Usage
========================= ====== ================ ======== ============
System Idle Process            0 Console                 0         28 K
System                         4 Console                 0        236 K
smss.exe                     812 Console                 0        380 K
csrss.exe                    876 Console                 0      3,460 K
winlogon.exe                 904 Console                 0      6,452 K
services.exe                 948 Console                 0      5,908 K
lsass.exe                    960 Console                 0      1,632 K
ati2evxx.exe                1120 Console                 0      3,344 K
svchost.exe                 1152 Console                 0      5,252 K
spoolsv.exe                 2012 Console                 0      5,356 K
avgamsvr.exe                 312 Console                 0        332 K
MDM.EXE                      436 Console                 0      2,852 K
explorer.exe                1312 Console                 0     29,260 K
alg.exe                     1332 Console                 0      3,552 K
wmiprvse.exe                6488 Console                 0      5,632 K
The following are standard programs that you are expected to learn as part of this course in the context of processes. For further details on the commands, look them up in the textbook, man/help pages, and the web.
Linux | Windows | Brief description/Limitations in the Learning Objective, if any |
ksysguard | taskmgr | Continuously updated GUI view of processes |
ps | tasklist | Display processes currently alive |
top | | Continuously updated text view of processes |
nice | | Invoke the rest of the command at a lower priority |
time | | Invoke the rest of the command and time it |
kill | taskkill /pid | Kill a process whose number is given. |
killall | taskkill /im | Kill a process whose name is given. |
bg | | Place the last suspended process in the background |
fg | | Place the last suspended process in the foreground |
 | sc | Service Controller |
ltrace | | Show library calls being made |
strace | | Show system calls being made |
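What nice and time do can be imitated from Python on a POSIX system. In this sketch, a child process reads its own niceness, lowers its own priority by 5 (roughly what nice -n 5 arranges before running a command), and reports both values, while the parent measures the elapsed time as time would. The increment 5 is an arbitrary choice.

```python
import subprocess
import sys
import time

# The child reports its niceness before and after lowering its priority.
# os.nice(0) changes nothing and returns the current niceness.
child_code = "import os; print(os.nice(0), os.nice(5))"

start = time.perf_counter()
out = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
).stdout
elapsed = time.perf_counter() - start

before, after = (int(x) for x in out.split())
print("niceness before:", before, "after:", after)
print(f"elapsed: {elapsed:.3f} seconds")
```

A larger niceness means a lower priority. An ordinary user may raise the niceness of a process but not lower it again.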
Syntax: kill [-SIGNAL] PID...
Despite its name, ending processes is only one function of the kill command. More generally, it sends signals to processes (i.e., it raises exceptions). Programs can either catch these signals and handle them gracefully, or let the operating system's default action apply.
The default signal sent by kill is SIGTERM. A different signal can be given before the PIDs, either by number (kill -1) or by name (with or without the "SIG", kill -HUP and kill -SIGHUP both work).
Signals are sent for other events besides the user running kill. Many of the most common signals are never sent directly by users except when testing. Bugs in a program may cause it to terminate with SIGSEGV, and pressing control-c usually sends SIGINT, for example.
Unfortunately, signal numbers vary between Unix flavors. The most common signals usually stay the same, but it's a good idea to check kill -l for supported signals. Further, although many systems provide convenience utilities for common tasks, they sometimes have different effects when moving between systems. For example, the command that kills all processes matching a certain name on Linux will end all running processes on Solaris!
Number | Name | Meaning |
1 | SIGHUP | "Hang up", causes programs to quit or reload their configuration. |
2 | SIGINT | "Interrupt", like control-c in Bash |
4 | SIGILL | "Illegal instruction", meaning the process tried to execute an invalid machine instruction. |
9 | SIGKILL | Cannot be caught and thus causes any process to terminate immediately. |
11 | SIGSEGV | "Segmentation fault", a memory or pointer error. |
13 | SIGPIPE | Pipe redirection failure. |
15 | SIGTERM | Terminate the process, with whatever graceful shutdown it provides (the default). |
(Varies) | SIGSTOP | Suspends the process, like control-z in Bash. (19 on Linux, 23 on Solaris) |
(Varies) | SIGCONT | Continues a suspended process, like fg in Bash. (18 on Linux, 25 on Solaris) |
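Since some numbers vary between systems, it is safer to look them up by name. The sketch below prints the numbers that the current system assigns to a few of the signals above, using Python's signal module on a POSIX system, and then shows that a process sent SIGKILL gets no chance to shut down gracefully.

```python
import signal
import subprocess
import sys

# Look up signal numbers by name on this particular system.
names = ("SIGHUP", "SIGINT", "SIGKILL", "SIGTERM", "SIGSTOP", "SIGCONT")
numbers = {name: int(getattr(signal, name)) for name in names}
for name, number in numbers.items():
    print(number, name)

# A child that would sleep for 30 seconds if left alone.
sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
sleeper.send_signal(signal.SIGKILL)     # like: kill -9 <PID> or kill -KILL <PID>
status = sleeper.wait(timeout=10)

# A negative status -N means "terminated by signal number N".
print("exit status of the killed child:", status)
```

The status reported for the child is the negated signal number, which is how the invoker can tell a process that was killed from one that exited on its own.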
The tools needed for Linux are readily present in a typical Linux distribution, but the tools needed for Windows (known as PsTools) need to be downloaded from http://technet.microsoft.com/en-us/sysinternals/ None of the tools requires any special installation. The tools included in the PsTools suite are:
PsExec | execute processes remotely |
PsFile | shows files opened remotely |
PsGetSid | display the SID of a computer or a user |
PsInfo | list information about a system |
PsKill | kill processes by name or process ID |
PsList | list detailed information about processes |
PsLoggedOn | see who's logged on locally and via resource sharing |
PsLogList | dump event log records |
PsPasswd | changes account passwords |
PsService | view and control services |
PsShutdown | shuts down and optionally reboots a computer |
PsSuspend | suspends processes |
(The author of the above alerts us that some anti-virus scanners may report that one or more of the tools are infected with a "remote admin" virus. None of the PsTools contain viruses, but they have been used by viruses, which is why they trigger virus notifications.) See the References below.