
read() SysCall Internals

1 An Example: read(fd, buf, count)

  1. Shown below is a conceptual version of the Linux code for the syscall read(fd, buf, count):
511 SYSCALL_read(unsigned int fd, char __user * buf, size_t count)
512 {
513         struct fd f = fdget_pos(fd);
514         ssize_t ret = -EBADF;
515 
516         if (f.file) {
517                 loff_t pos = file_pos_read(f.file);
518                 ret = vfs_read(f.file, buf, count, &pos);
519                 if (ret >= 0)
520                         file_pos_write(f.file, pos);
521                 fdput_pos(f);
522         }
523         return ret;
524 }

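The return value is held in ret, which is pessimistically initialized to -EBADF ("bad file descriptor"). [Read man errno.] If the file descriptor fd refers to an open file, file_pos_read fetches the file's current position (the offset at which the next read or write will start) into pos. vfs_read is then asked to read up to count bytes starting at that position and deposit them in the buffer buf; its return value, the number of bytes read or a negative error code, is stored in ret. Note the use of & in &pos: vfs_read advances pos by the number of bytes it read. If the read succeeded, file_pos_write stores the updated position back into the file. fdput_pos(f) then releases what fdget_pos acquired (the reference to the file and, where one was taken, the lock on its position). Finally, ret is returned.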
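Seen from user space, a negative kernel return value such as -EBADF does not arrive as-is: the C library wrapper returns -1 and stores the positive error code in errno. The following is a minimal sketch of that view, assuming a POSIX system; the helper name read_errno_for_fd is made up for this illustration.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: try to read one byte from fd.
 * Returns 0 if read() succeeded (including end of file),
 * otherwise the errno value set by the C library wrapper. */
int read_errno_for_fd(int fd)
{
    char byte;
    ssize_t n = read(fd, &byte, 1);

    return (n < 0) ? errno : 0;
}
```

Calling read_errno_for_fd(-1) yields EBADF, because no open file can have the descriptor -1; this is the case where the kernel code above never enters the if (f.file) branch.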

  2. The bulk of the work is done by vfs_read(), which is an ordinary kernel-internal routine, not a system call.

In Linux, the read system call can influence the write system call, and vice versa, because both use the same per-open-file position. In the code above, file_pos_write(f.file, pos) updates the position at which the next read or write on f.file will start.
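The shared position can be observed from user space. The sketch below, which assumes a POSIX system where tmpfile() can create a scratch file, reads three bytes and then writes two bytes without any intervening lseek; the helper name offset_of_write_after_read is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: read 3 bytes from a scratch file, then write "XY"
 * without seeking. Returns the offset at which "XY" landed, or -1 on error. */
long offset_of_write_after_read(void)
{
    FILE *tmp = tmpfile();              /* scratch file, removed on close */
    if (tmp == NULL)
        return -1;

    int fd = fileno(tmp);
    char buf[8];
    long landed = -1;

    if (write(fd, "abcdefgh", 8) == 8 &&    /* position: 0 -> 8 */
        lseek(fd, 0, SEEK_SET) == 0 &&      /* position: 8 -> 0 */
        read(fd, buf, 3) == 3 &&            /* position: 0 -> 3 */
        write(fd, "XY", 2) == 2 &&          /* written at 3, position -> 5 */
        lseek(fd, 0, SEEK_SET) == 0 &&
        read(fd, buf, 8) == 8) {
        /* Search the file contents for the two bytes just written. */
        for (int i = 0; i + 1 < 8; i++)
            if (buf[i] == 'X' && buf[i + 1] == 'Y')
                landed = i;
    }

    fclose(tmp);
    return landed;
}
```

The function returns 3: the write starts where the preceding read stopped, so the file ends up containing abcXYfgh.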

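Note that the kernel has no way to check the size of the buffer buf; it can only check whether the addresses it is asked to write to are accessible to the process. If the user-supplied buffer is shorter than count bytes, the read may overwrite whatever follows the buffer in the process's memory, which typically shows up as data corruption or as a later crash in user code. If the address range runs into memory the process cannot write, the call typically fails with EFAULT instead.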
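The EFAULT case can be provoked deliberately. The following sketch assumes Linux (it relies on MAP_ANONYMOUS and on a pipe as a data source); the helper name errno_for_inaccessible_buffer is made up for this illustration. It maps one page with no access rights and passes it as the buffer.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: ask read() to store data into a page the process
 * may not touch. Returns the resulting errno, 0 if read() unexpectedly
 * succeeded, or -1 if the setup failed. */
int errno_for_inaccessible_buffer(void)
{
    long page = sysconf(_SC_PAGESIZE);
    int fds[2];
    int result = -1;

    if (page <= 0 || pipe(fds) != 0)
        return -1;

    /* PROT_NONE: the page is mapped but is neither readable nor writable. */
    void *bad = mmap(NULL, (size_t) page, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (bad != MAP_FAILED && write(fds[1], "hello", 5) == 5) {
        ssize_t n = read(fds[0], bad, 5);   /* kernel cannot copy the data out */
        result = (n < 0) ? errno : 0;       /* save errno before cleanup */
    }

    if (bad != MAP_FAILED)
        munmap(bad, (size_t) page);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

On Linux the function returns EFAULT: the process is not killed, it merely gets an error back from read.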

1.1 C Language Macros

The Linux kernel source defines a family of macros, SYSCALL_DEFINEx, in the file syscalls.h located within include/linux/. There is one for each x from 0 to 6 (SYSCALL_DEFINE0 through SYSCALL_DEFINE6); x is the number of arguments that the call takes.

E.g., the standard read system call is defined as follows:

SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
  ... code elided ... 
}

Note that, in the macro's argument list, each type is given as a separate argument immediately before the name of its variable. The syscall read takes three arguments: fd, the file descriptor; buf, an array of bytes where the bytes read from the file will be placed; and count, the number of bytes to read from the file.
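To see how a macro can turn such (type, name) pairs into an ordinary parameter list, here is a much simplified user-space stand-in. DEMO_DEFINE3 and demo_sum3 are hypothetical names; the real kernel macro does considerably more than this.

```c
/* Hypothetical stand-in for the kernel macro: builds the header of the
 * function demo_<name> from a name plus three (type, argument) pairs. */
#define DEMO_DEFINE3(name, t1, a1, t2, a2, t3, a3) \
    long demo_##name(t1 a1, t2 a2, t3 a3)

/* Expands to: long demo_sum3(int a, long b, unsigned int c) */
DEMO_DEFINE3(sum3, int, a, long, b, unsigned int, c)
{
    return (long) a + b + (long) c;
}
```

After preprocessing, demo_sum3 is a plain C function; demo_sum3(1, 2, 3) returns 6.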

(Run find / -name syscalls.h to find the full pathname(s) of this file. For a faster result, replace the / with the pathname of the directory where you untarred the Linux source code tree.)

1.2 SYSCALL_DEFINE3(read, ...)

In the actual Linux source code, line 511 above is replaced with the following. Why? Because of the need to collect syscall info into a kernel-wide table, etc.

511 SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)

SYSCALL_DEFINE3 is a C-preprocessor (cpp) #define macro. It expands the one-line header above into the plain C declarations and definitions shown below; the extra lines help with the kernel build.

The macro SYSCALL_DEFINE3 cooked up three function names: sys_read, SYSC_read, and SyS_read (note the lowercase/uppercase details).

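The expansion starts with a declaration of sys_read as an alias for SyS_read, followed by forward declarations of SYSC_read and SyS_read. Next comes the definition of SyS_read, which casts its long arguments to their real types and calls the static (i.e., visible only within this file) function SYSC_read. The body of SYSC_read is the code of the system call itself. What is asmlinkage? See below.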

asmlinkage long sys_read(unsigned int fd, char __user * buf, size_t count)
 __attribute__((alias(__stringify(SyS_read))));

static inline long SYSC_read(unsigned int fd, char __user * buf, size_t count);
asmlinkage long SyS_read(long int fd, long int buf, long int count);

asmlinkage long SyS_read(long int fd, long int buf, long int count)
{
 long ret = SYSC_read((unsigned int) fd, (char __user *) buf, (size_t) count);
 asmlinkage_protect(3, ret, fd, buf, count);
 return ret;
}

static inline long SYSC_read(unsigned int fd, char __user * buf, size_t count)
{
 struct fd f = fdget_pos(fd);
 ssize_t ret = -EBADF;
 ... code elided ... 
}
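The three-name pattern can be tried out in user space. The sketch below assumes GCC or Clang on an ELF target (the alias attribute is a compiler extension), and all names in it are hypothetical. Unlike the kernel expansion, the alias here has the same prototype as the wrapper, to stay clear of compiler warnings about mismatched alias types.

```c
/* Inner function with the "real" argument types (the role of SYSC_read). */
static inline long INNER_min(unsigned int a, unsigned int b);

/* Wrapper whose arguments are all long (the role of SyS_read): it casts
 * each argument to its real type before calling the inner function. */
long Wrap_min(long a, long b)
{
    return INNER_min((unsigned int) a, (unsigned int) b);
}

/* Second name for the wrapper (the role of sys_read).
 * Assumption: the compiler supports __attribute__((alias(...))). */
long demo_min(long a, long b) __attribute__((alias("Wrap_min")));

static inline long INNER_min(unsigned int a, unsigned int b)
{
    return (a < b) ? a : b;
}
```

Both demo_min(3, 7) and Wrap_min(7, 3) return 3; the two names refer to the same code, and the casts in the wrapper ensure that the inner function only ever sees values of its declared types.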

1.3 asmlinkage

  1. #define asmlinkage CPP_ASMLINKAGE __attribute__((regparm(0)))
  2. asmlinkage informs the compiler that function arguments are on the CPU stack, instead of registers.
  3. The system call entry code does a table lookup using its first argument, the system call number, and the remaining arguments (up to six, matching SYSCALL_DEFINE0 to SYSCALL_DEFINE6) are passed along to the actual system call procedure.
  4. The entry code achieves this feat simply by saving its other arguments (which were passed to it in registers) on the stack, where the called procedure expects to find them.
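The table lookup in item 3 can be sketched in user space as an array of function pointers indexed by call number. Everything below is hypothetical (the handlers, the table, and the three-argument limit); it only illustrates the dispatch idea and says nothing about how arguments reach the stack.

```c
#include <errno.h>
#include <stddef.h>

/* All handlers share one signature; unused arguments are ignored. */
typedef long (*demo_handler_t)(long, long, long);

static long demo_add(long a, long b, long c) { (void) c; return a + b; }
static long demo_neg(long a, long b, long c) { (void) b; (void) c; return -a; }

/* Table indexed by the (made-up) call number: 0 is add, 1 is neg. */
static const demo_handler_t demo_table[] = { demo_add, demo_neg };

/* Look up the handler for nr and pass the remaining arguments along.
 * An unknown number yields -ENOSYS. */
long demo_dispatch(unsigned long nr, long a, long b, long c)
{
    size_t entries = sizeof demo_table / sizeof demo_table[0];

    if (nr >= entries)
        return -ENOSYS;
    return demo_table[nr](a, b, c);
}
```

For example, demo_dispatch(0, 2, 3, 0) returns 5, demo_dispatch(1, 4, 0, 0) returns -4, and demo_dispatch(99, 0, 0, 0) returns -ENOSYS.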

2 References

3 End


Copyright © 2018 www.wright.edu/~pmateti • 2018-11-05