SysCall Implementation Details
1 Overview
This page describes the internals of creating a loadable kernel module (LKM) and adding a new system call. Parent of these notes: ../../SysCalls/
2 Discovering the Sys Call Table
more /proc/kallsyms
kallsyms = all kernel symbols, kept for debugging. Read man nm for the meaning of A, T, t, etc. Read also https://jrgraphix.net/man/K/kallsyms
sys_call_table initialization. Discover the latest. https://elixir.bootlin.com/linux/latest/ident/kallsyms_lookup_name

    #define SYSCALLTBL319020 0xffffffff81801680 /* 3.19.0-20-lowlatency */
    #define SYSCALLTBL401000 0xffffffff81801400 /* 4.0.1 */
    #define SYSCALLTBL418010 0xffffffff81e001c0 /* 4.18.0-10-generic */
    #define SYSCALLTBL418011 0xffffffff844001c0 /* 4.18.0-11-generic */
    #define SYSCALLTBL SYSCALLTBL418011
    void * * sys_call_table = (void *) SYSCALLTBL; /* pmateti: poorly declared! */
Discovering the address of the sys_call_table
The following was done on an Ubuntu 18.10 64-bit system. Dynamic lookup.

    # grep sys_call_table /proc/kallsyms
    ffffffff9e0001c0 R sys_call_table
    ffffffff9e0015a0 R ia32_sys_call_table
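The address can also be found in /boot/System.map-*. Static lookup. Note the different values: the running kernel's addresses differ from the static ones, most likely because of kernel address-space layout randomization (KASLR).

    root@Milner:~# grep sys_call_table /boot/System.map-4.18.0-10-generic
    ffffffff81e001c0 R sys_call_table
    ffffffff81e015a0 R ia32_sys_call_table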
On a newer kernel:

    ; uname -a
    Linux sutherland 5.4.0-050400rc4-generic ...

sudo or not? Without sudo, the addresses are masked as zeros:

    ; grep sys_call_table /proc/kallsyms
    0000000000000000 D x32_sys_call_table
    0000000000000000 D sys_call_table
    0000000000000000 D ia32_sys_call_table

With sudo, the real addresses appear:

    ; sudo grep sys_call_table /proc/kallsyms
    [sudo] password for pmateti:
    ffffffff9a000260 D x32_sys_call_table
    ffffffff9a001380 D sys_call_table
    ffffffff9a0023c0 D ia32_sys_call_table
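The grep lookups above can be sketched in user-space C. This is a minimal sketch, not part of the example LKM: the function names kallsyms_match and kallsyms_lookup are made up for illustration, and the only assumption about the input is the three-column "address type name" layout seen in the listings above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line in /proc/kallsyms format: "<hex address> <type> <name>".
 * Returns 1 if the line names exactly `wanted` (address and type are then
 * valid), otherwise 0. An address of 0 means the kernel masked it. */
int kallsyms_match(const char *line, const char *wanted,
                   unsigned long long *addr, char *type)
{
    char name[256];

    assert(line != NULL && wanted != NULL && addr != NULL && type != NULL);
    if (sscanf(line, "%llx %c %255s", addr, type, name) != 3)
        return 0;
    return strcmp(name, wanted) == 0;
}

/* Scan a kallsyms-format file for `wanted`. Returns its address, or 0 if
 * the symbol is absent, the file is unreadable, or the address is masked. */
unsigned long long kallsyms_lookup(const char *path, const char *wanted)
{
    char line[512];
    unsigned long long addr = 0, found = 0;
    char type = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (kallsyms_match(line, wanted, &addr, &type)) {
            found = addr;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

Calling kallsyms_lookup("/proc/kallsyms", "sys_call_table") as an ordinary user would be expected to return 0 on systems that mask addresses, matching the "sudo or not" observation above. The exact string comparison keeps ia32_sys_call_table from being mistaken for sys_call_table.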
3 A "Not Implemented" SysCall
sys_ni_syscall is a "not implemented" syscall.

    # grep sys_ni_syscall /proc/kallsyms
    ffffffff9d2b4270 T sys_ni_syscall

It is defined in .../kernel/sys_ni.c; see https://elixir.bootlin.com/linux/latest/ident/sys_ni_syscall

    asmlinkage long sys_ni_syscall(void)
    {
        return -ENOSYS;
    }
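To see what the stub buys, here is a small user-space model of a call table whose unused slots all point at a "not implemented" function. It is only an illustration: MODEL_NR_MAX, model_call_t, and the model_* functions are hypothetical names, and the table size and the slot number 3 are arbitrary choices.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_NR_MAX 8            /* hypothetical table size */

typedef long (*model_call_t)(void);

/* Stand-in for sys_ni_syscall: always reports "not implemented". */
long model_ni_syscall(void)
{
    return -ENOSYS;
}

/* One implemented entry, for contrast. */
long model_answer(void)
{
    return 42;
}

/* Fill every slot with the stub, then install the implemented call. */
void model_table_init(model_call_t table[MODEL_NR_MAX])
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < MODEL_NR_MAX; i++)
        table[i] = model_ni_syscall;
    table[3] = model_answer;      /* slot 3 is an arbitrary choice */
}

/* Dispatch by number; out-of-range numbers also yield -ENOSYS. */
long model_dispatch(model_call_t table[MODEL_NR_MAX], long nr)
{
    if (nr < 0 || nr >= MODEL_NR_MAX)
        return -ENOSYS;
    return table[nr]();
}
```

Because every slot holds a valid function pointer, the dispatcher never has to test for NULL; a caller asking for an unused number simply gets -ENOSYS back.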
4 RW of SysCall Table
- For improved security, the syscall table is set to read-only. The table is constructed at build (compile) time.
- If you wish to change the table at run time, first set its page to RW.
A portion of ./sysredirect.c, our example LKM, is shown below:

    void * * sys_call_table = (void *) SYSCALLTBL; /* needs improvement */
    ...
    static void * syscallredirect(int nrdel, int nradd)
    {
        unsigned int unused = 0;
        pte_t * pte = lookup_address((long) sys_call_table, & unused);
        void * oldptr = sys_call_table[nrdel];

        pte->pte |= _PAGE_RW;                           /* set page to RW */
        sys_call_table[nrdel] = sys_call_table[nradd];  /* redirect the entry */
        pte->pte &= ~ _PAGE_RW;                         /* set page back to RO */
        return oldptr;                                  /* caller can restore it later */
    }
pte_t is the page-table-entry type. Instead of OR-ing pte->pte with the bit mask _PAGE_RW, using set_memory_rw() (search at https://elixir.bootlin.com/linux/latest/ident/ ) keeps the relevant abstraction visible.
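The same RO, RW, RO discipline can be tried safely in user space. The sketch below is an analogue, not kernel code: it uses mmap() and mprotect() on an ordinary page in place of the page-table manipulation, and the demo_* names are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef long (*demo_call_t)(void);

long demo_first(void)  { return 1; }
long demo_second(void) { return 2; }

/* Allocate one page for the table, fill two slots, then make it read-only. */
demo_call_t *demo_table_create(void)
{
    size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
    demo_call_t *tbl = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (tbl == MAP_FAILED)
        return NULL;
    tbl[0] = demo_first;
    tbl[1] = demo_second;
    if (mprotect(tbl, pagesz, PROT_READ) != 0) {
        munmap(tbl, pagesz);
        return NULL;
    }
    return tbl;
}

/* Redirect slot nrdel to the function in slot nradd; return the old pointer.
 * The page is writable only for the duration of the single store. */
demo_call_t demo_redirect(demo_call_t *tbl, int nrdel, int nradd)
{
    size_t pagesz = (size_t) sysconf(_SC_PAGESIZE);
    demo_call_t old;

    assert(tbl != NULL);
    old = tbl[nrdel];
    if (mprotect(tbl, pagesz, PROT_READ | PROT_WRITE) != 0)
        return NULL;
    tbl[nrdel] = tbl[nradd];
    (void) mprotect(tbl, pagesz, PROT_READ);   /* back to read-only at once */
    return old;
}
```

As in syscallredirect(), the old pointer is returned so that the caller can put the original entry back later.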
5 Adding New SysCalls
- Get an overview of an existing system call. https://elixir.bootlin.com/linux/latest/source/fs/open.c Initially, skim the lines that have SYSCALL_DEFINE.
- Write the code for the new system call. Preferably, place it in the kernel/ subtree.
- The code for an LKM can be located outside the Linux kernel tree, but this adds complexity to the build.
- Add the pointer to this function into the sys-call-table.
- How to build an LKM. ../../BuildKernel
- How to insert/remove an LKM. man insmod
- Testing system calls.
- Update the ./Makefile. The make tool uses features of the kernel's Makefile in building the LKM. Understand the -C flag. The end result will be a .ko module.
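For the "Testing system calls" step, one option is a small user-space program that invokes a call by its number through syscall(2). The following is a minimal sketch under that assumption; call_by_number is a hypothetical helper name, and it handles only calls that take no arguments.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE           /* for the syscall() prototype on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke syscall number nr with no arguments.
 * Returns the result, or -errno on failure, mirroring the kernel's own
 * convention (for example -ENOSYS for an unimplemented number). */
long call_by_number(long nr)
{
    long rc;

    assert(nr >= 0);
    errno = 0;
    rc = syscall(nr);
    return (rc == -1 && errno != 0) ? -(long) errno : rc;
}
```

A quick sanity check is that call_by_number(SYS_getpid) agrees with getpid(). For a newly added call, pass the number of the slot you filled; a result of -ENOSYS suggests the table entry is not in place.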
6 Src Code Files of LKM ../sysCallRedir/
    -rw-r--r-- 1 pmateti  202 Nov 21 2018 ./Makefile
    -rw-r--r-- 1 pmateti   85 Nov 21 2018 ./modules.order
    -rw-r--r-- 1 pmateti    0 Nov 21 2018 ./Module.symvers
    -rw-r--r-- 1 pmateti 3617 Nov 21 2018 ./sysredirect.c
    -rw-rw-r-- 1 pmateti 4424 Nov 21 2018 ./sysredirect.ko
    -rw-r--r-- 1 pmateti  596 Nov 21 2018 ./sysredirect.mod.c

- There is a good amount of "kernel development magic" in the source code. Do not get discouraged.
7 Proper/ Bad C Usage
- What is the proper declaration of sys-call-table? void * * sys_call_table works expediently, but is not "correct". Exercise!
- The following was found in a blog.

    struct linux_dirent64 *cur = dirp;
    ...
    int reclen = cur->d_reclen;
    char *next_rec = (char *)cur + reclen;
    int len = (int)dirp + rtn - (int)next_rec;
- Bad C usage. Reckless int-long-ptr synonyms. The type int is compiler specific. sizeof(int) is typically 4 bytes, even on 64-bit Linux. On very old systems, and even on some modern embedded systems, it can be 2 bytes. The width of a ptr is architecture specific. A ptr value on a 64-bit system is 8 bytes wide, so casting it to int discards the upper half. The GNU C compiler indeed generated a warning about (int) dirp.
- C ptr arithmetic. The resulting value of p + x is computed as the equivalent long long int value of ptr p, plus (ordinary arith) x * sizeof(*p). The scale factor is the size of the object pointed to, not the size of the ptr itself.
- The declaration

    asmlinkage int (*ogetdents64) (unsigned int fd, struct linux_dirent64 *dirp, unsigned int count);

  should not have asmlinkage. It declares a ptr variable named ogetdents64. It also declares that (i) the ptr it holds is the address of a function, and (ii) this function takes three arguments as declared.
- I {pmateti} was expecting the GNU C compiler to produce a warning/error on ogetdents64(...) versus (*ogetdents64)(...). But it did not! TBD Further investigation is warranted. Note that C accepts both forms: in a call expression, a function designator is converted to a ptr to the function, so the two calls are equivalent and no diagnostic is required.
- Do not leave the sys-call-table writeable until the module exits. Not good. As soon as the redirect/hijack is made, bring it back to read-only.
- sys_call_table initialization

    #define SYSCALLTBLPM 0xffffffff81801680 /* pmateti 3.19.0-20-lowlatency */
    #define SYSCALLTBLAS 0xffffffff81801400 /* asish 4.0.1 */
    #define __NR_ni 7 /* NR of sys_ni_syscall */
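The C points in this section can be checked with a few lines of user-space code. This is a sketch for experimenting, not a fix taken from the blog: struct demo_rec is a hypothetical stand-in for linux_dirent64 (only d_reclen matters here), and all function names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record with a self-describing length, loosely modelled on
 * linux_dirent64; only d_reclen matters here. */
struct demo_rec {
    unsigned short d_reclen;      /* size of this record in bytes */
    char d_name[14];
};

/* Byte distance covered by p + x: x elements, each sizeof(*p) bytes.
 * p must point into an array with at least x elements. */
size_t stride_bytes(const long *p, size_t x)
{
    assert(p != NULL);
    return (size_t) ((const char *) (p + x) - (const char *) p);
}

/* Bytes left in a buffer of rtn bytes after skipping the first record.
 * Pointers are subtracted as char pointers; nothing is cast to int. */
ptrdiff_t bytes_after_first(const struct demo_rec *dirp, ptrdiff_t rtn)
{
    const char *start = (const char *) dirp;
    const char *next_rec = start + dirp->d_reclen;

    return (start + rtn) - next_rec;
}

int demo_twice(int v) { return 2 * v; }

/* Call through a function pointer with and without an explicit '*';
 * returns 1 when both calls give the same result. */
int call_both_ways(int (*fp)(int), int v)
{
    return fp(v) == (*fp)(v);
}
```

bytes_after_first() computes the same quantity as the blog's len, but as a ptrdiff_t obtained by subtracting two char pointers, so no address is ever squeezed into an int.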
8 References
- Prabhaker Mateti, Intricacies of the C Language, 2019. Required Reading.