



Project : Research on Multi-platform System Call Table

Journal Entry Date : 2024.06.03

I sort of fixed the bug. I had a few tiny misunderstandings about the system that turned out to be pivotal.

1. System Call Table Array

Turns out, the system call table was completely wrong.

#define __SYSCALL(nr, sym) __x64_##sym,

As you can see, the __SYSCALL macro does nothing with the system call number. All it does is convert the symbol name. So, if I were to write this :

sys_call_ptr_t sys_call_table_myelf[] = {
__SYSCALL(69, sys_write)
__SYSCALL(60, sys_exit)
};

This is just equivalent to :

sys_call_ptr_t sys_call_table_myelf[] = {
__x64_sys_write , 
__x64_sys_exit , 
};

...which means that system call #0 is sys_write, and system call #1 is sys_exit. I thought the macro would automatically use the system call number as the array index, but it turns out it doesn't. I should have manually placed every system call handler at its proper position in the array.

sys_call_ptr_t sys_call_table_myelf[] = {
__SYSCALL(0, sys_ni_syscall)
__SYSCALL(1, sys_ni_syscall)
__SYSCALL(2, sys_ni_syscall)
__SYSCALL(3, sys_ni_syscall)
... 60 times ...
__SYSCALL(60, sys_exit)
__SYSCALL(61, sys_ni_syscall)
... 9 times  ...
__SYSCALL(69, sys_write)
__SYSCALL(70, sys_ni_syscall)
__SYSCALL(71, sys_ni_syscall)
__SYSCALL(72, sys_ni_syscall)
... 
__SYSCALL(548, __sys_get_current_syscall_tbl)
};

The table above is the proper one: it wires system call #69 to sys_write and system call #60 to sys_exit.
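The difference between the two table layouts can be reproduced in userspace. The following is a minimal sketch, not kernel code: the demo_ handlers, the DEMO_SYSCALL_* macros and DEMO_NR_SYSCALLS are hypothetical names made up for illustration, and the indexed variant relies on the GNU C range designator ([a ... b]) to fill unused slots with a stub instead of writing each sys_ni_syscall line by hand.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_SYSCALLS 128  /* hypothetical table size for the demo */

typedef long (*sys_call_ptr_t)(void);

/* Hypothetical userspace stand-ins for the kernel handlers. */
static long demo_x64_sys_ni_syscall(void) { return -38; } /* -ENOSYS */
static long demo_x64_sys_write(void)      { return 69; }
static long demo_x64_sys_exit(void)       { return 60; }

/* Same shape as the macro in this entry: the number is discarded. */
#define DEMO_SYSCALL_POSITIONAL(nr, sym) demo_x64_##sym,

/* Variant that uses the number as a designated array index. */
#define DEMO_SYSCALL_INDEXED(nr, sym) [nr] = demo_x64_##sym,

static const sys_call_ptr_t demo_positional[] = {
    DEMO_SYSCALL_POSITIONAL(69, sys_write)
    DEMO_SYSCALL_POSITIONAL(60, sys_exit)
};

/* Only two slots exist, whatever numbers were passed to the macro. */
static_assert(sizeof(demo_positional) / sizeof(demo_positional[0]) == 2,
              "positional initializers ignore the syscall number");

static const sys_call_ptr_t demo_indexed[DEMO_NR_SYSCALLS] = {
    /* GNU C range designator: default every slot to the stub first. */
    [0 ... DEMO_NR_SYSCALLS - 1] = demo_x64_sys_ni_syscall,
    DEMO_SYSCALL_INDEXED(60, sys_exit)
    DEMO_SYSCALL_INDEXED(69, sys_write)
};

/* Calls the handler stored at slot nr of the indexed table. */
static long demo_dispatch(unsigned int nr)
{
    assert(nr < DEMO_NR_SYSCALLS);
    return demo_indexed[nr]();
}
```

With the positional macro, sys_write lands in slot 0; with the indexed one, demo_dispatch(69) reaches the write stand-in and every unlisted number falls through to the stub. The later explicit entries override the range default, which GCC may warn about with -Woverride-init.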

2. Switching the System Call Table

On x86_64, the trap_init() function takes care of registering the system call handler. trap_init() calls cpu_init(), and cpu_init() calls syscall_init().

arch/x86/kernel/traps.c :

...
void __init trap_init(void)
{
	/* Init cpu_entry_area before IST entries are set up */
	setup_cpu_entry_areas();

	/* Init GHCB memory pages when running as an SEV-ES guest */
	sev_es_init_vc_handling();

	/* Initialize TSS before setting up traps so ISTs work */
	cpu_init_exception_handling();
	/* Setup traps as cpu_init() might #GP */
	idt_setup_traps();
	cpu_init();
}

arch/x86/kernel/cpu/common.c :

...
void cpu_init(void)
{
	struct task_struct *cur = current;
	int cpu = raw_smp_processor_id();
    ...

	if (IS_ENABLED(CONFIG_X86_64)) {
		loadsegment(fs, 0);
		memset(cur->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
		syscall_init();
        ...

	}
    ...

If we look at the syscall_init() function, we can see the system call entry function being registered.

...
/* May not be marked __init: used by software suspend */
void syscall_init(void)
{
	wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
	wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
     ...
}

If we look at entry_SYSCALL_64 in arch/x86/entry/entry_64.S, we can see that it calls the do_syscall_64() function. (finally..)

...
SYM_CODE_START(entry_SYSCALL_64)
	UNWIND_HINT_ENTRY
	ENDBR

	swapgs
	/* tss.sp2 is scratch space. */
	movq	%rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
	SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
	movq	PER_CPU_VAR(pcpu_hot + X86_top_of_stack), %rsp
    
    ...

	call	do_syscall_64		/* returns with IRQs disabled */
    ...

Finally, do_syscall_64() calls do_syscall_x64(), which calls the actual system call handler from the global system call table (sys_call_table).


static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
{
	/*
	 * Convert negative numbers to very high and thus out of range
	 * numbers for comparisons.
	 */
	unsigned int unr = nr;

	if (likely(unr < NR_syscalls)) {
		unr = array_index_nospec(unr, NR_syscalls);
		regs->ax = sys_call_table[unr](regs);
		return true;
	}
	return false;
}
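The unsigned conversion in that bounds check is easy to miss, so here is a minimal userspace sketch of the same idea. Everything prefixed with demo_ or DEMO_ is a hypothetical stand-in, the table has only four slots, and the speculation barrier (array_index_nospec) is deliberately left out because it has no userspace equivalent here.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_SYSCALLS 4  /* hypothetical, tiny table for the demo */

/* Hypothetical cut-down register frame: return value and first argument. */
struct demo_regs {
    long ax;
    long di;
};

typedef long (*demo_call_ptr_t)(const struct demo_regs *);

static long demo_ni(const struct demo_regs *regs)
{
    (void)regs;
    return -38; /* -ENOSYS */
}

static long demo_echo(const struct demo_regs *regs)
{
    return regs->di; /* hands the first argument back */
}

static const demo_call_ptr_t demo_table[DEMO_NR_SYSCALLS] = {
    demo_ni, demo_echo, demo_ni, demo_ni,
};

/* -1 converted to unsigned is the largest value, so it is out of range. */
static_assert((unsigned int)-1 >= DEMO_NR_SYSCALLS,
              "negative numbers must land out of range");

/* Mirrors the shape of do_syscall_x64(): one unsigned compare covers
 * both "too large" and "negative". */
static bool demo_do_syscall(struct demo_regs *regs, int nr)
{
    unsigned int unr = nr;

    if (unr < DEMO_NR_SYSCALLS) {
        regs->ax = demo_table[unr](regs);
        return true;
    }
    return false;
}
```

Calling demo_do_syscall() with nr = -1 returns false without touching the table, the same as any number past the end.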

So.. to sum up the system call process..

Initialization :

  1. At kernel initialization, the trap_init() function calls cpu_init()
  2. cpu_init() calls syscall_init()
  3. syscall_init() registers the entry_SYSCALL_64() function as the system call handler (MSR_LSTAR)

System call handler :

  1. User makes the system call, and thus entry_SYSCALL_64() is called.
  2. entry_SYSCALL_64() calls do_syscall_64() function
  3. do_syscall_64() calls do_syscall_x64()
  4. do_syscall_x64() calls the system call handling function from the system call table

sys_call_table = (sys_call_ptr_t *)current->sys_call_table;
regs->ax = sys_call_table[unr](regs);

Now, if we swap in the current task's system call table right before the handler is called, as in the two lines above, we finally get the result that we've been waiting for.

As you can see in the above result, the system call table is switched according to the binary format, and system call #69 is wired to the sys_write() function.
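The per-task switch can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct demo_task and its sys_call_table field stand in for the field added to the task structure in this project, the two tables and their handlers are hypothetical, and empty slots are treated as "not implemented" instead of being filled with sys_ni_syscall.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_SYSCALLS 128  /* hypothetical table size */
#define DEMO_ENOSYS 38

typedef long (*demo_call_ptr_t)(void);

/* Hypothetical handlers with recognisable return values. */
static long demo_write(void) { return 1000; }
static long demo_exit(void)  { return 2000; }

/* Native Linux x86_64 numbering: write = 1, exit = 60. */
static const demo_call_ptr_t demo_linux_table[DEMO_NR_SYSCALLS] = {
    [1]  = demo_write,
    [60] = demo_exit,
};

/* Numbering used for the custom binary format in this entry. */
static const demo_call_ptr_t demo_custom_table[DEMO_NR_SYSCALLS] = {
    [60] = demo_exit,
    [69] = demo_write,
};

/* Hypothetical stand-in for the task structure with its own table. */
struct demo_task {
    const demo_call_ptr_t *sys_call_table;
};

/* Looks the handler up in the table that belongs to the given task. */
static long demo_dispatch(const struct demo_task *cur, unsigned int nr)
{
    assert(cur != NULL && cur->sys_call_table != NULL);

    if (nr >= DEMO_NR_SYSCALLS || cur->sys_call_table[nr] == NULL)
        return -DEMO_ENOSYS;
    return cur->sys_call_table[nr]();
}
```

The same number therefore resolves differently per task: 69 reaches the write stand-in for a task that points at demo_custom_table and is rejected for one that points at demo_linux_table.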

3. Problems & Next Goals

A. System Call Switching

The obvious problem here is the architectural dependency. We need a mechanism that can be applied without architecture-dependent modification. Here we had to modify architecture-dependent code to make this work, but we need a more reliable solution that does not require it. I hope there is some code (or function) somewhere in the kernel that is called automatically whenever a system call occurs..

B. Register & Calling Method (May not be a problem)

Executables from different operating systems make system calls with their own argument conventions. Windows and Linux may differ in, for example, which register holds the system call number. But because what the kernel passes to the system call handler is the saved registers (struct pt_regs), I do not think this will be a major issue. The calling method should also be an easy fix, because we just need to add the calling methods of other platforms to the syscall_init() function.
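One way to keep the handlers independent of the convention is to decode the saved registers once per binary format. The sketch below is a rough illustration only. It assumes the commonly described x86_64 conventions (Linux: number in rax, arguments in rdi, rsi, rdx, r10; Windows: number in eax, first four arguments in r10, rdx, r8, r9), and the Windows side in particular has not been verified for this project. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down register frame. */
struct demo_regs {
    unsigned long ax, di, si, dx, r10, r8, r9;
};

enum demo_abi {
    DEMO_ABI_LINUX,
    DEMO_ABI_WINDOWS, /* assumed convention, see the note above */
};

/* Convention-neutral view of a system call: number plus four arguments. */
struct demo_call {
    unsigned long nr;
    unsigned long args[4];
};

/* Maps the raw registers to a neutral call according to the ABI. */
static struct demo_call demo_decode(enum demo_abi abi,
                                    const struct demo_regs *r)
{
    struct demo_call c = { 0 };

    assert(r != NULL);
    c.nr = r->ax;

    switch (abi) {
    case DEMO_ABI_LINUX:
        c.args[0] = r->di;
        c.args[1] = r->si;
        c.args[2] = r->dx;
        c.args[3] = r->r10;
        break;
    case DEMO_ABI_WINDOWS:
        c.args[0] = r->r10;
        c.args[1] = r->dx;
        c.args[2] = r->r8;
        c.args[3] = r->r9;
        break;
    }
    return c;
}
```

A handler written against struct demo_call would not need to know which format the calling binary uses; only the decoder does.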

C. Real Problem

The real problem here is each OS's complex libraries and other "services." Operating systems have so many differences and so much variability that integrating them into one "interface" would be considered pretty much impossible. Windows, for example, has "DLL files" that are loaded at runtime. We would have to somehow implement all sorts of libraries and other arduous shenanigans that would definitely take about ten years to build.

For now, I should focus only on system calls and a few other essential base features so that this idea can be extended later. I will build a system that can flexibly change the system call routine without too much architecture-dependent modification.

So many things to do, but I'm always procrastinating..