I sort of fixed the bug. It came down to a few tiny but pivotal misunderstandings about the system.
Turns out, the system call table was completely wrong.
#define __SYSCALL(nr, sym) __x64_##sym,
As you can see, the __SYSCALL macro does nothing with the system call number. All it does is convert the symbol name. So, if I were to write this:
sys_call_ptr_t sys_call_table_myelf[] = {
__SYSCALL(69, sys_write)
__SYSCALL(60, sys_exit)
};
it would be exactly equivalent to this:
sys_call_ptr_t sys_call_table_myelf[] = {
__x64_sys_write ,
__x64_sys_exit ,
};
...which means that system call #0 is sys_write and system call #1 is sys_exit. I thought the macro would automatically use the system call number as the array index, but it turns out it doesn't. I should have filled in every slot of the array by hand, so that each handler lands at the index matching its number:
sys_call_ptr_t sys_call_table_myelf[] = {
__SYSCALL(0, sys_ni_syscall)
__SYSCALL(1, sys_ni_syscall)
__SYSCALL(2, sys_ni_syscall)
__SYSCALL(3, sys_ni_syscall)
... and so on through 59 ...
__SYSCALL(60, sys_exit)
__SYSCALL(61, sys_ni_syscall)
... and so on through 68 ...
__SYSCALL(69, sys_write)
__SYSCALL(70, sys_ni_syscall)
__SYSCALL(71, sys_ni_syscall)
__SYSCALL(72, sys_ni_syscall)
...
__SYSCALL(548, __sys_get_current_syscall_tbl)
};
The table above is the proper one: it wires system call #69 to sys_write and system call #60 to sys_exit.
On the x86_64 architecture, the system call entry point is registered during trap_init(). The trap_init() function calls cpu_init(), and cpu_init() calls syscall_init().
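The difference between the two tables can be checked outside the kernel. Below is a minimal user-space sketch, not kernel code: every `demo_` name, the table size of 72, and the return values are made up for illustration. It contrasts a positional macro, which drops the number, with a macro built on a C99 designated initializer, which uses the number as the array index. Slots that are not named stay NULL and are treated as "not implemented" here, so the filler entries do not have to be typed out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for handlers; none of these are kernel symbols. */
typedef long (*demo_call_ptr_t)(void);

static long demo_sys_write(void) { return 69; }
static long demo_sys_exit(void)  { return 60; }

#define DEMO_ENOSYS      38
#define DEMO_NR_SYSCALLS 72   /* assumed table size for this demo */

/* Positional form: nr is accepted and then ignored. */
#define DEMO_POSITIONAL(nr, sym) sym,

static const demo_call_ptr_t demo_positional[] = {
    DEMO_POSITIONAL(69, demo_sys_write)  /* lands at index 0 */
    DEMO_POSITIONAL(60, demo_sys_exit)   /* lands at index 1 */
};

/* Indexed form: a designated initializer puts sym at slot nr. */
#define DEMO_INDEXED(nr, sym) [nr] = sym,

static const demo_call_ptr_t demo_indexed[DEMO_NR_SYSCALLS] = {
    DEMO_INDEXED(69, demo_sys_write)
    DEMO_INDEXED(60, demo_sys_exit)
    /* every slot not named here is NULL */
};

long demo_call_positional(size_t idx)
{
    size_t len = sizeof(demo_positional) / sizeof(demo_positional[0]);
    if (idx >= len)
        return -DEMO_ENOSYS;
    return demo_positional[idx]();
}

long demo_call_indexed(size_t nr)
{
    if (nr >= DEMO_NR_SYSCALLS || demo_indexed[nr] == NULL)
        return -DEMO_ENOSYS;
    return demo_indexed[nr]();
}

/* Usage: the same two entries end up in different slots. */
void demo_self_check(void)
{
    assert(demo_call_positional(0) == 69);  /* write sits at index 0  */
    assert(demo_call_indexed(69) == 69);    /* write sits at index 69 */
}
```

Whether the kernel's own macro can be redefined this way depends on how the tree in question generates its table, so treat this only as a picture of the indexing problem.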
arch/x86/kernel/traps.c :
...
void __init trap_init(void)
{
/* Init cpu_entry_area before IST entries are set up */
setup_cpu_entry_areas();
/* Init GHCB memory pages when running as an SEV-ES guest */
sev_es_init_vc_handling();
/* Initialize TSS before setting up traps so ISTs work */
cpu_init_exception_handling();
/* Setup traps as cpu_init() might #GP */
idt_setup_traps();
cpu_init();
}
arch/x86/kernel/cpu/common.c :
...
void cpu_init(void)
{
struct task_struct *cur = current;
int cpu = raw_smp_processor_id();
...
if (IS_ENABLED(CONFIG_X86_64)) {
loadsegment(fs, 0);
memset(cur->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
syscall_init();
...
}
...
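If we look at the syscall_init() function, we can see the system call entry function: the address of entry_SYSCALL_64 is written to the MSR_LSTAR register, which is where the CPU jumps when user space executes the syscall instruction.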
...
/* May not be marked __init: used by software suspend */
void syscall_init(void)
{
wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS);
wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64);
...
}
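The call chain up to this point can be pictured with a small user-space mock. This is only a sketch under loud assumptions: the `mock_` functions are hypothetical, a plain variable stands in for the MSR_LSTAR register, and the entry function just returns a marker value instead of switching stacks.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*mock_entry_t)(int nr);

/* Stands in for the MSR_LSTAR register; a plain variable in this mock. */
static mock_entry_t mock_lstar = NULL;

/* Stands in for entry_SYSCALL_64; returns a marker so calls are visible. */
static long mock_entry_SYSCALL_64(int nr)
{
    return 1000 + nr;
}

/* Same nesting as the kernel: trap_init -> cpu_init -> syscall_init. */
static void mock_syscall_init(void) { mock_lstar = mock_entry_SYSCALL_64; }
static void mock_cpu_init(void)     { mock_syscall_init(); }
void mock_trap_init(void)           { mock_cpu_init(); }

/* Stands in for user space executing the syscall instruction. */
long mock_syscall_instruction(int nr)
{
    if (mock_lstar == NULL)
        return -1;            /* no entry point registered yet */
    return mock_lstar(nr);
}

/* Usage: nothing is reachable until the init chain has run. */
void mock_boot_check(void)
{
    assert(mock_syscall_instruction(5) == -1);
    mock_trap_init();
    assert(mock_syscall_instruction(5) == 1005);
}
```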
If we look at entry_SYSCALL_64 in x86/entry/entry_64.S, we can see that it calls the do_syscall_64() function. (finally..)
...
SYM_CODE_START(entry_SYSCALL_64)
UNWIND_HINT_ENTRY
ENDBR
swapgs
/* tss.sp2 is scratch space. */
movq %rsp, PER_CPU_VAR(cpu_tss_rw + TSS_sp2)
SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp
movq PER_CPU_VAR(pcpu_hot + X86_top_of_stack), %rsp
...
call do_syscall_64 /* returns with IRQs disabled */
...
Finally, do_syscall_64() calls do_syscall_x64(), and do_syscall_x64() calls the actual system call handler from the global system call table (sys_call_table).
static __always_inline bool do_syscall_x64(struct pt_regs *regs, int nr)
{
/*
* Convert negative numbers to very high and thus out of range
* numbers for comparisons.
*/
unsigned int unr = nr;
if (likely(unr < NR_syscalls)) {
unr = array_index_nospec(unr, NR_syscalls);
regs->ax = sys_call_table[unr](regs);
return true;
}
return false;
}
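The comment about negative numbers is easier to see in isolation. Here is a user-space sketch of the same bounds check, with a hypothetical three-entry table and made-up `toy_` names. It leaves out array_index_nospec(), which in the kernel hardens the lookup against speculative execution and has no effect on the result.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_SYSCALLS 3   /* assumed size of the toy table */

typedef long (*toy_call_ptr_t)(void);

static long toy_a(void) { return 10; }
static long toy_b(void) { return 20; }
static long toy_c(void) { return 30; }

static const toy_call_ptr_t toy_table[TOY_NR_SYSCALLS] = { toy_a, toy_b, toy_c };

/* Returns true and stores the handler result if nr is in range. */
bool toy_do_syscall(int nr, long *ret)
{
    /* A negative int converts to a huge unsigned value, so a single
     * comparison rejects both negative and too-large numbers. */
    unsigned int unr = (unsigned int)nr;

    if (unr < TOY_NR_SYSCALLS) {
        *ret = toy_table[unr]();
        return true;
    }
    return false;
}

/* Usage: in-range numbers dispatch, everything else is refused. */
void toy_self_check(void)
{
    long ret = 0;

    assert(toy_do_syscall(1, &ret) && ret == 20);
    assert(!toy_do_syscall(-1, &ret));
    assert(!toy_do_syscall(TOY_NR_SYSCALLS, &ret));
}
```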
So.. to sum up the system call process..
Initialization :
System call handler :
sys_call_table = (sys_call_ptr_t *)current->sys_call_table;
regs->ax = sys_call_table[unr](regs);
Now, if we switch the system call table to the current task's table right before the handler is called, we finally get the result that we've been waiting for.
As you can see in the above result, the system call table is properly changed according to the binary format, and system call #69 is wired to the sys_write() function.
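The per-task lookup can also be sketched in user space. Everything below is a toy: `struct pt_task` is a hypothetical stand-in for the task structure with only the one field used in the change above, the handlers return marker values, and the two tables hold just enough entries to show the effect. The "native" table uses the stock x86_64 numbers (write is 1, exit is 60) and the other one uses the custom numbering from this post.

```c
#include <assert.h>
#include <stddef.h>

#define PT_NR_SYSCALLS 72   /* assumed table size for this demo */
#define PT_ENOSYS      38

enum { PT_WROTE = 100, PT_EXITED = 200 };

typedef long (*pt_call_ptr_t)(void);

static long pt_sys_write(void) { return PT_WROTE; }
static long pt_sys_exit(void)  { return PT_EXITED; }

/* Stock x86_64 numbering: write is 1, exit is 60. */
static const pt_call_ptr_t pt_table_native[PT_NR_SYSCALLS] = {
    [1]  = pt_sys_write,
    [60] = pt_sys_exit,
};

/* Custom numbering for the other binary format: write is 69. */
static const pt_call_ptr_t pt_table_myelf[PT_NR_SYSCALLS] = {
    [60] = pt_sys_exit,
    [69] = pt_sys_write,
};

/* Hypothetical stand-in for the task structure. */
struct pt_task {
    const pt_call_ptr_t *sys_call_table;
};

/* Pick the table from the current task right before dispatching. */
long pt_do_syscall(const struct pt_task *cur, unsigned int nr)
{
    const pt_call_ptr_t *table = cur->sys_call_table;

    if (nr >= PT_NR_SYSCALLS || table[nr] == NULL)
        return -PT_ENOSYS;
    return table[nr]();
}

/* Usage: the same number means different things to different tasks. */
void pt_self_check(void)
{
    struct pt_task native = { pt_table_native };
    struct pt_task myelf  = { pt_table_myelf };

    assert(pt_do_syscall(&native, 1) == PT_WROTE);
    assert(pt_do_syscall(&native, 69) == -PT_ENOSYS);
    assert(pt_do_syscall(&myelf, 69) == PT_WROTE);
    assert(pt_do_syscall(&myelf, 1) == -PT_ENOSYS);
}
```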
The obvious problem here is the architectural dependency. To make this work I had to modify architecture-dependent code, but what we need is a more reliable solution that can be applied without touching it. I hope there is some code (or function) somewhere in the kernel that is automatically called whenever a system call occurs..
Executables from different operating systems invoke system calls with their own argument conventions. Windows and Linux may differ, for example, in which register carries the system call number. But because what the kernel passes to the system call handler is the saved registers themselves, I do not think this would be a major issue. The calling method should also be an easy fix, because we just need to add the calling methods of the other platforms to the syscall_init() function.
The real problem here is each OS's complex libraries and other "services." Operating systems have so many differences and so much variability that integrating them into one "interface" would be considered pretty much impossible. Windows, for example, has DLL files that are loaded at runtime. We would have to somehow implement all sorts of libraries and other arduous shenanigans, which would definitely take about ten years to make.
For now, I should focus only on system calls and a few other essential base features, leaving room to extend this idea later. I will make a system that can flexibly change the system call routine without too much architecture-dependent modification.
So many things to do, but I'm always procrastinating..
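One way to picture the argument-convention part is a per-format decode step that turns the saved registers into one canonical form. The sketch below is purely illustrative: `struct abi_regs` is a cut-down, hypothetical register snapshot, the "foreign" layout is invented and does not describe Windows or any real platform, and only the Linux x86_64 layout (number in rax, first arguments in rdi and rsi) reflects an actual convention.

```c
#include <assert.h>

/* A cut-down register snapshot; the real struct pt_regs has more fields. */
struct abi_regs {
    unsigned long ax, bx, cx, dx, si, di;
};

/* Canonical form a handler would consume, whatever the source ABI. */
struct abi_call {
    unsigned long nr;
    unsigned long arg0;
    unsigned long arg1;
};

typedef struct abi_call (*abi_decode_t)(const struct abi_regs *regs);

/* Linux x86_64 layout: number in rax, first two arguments in rdi, rsi. */
static struct abi_call abi_decode_linux(const struct abi_regs *regs)
{
    struct abi_call c = { regs->ax, regs->di, regs->si };
    return c;
}

/* Invented layout, for illustration only: number in bx, args in cx, dx. */
static struct abi_call abi_decode_foreign(const struct abi_regs *regs)
{
    struct abi_call c = { regs->bx, regs->cx, regs->dx };
    return c;
}

/* Hypothetical per-format descriptor chosen when the binary is loaded. */
struct abi_format {
    const char *name;
    abi_decode_t decode;
};

static const struct abi_format abi_linux   = { "linux",   abi_decode_linux };
static const struct abi_format abi_foreign = { "foreign", abi_decode_foreign };

/* Usage: one register snapshot, two readings of it. */
void abi_self_check(void)
{
    struct abi_regs regs = {
        .ax = 60, .bx = 69, .cx = 1, .dx = 2, .si = 4, .di = 3
    };
    struct abi_call a = abi_linux.decode(&regs);
    struct abi_call b = abi_foreign.decode(&regs);

    assert(a.nr == 60 && a.arg0 == 3 && a.arg1 == 4);
    assert(b.nr == 69 && b.arg0 == 1 && b.arg1 == 2);
}
```

Keeping the decode function next to the format descriptor means the handlers themselves would never need to know which convention the caller used.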