I'm trying to boot Linux on an experimental board using BusyBox and a vanilla Linux kernel (5.10.0).
At the last stage of kernel_init(), the kernel executes the init script, whose last command is exec /bin/sh. But /bin/sh freezes, and I can see that this happens right after schedule() is called and returns. I don't know where execution goes after schedule() returns (I cannot use gdb on this board). So, as shown below, I added some prints to schedule() to read lr (the link register, x30), which should stay at the bottom of the stack for the duration of the function.
(On arm64, at function entry x29 (fp) and x30 (lr) are stored at the bottom of the stack; lr holds the address to return to when the function finishes. See "understanding aarch64 assembly function call, how is stack operated?".
The variable passed_it limits the prints to the period after the init script has been invoked.)
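
As a rough user-space illustration of the frame record described above (not kernel code), the sketch below reads the saved fp/lr pair through the GCC/Clang builtin `__builtin_frame_address(0)`. The struct name `frame_record` and the helper `frame_record_matches` are made up for this example, and it assumes a compiler and ABI where the frame pointer points at a {previous fp, return address} pair, as on AArch64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical name: the two-word record a function pushes on entry.
 * On AArch64 this is the pair written by "stp x29, x30, [sp, #-N]!". */
struct frame_record {
    struct frame_record *fp; /* caller's frame pointer (saved x29) */
    void *lr;                /* return address (saved x30) */
};

/* Returns true when the second word of the current frame record equals
 * the address this function will return to. noinline keeps a real frame. */
__attribute__((noinline)) static bool frame_record_matches(void)
{
    struct frame_record *rec = __builtin_frame_address(0);

    assert(rec != NULL);
    return rec->lr == __builtin_return_address(0);
}
```

Calling `frame_record_matches()` from `main` should return true on such targets. The kernel code in the question relies on the same layout, but reads it through sp rather than fp, which only works when the record sits at the lowest address of the frame.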

-- kernel/sched/core.c --

extern int passed_it;
asmlinkage __visible void __sched schedule(void)
{
    struct task_struct *tsk = current;
    register void *sp asm ("sp");

    if (passed_it) printk("@entered schedule\n");
    sched_submit_work(tsk);
    do {
        preempt_disable();
            if (passed_it) printk("@entering __schedule\n");
        __schedule(false);
            if (passed_it) printk("@exited __schedule\n");
        sched_preempt_enable_no_resched();
    } while (need_resched());
    sched_update_worker(tsk);
    if (passed_it) printk("@exiting schedule. sp=%px, fp=%lx, lr=%lx\n",sp,*((long *)sp), *((long *)sp+8));
}

This is the log from the experiment at the point where it stopped.

This boot took 0.00 seconds

### calling /bin/sh ###
/bin/sh: can't a@entered schedule
@entering __schedule
@exited __schedule
@exiting schedule. sp=ffffffc0106a3f00, fp=ffffffc0106a3f40, lr=ffffffc0106a3f50

I generated vmlinux.objdump with aarch64-none-elf-objdump -S vmlinux > vmlinux.objdump to see what the lr value points to. But the address is not in any text section, and System.map shows that 0xffffffc0106a3f50 lies between __start_init_task (0xffffffc0106a0000, the same address as init_stack) and __end_init_task (0xffffffc0106a4000). The init stack occupies this region and grows down toward init_stack.

-- System.map --

ffffffc01063c1c0 d cfd_data
ffffffc01063c200 d csd_data
ffffffc01063c220 D __per_cpu_end
ffffffc0106a0000 D __init_end
ffffffc0106a0000 D __initdata_end
ffffffc0106a0000 D __start_init_task
ffffffc0106a0000 D _data
ffffffc0106a0000 D _sdata
ffffffc0106a0000 D init_stack
ffffffc0106a0000 D init_thread_union
ffffffc0106a4000 D __end_init_task
ffffffc0106a4000 D __nosave_begin
ffffffc0106a4000 D __nosave_end
ffffffc0106a4000 d vdso_data_store
ffffffc0106a5000 D boot_args
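
The range check against System.map can be written down as a small stand-alone C sketch. The two bounds are copied from the listing above; the helper names `in_init_task_region` and `bytes_below_region_end` are made up for this illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bounds copied from the System.map excerpt in the question. */
#define START_INIT_TASK UINT64_C(0xffffffc0106a0000) /* __start_init_task */
#define END_INIT_TASK   UINT64_C(0xffffffc0106a4000) /* __end_init_task */

/* True when addr falls inside [__start_init_task, __end_init_task). */
static bool in_init_task_region(uint64_t addr)
{
    return addr >= START_INIT_TASK && addr < END_INIT_TASK;
}

/* Distance in bytes from addr up to the end of the region. */
static uint64_t bytes_below_region_end(uint64_t addr)
{
    assert(in_init_task_region(addr));
    return END_INIT_TASK - addr;
}
```

With the values from the log, both the printed sp (0xffffffc0106a3f00) and the printed "lr" (0xffffffc0106a3f50) fall inside the region, and sp is only 0x100 bytes below its end. That suggests the printed value is stack data rather than a code address.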

So why is it pointing into the init task region? Could anyone tell me what I am missing here? On second thought, the virtual address printed in the experiment might be a user-space virtual address rather than a kernel address. Could that be the case?

Chan Kim
  • I suggest you write something like this [Q/A on thumb functions](https://stackoverflow.com/questions/20369440/can-start-be-the-thumb-function) and [static libc](https://stackoverflow.com/questions/24616226/how-can-i-select-a-static-library-to-be-linked-while-arm-cross-compiling), adapted to ARM64, i.e., an executable that only calls 'write()' on stdout to print something. It will show that the kernel is working. For your 'init' process, the whole file system needs to be properly structured, including `ld` and all the shared libraries. Getting user space working is more complex than it appears. – artless noise Aug 28 '22 at 20:35
  • Linux does 'demand page loading' and it might appear like the process started, but faults, etc can occur as the 'entry' starts by fetching pages from backing store. – artless noise Aug 28 '22 at 20:41
  • Oh, I made a mistake in the question. The lr was pointing somewhere between __start_init_task and __end_init_task, and I'm sure the user program (busybox /bin/sh) started OK: I could print things from the busybox ash_main function. (I have to do more research on __start_init_task and __end_init_task.) – Chan Kim Aug 29 '22 at 13:29
  • @artlessnoise But the same busybox initramfs.cpio.gz works on another virtual machine (QEMU arm64 machine 'virt'). I tried replacing the init script with a program that repeatedly does printf("hello %d\n", cnt++); and it stops after about 190 iterations, with no kernel panic. I also tried replacing init with a program that does printf("enter a char :\n"); scanf("%c", &c); but when I enter a character it doesn't respond. I'm looking into why the UART receive interrupt is masked while the program is running. If you have any suggestions, I'd appreciate it. (I'm also setting up the debugger.) – Chan Kim Sep 03 '22 at 11:27

1 Answer

The other day I found out why it was not working, but forgot to update it here.
The reason was simple. My understanding that lr (x30) is stored as the second value in the current stack frame was correct. The problem was a simple pointer-arithmetic mistake.
The printk statement

if (passed_it) printk("@exiting schedule. sp=%px, fp=%lx, lr=%lx\n",sp,*((long *)sp), *((long *)sp+8));

should have been

if (passed_it) printk("@exiting schedule. sp=%px, fp=%lx, lr=%lx\n",sp,*((long *)sp), *((long *)sp+1));

or

if (passed_it) printk("@exiting schedule. sp=%px, fp=%lx, lr=%lx\n",sp,*((long *)sp), *((long *)(sp+8)));

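Adding 1 to a (long *) advances the address by 8 bytes (very basic), so the original expression, which added 8 to a (long *), read the value 64 bytes above sp instead of the saved lr.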
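
To make the difference concrete, here is a small user-space sketch of the three expressions, run against a fake stack array instead of the real sp. The function names and the `demo` helper are made up for this illustration. The byte-offset variant uses `char *` because arithmetic on `void *` is a GNU extension.

```c
#include <assert.h>
#include <stddef.h>

/* Buggy form from the question: the cast applies first, so "+ 8" steps
 * over eight longs (64 bytes when long is 8 bytes). */
static long read_buggy(void *sp)
{
    return *((long *)sp + 8);
}

/* Fix 1: step over one long. */
static long read_one_long_up(void *sp)
{
    return *((long *)sp + 1);
}

/* Fix 2: step 8 bytes first, then cast to long *. */
static long read_eight_bytes_up(void *sp)
{
    return *(long *)((char *)sp + 8);
}

/* Fake stack for illustration: slot i holds the value 100 + i. */
static void demo(void)
{
    long stack[16];

    for (size_t i = 0; i < 16; i++)
        stack[i] = 100 + (long)i;

    assert(read_buggy(stack) == 108);       /* slot 8, not the lr slot */
    assert(read_one_long_up(stack) == 101); /* slot 1 */
    /* The two fixes agree only where sizeof(long) == 8 (e.g. arm64 Linux). */
    if (sizeof(long) == 8)
        assert(read_eight_bytes_up(stack) == 101);
}
```

This also matches the log in the question: with sp=ffffffc0106a3f00, the buggy expression read the word at ffffffc0106a3f40, which is why the printed "lr" looked like another stack address.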
After fixing this, I could follow where execution went after schedule(): it returned to schedule_preempt_disabled() and then back to rest_init(). So this path was the init task, which ends up idling.

Chan Kim