
I'm reading the Linux source code (version 2.6.11).

In the exception handler in entry.S, at the error_code label:

movl $(__USER_DS), %ecx
movl %ecx, %ds
movl %ecx, %es

I don't understand why the user data segment is loaded here. This is the entry to exception handler code that runs in kernel mode, so I would expect the selector to be __KERNEL_DS.

I checked other versions of the code, and they do the same thing at this point.

Holmes

2 Answers


If the exception handler is entered with ds and es already holding that data segment selector, reloading them makes no difference apart from a very small delay. Exception handlers don't usually need to be fast.

But what might cause going to the exception handler? Could it have been because a bad value was loaded into a segment register and then referenced? In such cases it is important for the code to establish a safe environment. cs is set by the exception invocation. To be bulletproof, ss and esp should be set up too.


Followup:

Looking at the 2.6.22.18 kernel for i386, I don't see exactly that:

error_code:   /* the function address is in %fs's slot on the stack */
     pushl  %es
     ...  pushes %ds, %eax, %ebp, %edi, %esi, %edx, %ecx, %ebx, %fs
     ...  along with pseudo-ops to manage stack frame layout
     movl  $(__KERNEL_PERCPU), %ecx
     movl  %ecx, %fs
     popl  %ecx   // retrieves saved %fs
     ... sets up registers for the exception function

The symbol __KERNEL_PERCPU is a macro defined (in include/asm-i386/segment.h) as 0 for non-SMP builds and as (GDT_ENTRY_PERCPU * 8) for SMP builds. The 8 is the size of a GDT entry in bytes (I think), and GDT_ENTRY_PERCPU is an index into the per-CPU GDT. Its value is <base> + 15, which the comments indicate is "default user DS", so it is, in fact, the same thing.
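To make the "index times 8" arithmetic concrete, here is a minimal C sketch of how an x86 segment selector is composed: the GDT index sits in bits 3 and up, bit 2 selects the LDT, and bits 0-1 hold the requested privilege level. The helper names are hypothetical, not kernel APIs. The index values 15 (default user DS) and 13 (kernel DS) are what I recall from the 2.6.11 segment.h, so treat them as assumptions and check your own tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; the kernel does not define these. */

/* Build a selector: descriptor index << 3, table-indicator bit, RPL in bits 0-1. */
uint16_t make_selector(uint16_t index, int use_ldt, uint16_t rpl)
{
    return (uint16_t)((index << 3) | (use_ldt ? 0x4u : 0x0u) | (rpl & 0x3u));
}

/* Descriptor index: the selector with its three low bits dropped. */
uint16_t selector_index(uint16_t sel)
{
    return (uint16_t)(sel >> 3);
}

/* Requested privilege level: 0 for kernel, 3 for user. */
uint16_t selector_rpl(uint16_t sel)
{
    return (uint16_t)(sel & 0x3u);
}

/* Assumed indices: 15 = default user DS, 13 = kernel DS (from memory of 2.6.11). */
void selector_demo(void)
{
    uint16_t user_ds = make_selector(15, 0, 3);   /* 15 * 8 + 3 */
    uint16_t kernel_ds = make_selector(13, 0, 0); /* 13 * 8     */

    assert(user_ds == 0x7b);
    assert(kernel_ds == 0x68);
    assert(selector_index(user_ds) == 15);
    assert(selector_rpl(user_ds) == 3);
}
```

Under those assumptions, the `$(__USER_DS)` in the question is the constant 0x7b and __KERNEL_DS would be 0x68; the two differ in both the descriptor they name and the privilege level they request.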

The kernel data segment is accessed through fs and ss. Much kernel data access is on the stack. By keeping the user mode descriptors accessed through ds, very little loading of segment registers is needed.

wallyk
  • Sorry, my question was too vague. What I meant is: why does it load __USER_DS rather than __KERNEL_DS? I have edited the original question. – Holmes Aug 20 '13 at 07:55
  • @Holmes: I have amended my answer. Does that explain it better? – wallyk Aug 20 '13 at 20:48
  • @wallyk: I think I finally understand what "...very little loading of segment registers" means. Thanks again! – Holmes Aug 23 '13 at 11:24
  • Can you please explain "The kernel data segment is accessed through fs and ss"? Does it mean a kernel exception handler cannot use DS? – dimba Sep 22 '13 at 08:34
  • @dimba: The exception *can* use `DS`: the entry code sets it up. However, DS is mapped to user space, presumably for inspecting/altering user mode stack and variables. – wallyk Sep 22 '13 at 15:11
  • Thanks. I still don't fully get it. AFAIK an exception handler is ordinary C code that runs in kernel space. The compiler can generate assembly that uses DS, so the handler can access kernel data. But we set DS to __USER_DS instead of __KERNEL_DS. I'm confused :) – dimba Sep 23 '13 at 08:47

In entry.S:

#define RESTORE_ALL \
    RESTORE_REGS \
    addl $4, %esp; \
1:  iret; \
.section .fixup,"ax"; \
2:  sti; \
    movl $(__USER_DS), %edx; \
    movl %edx, %ds; \
    movl %edx, %es; \
    movl $11,%eax; \
    call do_exit; \
.previous; \
.section __ex_table,"a"; \
    .align 4; \
    .long 1b,2b; \
.previous

This macro is expanded at the end of exception, interrupt, and system call handling. The fixup code (label 2) sets ds and es to __USER_DS, which suggests that iret itself can raise an exception if the DPL of ds or es is not 3 (user privilege).

So Linux sets ds and es to __USER_DS at the very beginning of exception, interrupt, and system call handling to avoid this exception.
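The `.long 1b,2b` line in the macro records a pair of addresses in the __ex_table section: the instruction that may fault (label 1, the iret) and the place to resume if it does (label 2, the fixup). A rough C sketch of how such a table can be consulted from a fault handler follows. The struct and function names are hypothetical, and the linear scan is a simplification for illustration, not the kernel's actual lookup routine.

```c
#include <assert.h>
#include <stddef.h>

/* One record per instruction that is allowed to fault (hypothetical layout
 * mirroring the two words emitted by `.long 1b,2b`). */
struct ex_entry {
    unsigned long insn;  /* address of the instruction that may fault (label 1) */
    unsigned long fixup; /* address to resume at if it does (label 2)           */
};

/* Return the fixup address registered for fault_addr, or 0 if there is none.
 * A return of 0 means the fault was not anticipated by any table entry. */
unsigned long find_fixup(const struct ex_entry *table, size_t n,
                         unsigned long fault_addr)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].insn == fault_addr)
            return table[i].fixup;
    }
    return 0;
}

/* Made-up addresses, purely to exercise the lookup. */
void fixup_demo(void)
{
    const struct ex_entry table[] = {
        { 0x1000ul, 0x2000ul },
        { 0x1010ul, 0x2040ul },
    };

    assert(find_fixup(table, 2, 0x1010ul) == 0x2040ul);
    assert(find_fixup(table, 2, 0x1234ul) == 0);
}
```

In this picture, a fault at the iret address would be redirected to label 2, which reloads ds and es with __USER_DS before calling do_exit.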

Holmes