I'm writing a bubble sort for string sorting in assembly language and I'm using strtok() to tokenize the string. However, after the first call, strtok(str, " "), I need to pass NULL as the first parameter, i.e. strtok(NULL, " ").

I've tried NULL equ 0 in the .bss segment but this doesn't do anything.

[SECTION .data]

[SECTION .bss]

string resb 64
NULL equ 0

[SECTION .text]

extern fscanf
extern stdin
extern strtok

global main

main:

    push ebp        ; Set up stack frame for debugger
    mov ebp,esp
    push ebx        ; Program must preserve ebp, ebx, esi, & edi
    push esi
    push edi

    push string
    push frmt
    push dword [stdin]      ;Read string from stdin
    call fscanf
    add esp,12              ;clean stack

    push delim
    push string             ;this works
    call strtok
    add esp,8               ;clean stack

    ;after this step, the return value in eax points to the first word 

    push string             ;this does not
    push NULL
    call strtok
    add esp,8               ;clean stack

    ;after this step, eax points to 0x0

    pop edi         ; Restore saved registers
    pop esi
    pop ebx
    mov esp,ebp     ; Destroy stack frame before returning
    pop ebp
    ret         ;return control to linux

I've read that in "most implementations" NULL points to 0, whatever that means. Why is there ambiguity? What is the equivalent of NULL in the x86 instruction set?

Peter Cordes
Carlos Carral

3 Answers

 push NULL 
 push string 
 call strtok

This calls strtok(string, NULL). You want strtok(NULL, " "), so, presuming that delim contains " ":

 push delim
 push NULL
 call strtok

Parameters go onto the stack in reverse (right-to-left) order in the cdecl calling convention.


For the other part of your question (is NULL always zero), see: Is NULL always zero in C?

J...
  • Thank you, I hadn't noticed. However, after making this change and recompiling, there is still no change. Does this have to do with calling strtok from outside a compiled C program? Or could NULL be anything else? Perhaps it's because I'm compiling my program for a 32-bit architecture but my C implementation is for a 64-bit architecture? – Carlos Carral Jun 05 '19 at 18:26
  • @CarlosCarral Maybe, maybe, and maybe. You haven't told us much about your environment, operating system, etc. A [mcve] and a second question might be the best way forward. – J... Jun 05 '19 at 19:50
  • @CarlosCarral: In all x86 calling conventions / ABIs, the asm bit-pattern for `NULL` pointers is integer `0`. So `push 0` is always safe *on x86*. The C standard allows it to be different because *some* hardware might want to use something else, e.g. a bit-pattern that always faulted. This is not done on x86. (Despite the fact that some OSes, notably Windows 95, mapped the zero page in the address space of user-space processes, so the undefined behaviour of NULL-pointer dereference could be corrupting the whole machine state, instead of just faulting that process!) – Peter Cordes Jun 06 '19 at 02:38

I've read that in "most implementations" NULL points to 0, whatever that means.

No, it is 0; it's not a pointer to anything. So yes, NULL equ 0 is correct, or just push 0.

In C source, (void*)0 is always NULL, but implementations are allowed to internally use a different non-zero bit-pattern for the object-representation of int *p = NULL;. Implementations that choose a non-zero bit-pattern need to translate at compile time. (And the translation only works at compile time for compile-time integer constant expressions with value zero that appear in a pointer context, not for memset or whatever.) The C++ FAQ has a whole section on NULL pointers. (Which also applies to C in this case.)

(It's legal in C to access the bit-pattern of an object with memcpy into an integer, or with (char*) aliasing onto it, so it is possible to detect this in a well-formed program that's free from undefined behaviour. Or of course by looking at the asm or memory contents with a debugger! In practice you can easily check what the right asm for a NULL is by compiling int *foo(){return NULL;} and looking at the compiler's asm output.)

See also Why is address zero used for the null pointer? for some more background.

Why is there ambiguity? What is the equivalent to NULL in x86 instruction set?

In all x86 calling conventions / ABIs, the asm bit-pattern for NULL pointers is integer 0.

So push 0 or xor edi,edi (RDI=0, the first-arg register in x86-64 System V) is always what you want on x86 / x86-64. (Modern calling conventions, including all x86-64 conventions, pass args in registers; Windows x64 passes the first arg in RCX, not RDI.)


@J...'s answer shows how to push args in right-to-left order for the calling convention you're using, resulting in the first (left-most) arg at the lowest address.

Really you can store them to the stack however you like (e.g. with mov) as long as they end up in the right place when call runs.


The C standard allows it to be different because C implementations on some hardware might want to use something else, e.g. a special bit-pattern that always faults when dereferenced, regardless of context. Or if 0 was a valid address value in real programs, it's better if p==NULL is always false for valid pointers. Or any other arcane hardware-specific reason.

So yes, there could have been some C implementations for x86 where (void*)0 in the C source turns into a non-zero integer in the asm. But in practice there aren't. (And most programmers are happy that memset(array_of_pointers, 0, size) actually sets them to NULL, which relies on the bit-pattern being 0, because some code makes that assumption without thinking about the fact that it's not guaranteed to be portable.)

This is not done on x86 in any of the standard C ABIs. (An ABI is a set of implementation choices that compilers all follow so their code can call each other, e.g. agreeing on struct layout, calling conventions, and what p == NULL means.)

I'm not aware of any modern C implementations that use non-zero NULL on other 32 or 64-bit CPUs either; virtual memory makes it easy to avoid address 0.

http://c-faq.com/null/machexamp.html has some historical examples:

The Prime 50 series used segment 07777, offset 0 for the null pointer, at least for PL/I. Later models used segment 0, offset 0 for null pointers in C, necessitating new instructions such as TCNP (Test C Null Pointer), evidently as a sop to [footnote] all the extant poorly-written C code which made incorrect assumptions. Older, word-addressed Prime machines were also notorious for requiring larger byte pointers (char *) than word pointers (int *).

... see the link for more machines, and the footnote from this paragraph.

https://www.quora.com/On-which-actual-architectures-is-Cs-null-pointer-not-a-binary-zero-all-bits-zero reports finding a non-zero NULL on 286 Xenix, I guess using segmented pointers.


Modern x86 OSes make sure processes can't map anything into the lowest page of virtual address space, so NULL pointer dereference always faults noisily to make debugging easier.

e.g. Linux by default reserves the low 64kiB of address space (vm.mmap_min_addr). This helps whether it came from a NULL pointer in the source, or whether some other bug zeroed a pointer with integer zeros. 64k instead of just the low 4k page catches indexing a pointer as an array, like p[i] with small to medium i values.

Fun fact: Windows 95 mapped the lowest pages of user-space virtual address space to the first 64kiB of physical memory to work around a 386 B1 stepping erratum. But fortunately it was able to set things up so access from a normal 32-bit process did fault. Still, 16-bit code running in DOS compat mode could trash the whole machine very easily.

See https://devblogs.microsoft.com/oldnewthing/20141003-00/?p=43923 and https://news.ycombinator.com/item?id=13263976

Peter Cordes
  • Win32 progs had the first 64KiB mapped but couldn't address it without a fault. Win16 near pointers were relative to the beginning of a 64KiB segment that of course wasn't the first 64KiB of physical memory. Win16 far pointers were selector:offset. Selector 0 causes a fault (accessing the NULL descriptor). Win32 would fault because data selectors were expand-down. The real problem was that code running in an MS-DOS compatibility session could clobber the first 1MiB of memory. There were of course things a Win16 and Win32 process could do to circumvent the limited protection. – Michael Petch Jun 06 '19 at 05:08
  • @MichaelPetch: oh, that's a lot less insane than I thought, for "normal" code. And IIRC, it wasn't NULL deref but rather array-overrun bugs that I remember leading to system-wide lockups on Win9x back in the summer of 2000 (MinGW or Cygwin for a command-line test program that just used scanf/printf and some math library functions to test the function I would call from Excel once I had it working.) So zero-page stuff doesn't explain it. – Peter Cordes Jun 06 '19 at 05:45
  • Personally, I was waiting for OP to drop the *"Oh, it's (insert esoteric real-time embedded x86 platform) with a prototype C implementation, nightly build..."*. Nice answer, in any case. – J... Jun 06 '19 at 09:10
  • @J...: heh, if that was the case I hope they'd choose a less bad calling convention than stack args! At least a couple register args are probably a win for code-size for optimized code that normally keeps things in registers anyway. And certainly a win for overall performance. Maybe "Oh, it's a DeathStation 9000 nightly build" for maximal C gotchas including NULL not having an all-zero bit-pattern. :P – Peter Cordes Jun 06 '19 at 09:30
  • @PeterCordes Aye, the latter! Really I was just hoping OP would edit in some of their platform details so we didn't have to guess. – J... Jun 06 '19 at 12:02
  • In NASM it's simply pushing 0, or setting a symbol NULL equal to 0 and pushing NULL. In X64 calling conventions, the first 4 parameters are passed in registers, and the rest on the stack. For instance, calling WriteFile in the Win32 API would entail moving the relative address of the handle to write to into RCX, loading the effective address of the data into RDX, moving the length of the data into R8, loading the effective address of the variable that stores the out parameter written, and pushing NULL onto the stack (the 5th parameter). – Jonathan May 02 '21 at 15:40
  • @Jonathan: Correct, except that only Windows uses that 64-bit calling convention. First arg in RDI is correct for x86-64 System V, used on every other OS. ([Why does Windows64 use a different calling convention from all other OSes on x86-64?](https://stackoverflow.com/q/4429398)). We can tell from the code in the question that it's not Windows, because it's 32-bit but using `call fscanf` instead of `call _fscanf`. My answer assumed Linux, and the main point was that in asm for sane C implementations, NULL is just `0` – Peter Cordes May 02 '21 at 15:49
  • @Jonathan: But sure, if people are looking for the title question (not the unrelated bug about arg order the question has) and miss the mention of `equ`, I edited to put concrete examples of that near the top of my answer. – Peter Cordes May 02 '21 at 15:55
  • @PeterCordes In assembly, the calling convention only matters in that the caller and callee must both agree. Also, given that you'd placed links referencing the Windows calling convention, I assumed you were trying to call Windows APIs. Regardless of your calling convention, a mov register, NULL will null that register, and push NULL will push null on the stack, given that NULL is already defined as 0, using NASM. Other assemblers vary, particularly those that use the AT&T syntax. One should stick with the manual for their assembler to know the conventions that assembler uses. – Jonathan May 02 '21 at 22:53
  • @Jonathan: Yup, I think my answer clearly shows that now nearer the top, thanks for your earlier comment which made me notice that I didn't fully give an example of the asm syntax. None of the links in my answer mentioned the Windows calling convention, though, so IDK where that miscommunication / misinterpretation happened. Not important now. – Peter Cordes May 03 '21 at 02:51

You are actually asking two questions:

Question 1

I've read that ... NULL points to 0, whatever that means.

This means that nearly all C compilers define NULL as (void *)0.

This means that a NULL pointer is a pointer to the memory location with the address zero.

I've read that in "most implementations" ...

"Most" means that before the introduction of ISO C and ANSI C in the late 1980s, there were C compilers that defined NULL in a different way.

Maybe a few non-standard C compilers still exist that do not recognize the address 0 as NULL.

However, you can assume that your C compiler and the C library you use in your assembly project define NULL as a pointer to the address 0.

Question 2

How do I push the equivalent of NULL in C to the stack in assembly?

A pointer is an address.

Unlike some other CPUs, x86 CPUs don't distinguish between integers and addresses:

You push a NULL pointer by pushing the integer value 0.

NULL equ 0

push NULL

Unfortunately, you did not write which assembler you use. (Other users assume it is NASM.)

In this context, the instruction push NULL may be interpreted in two different ways by different assemblers:

  • Some assemblers would interpret this as: "Push the value 0".

    This would be correct.

  • Other assemblers would interpret this as: "Read the memory at memory location 0 and push that value"

    This would be equal to someFunction(*(int *)NULL) in C and therefore cause an exception (NULL pointer access).

Martin Rosenau
  • This is NASM syntax, `NULL equ 0` plus `push NULL` boils down to `push 0`. In AT&T syntax, that would be `push $0`. This answer isn't helpful. `NULL equ 0` correctly defines `NULL` as an assemble-time constant with *value* 0, just like `.equ NULL, 0` would in GAS. That's how you use assemble-time named constants. – Peter Cordes Jun 06 '19 at 08:12
  • In literally all ISO C compilers, `(void *)0` is a NULL pointer constant. The `NULL` macro doesn't *have* to be defined that way, but an integer `0` in a pointer context is required to compile to whatever bit-pattern is used to represent a NULL pointer, whether or not that's all-zero. On a machine that used all-ones as its NULL, `(void *)0xFFFFFFFF` might *also* be a valid definition of NULL, but your phrasing is confusing and implies that `(void*)0` wouldn't be NULL on such a machine. – Peter Cordes Jun 06 '19 at 08:18
  • @PeterCordes There are a lot of different assemblers which have a very similar syntax but interpret some instructions differently, so one assembler interprets `push xyz` as `push [xyz]` (NASM?) and the other one interprets the same source line as `push offset xyz` (as early Microsoft assemblers did). I just checked GAS (version 2.24), AT&T style: `.equ NULL, 0` followed by `push NULL` will generate the same code as `push 0` which is `push [ds:0]` in Intel syntax; to push the value 0, you'll have to write `push $NULL`, not `push NULL`. (I'm not sure if this changed in newer GAS versions). – Martin Rosenau Jun 06 '19 at 09:30
  • @PeterCordes I was talking about C compilers from the 1970s. ANSI C and ISO C started to exist in 1989. I doubt that the C compilers that defined `NULL` as `(void *)0xFFFFFFFF` are ISO or ANSI compliant. And as far as I remember there was even a compiler mentioned that used a "special" pattern like `#define NULL (void *)0xEF000000`. This only makes sense if `(void *)0` and `(void *)0xFFFFFFFF` are **valid**, non-NULL pointers on the machine and OS. – Martin Rosenau Jun 06 '19 at 09:36
  • Yes, I'm well aware of how GAS's MASM-like `.intel_syntax` works, and AT&T syntax. But this question is fairly unambiguously NASM (from the `push dword [stdin]` not `dword ptr`), so I meant that your answer is confusing at best to someone learning NASM. Yes in GAS `.intel_syntax` a `push NULL` would be a memory operand with absolute address `0`, and you would need `push OFFSET NULL` to get the constant value (aka symbol address) rather than implicitly dereferencing it to get the pointed-to value. – Peter Cordes Jun 06 '19 at 09:40
  • And BTW "defines a variable" implies reserving space, as well as labeling an address. It definitely does not do that. – Peter Cordes Jun 06 '19 at 09:43
  • @PeterCordes I edited my answer completely and removed a lot of confusing text. – Martin Rosenau Jun 06 '19 at 09:50
  • *"Most" mean that before the introduction of ISO C and ANSI C in the late 1980s, there were C compilers that defined NULL in a different way.* No, that's not what it means. Modern ISO C doesn't specify the object-representation of `(void*)0`. The examples in http://c-faq.com/null/machexamp.html could all be valid with modern C11 compilers. `int *p = 0;` and then `memcpy` or otherwise using a `char*` to inspect the object representation lets you see if the bit-pattern is non-zero. See also [Is NULL always zero in C?](//stackoverflow.com/q/9894013) – Peter Cordes Jun 06 '19 at 10:03
  • What you say about historical implementations is presumably *true*, but I don't think that's what the OP was reading about. – Peter Cordes Jun 06 '19 at 10:03