How to split / truncate a string variable value in Assembly?

Question

I am currently working on a project and, for storage's sake, I would like to truncate a variable's value in assembly and (optionally) make the result the value of a register, such as eax.

I will need code that works with NASM using Intel syntax.

For example, if the variable "msg" is set to "29ak49", I want to take a part of that, like "9ak4", and put it in a register, or something similar.

Edited by adding a space and commented, since I don't see my entry on the main page, or even on the recent page. — LavaMaster107, Sep 18 '16 at 12:18
Was just browsing and found this. I guess you have some good luck. — Star OS, Sep 18 '16 at 12:46
@sdsmith I deleted and undeleted it right when I saw your comment. Anyway, I mean part of a variable. For example, if a variable contained "aj92", I would want to get the "j92" part and store it in a register, like eax. — LavaMaster107, Sep 18 '16 at 13:16
Why is this tagged `[environment-variables]`? Access to env vars depends on the ABI of the OS your code will run on. It sounds more like you're asking about static data that you define with `section .rodata` `msg db "29ak49"`. — Peter Cordes, Sep 18 '16 at 17:31
What do you mean by "storage's sake"? Just loading part of the memory content into a register doesn't change anything, until some other code uses that part of memory for its own data. But juggling all that other code around with an additional 4 bytes stored in a register sounds like a rather useless idea, so I wonder what you actually want to achieve. — Ped7g, Sep 19 '16 at 10:01
@Ped7g not writing 200 KB of stuff when I could literally do something that is like 1 KB which produces the same result — LavaMaster107, Sep 19 '16 at 18:07
Yeah, so I'm missing that leap from 4B save to 199kiB save. Either you have some hugely repetitive pattern in data (and employing some compression like LZMA would probably work "good enough", but being more versatile), or I don't see how storing part of value in register may help you. — Ped7g, Sep 20 '16 at 09:02
@Ped7g I have some code to write with VGA, and I want to save space by writing only a few things instead of every single instruction, and doing the rest like a machine. — LavaMaster107, Sep 20 '16 at 09:47
@d-cubed - thanks for your work in copy-editing questions. Small suggestion: take a look at the tags, too. This one is missing any tags about what kind of assembly language it's for. — Peter Cordes, Jun 18 '20 at 00:20

score 2 · Accepted Answer · edited May 23 '17 at 10:34


As Peter Cordes mentioned in the comments, if you don't mind modifying the original string data, you can right-truncate by storing a null terminator (0) into the existing string's buffer. This assumes an implicit-length, C-style string that lives in writable memory.
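A minimal sketch of that in-place approach for 32-bit Linux (`int 0x80` system calls). It assumes `msg` is NUL-terminated and lives in `.data` (writable, unlike `.rodata`), and it uses a hypothetical constant `NEW_LEN` that must not exceed the current length. The string is printed by scanning for the terminator, the same way any code that consumes C-style strings would find the new end:

```nasm
; Right-truncate a C-style string in place by writing a new NUL terminator,
; then print it by scanning for that terminator (32-bit Linux, int 0x80 ABI).
section .data
msg      db '29ak49', 0         ; NUL-terminated string in writable memory
NEW_LEN  equ 4                  ; hypothetical: keep only the first 4 characters

section .text
global _start
_start:
    mov byte [msg + NEW_LEN], 0 ; overwrite msg[4] ('4') with 0: msg is now "29ak"

    ; strlen: walk forward until the terminator byte
    mov ecx, msg                ; ecx = start of string (also the write() buffer)
    mov edx, ecx
.scan:
    cmp byte [edx], 0
    je .found
    inc edx
    jmp .scan
.found:
    sub edx, ecx                ; edx = length up to the new terminator

    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; fd 1 = stdout
    int 0x80                    ; write(1, msg, length) prints "29ak"

    mov eax, 1                  ; sys_exit
    xor ebx, ebx                ; exit status 0
    int 0x80
```

The original '9' after the new terminator is still in memory; it is simply no longer part of the string as far as NUL-scanning code is concerned.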

The example below will retrieve a substring without modifying the original string.

If you have the address of a variable and you know where you want to truncate it, add an offset to the address of the start of the data to left-truncate. To right-truncate, read only as many characters as you need, starting from the new address.

For example in x86:

    msg db '29ak49'           ; the string, in a data section (1 byte per char)

    ;; left truncate
    mov esi, msg              ; get the address of the start of the string
    add esi, OFFSET_INTO_DATA ; skip the first OFFSET_INTO_DATA chars (1 byte per char)

    ;; right truncate
    mov edi, NUM_CHARS        ; number of characters to take (must be >= 1)

    .loop:
    movzx eax, byte [esi]     ; zero-extend the next character into eax
    ;; do something with the character in eax
    inc esi                   ; advance to the next character
    dec edi                   ; dec sets ZF when the counter reaches 0
    jnz .loop

    ;; end loop
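
The question asks for the piece in a register. When that piece is 4 bytes or fewer, as with "9ak4", the loop is unnecessary: a single dword load from `msg + 1` puts all four characters into eax. A minimal sketch for 32-bit Linux; the `buf` scratch variable is a hypothetical addition, used only so the result can be printed:

```nasm
; Load the 4-character piece "9ak4" of msg into eax with one unaligned
; dword load, then print it (32-bit Linux, int 0x80 ABI).
section .data
msg  db '29ak49'            ; 6 chars, no terminator needed

section .bss
buf  resd 1                 ; hypothetical scratch dword, used only for printing

section .text
global _start
_start:
    mov eax, [msg + 1]      ; al='9', ah='a', then 'k', '4' (little-endian)
                            ; eax = 0x346B6139

    mov [buf], eax          ; store eax back to memory so write() can print it
    mov edx, 4              ; count = 4 bytes
    mov ecx, buf            ; buffer
    mov ebx, 1              ; fd 1 = stdout
    mov eax, 4              ; sys_write
    int 0x80                ; prints "9ak4"

    mov eax, 1              ; sys_exit
    xor ebx, ebx            ; exit status 0
    int 0x80
```

x86 is little-endian, so the first character of the piece ends up in al. Shorter pieces can be loaded with `movzx eax, byte [...]` or `movzx eax, word [...]`; a piece longer than 4 bytes does not fit in eax, so keep a pointer and a length instead.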

EDIT:

The following is a runnable test implementation, as a 32-bit Linux application, that prints the substring selected by OFFSET_INTO_DATA and NUM_CHARS. The loop is the same as above, except that it passes each character's address to the print routine in ecx instead of loading the character into eax:

    section .text
    global _start

    _start:
        ;; left truncate
        mov esi, msg              ; get the address of the start of the string
        add esi, OFFSET_INTO_DATA ; offset into the string (1 byte per char)

        ;; right truncate
        mov edi, NUM_CHARS        ; number of characters to take (must be >= 1)

    .loop:
        mov ecx, esi              ; address of the next character (write() buffer)
        call print_char_32        ; int 0x80 preserves esi and edi
        inc esi                   ; advance to the next character
        dec edi                   ; dec sets ZF when the counter reaches 0
        jnz .loop

        jmp halt


    ;;; input:      ecx      -> address of the character to display
    ;;; clobbers:   eax, ebx, edx
    print_char_32:
        mov edx, 1                ; count = 1 byte
        mov ebx, 1                ; fd 1 = stdout
        mov eax, 4                ; sys_write
        int 0x80                  ; write(1, ecx, 1)
        ret


    halt:
        mov eax, 1                ; sys_exit
        xor ebx, ebx              ; exit status 0 (ebx still held 1 from the last write)
        int 0x80                  ; does not return


    section .data

        msg db '29ak49'           ; create a string (1 byte per char)

        OFFSET_INTO_DATA EQU 1
        NUM_CHARS EQU 3

Compiled with:

    nasm -f elf substring.asm
    ld -m elf_i386 -s -o substring substring.o
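
Following Peter Cordes's suggestion in the comments, the whole substring can also be printed with a single write() call, because a substring of an explicit-length string is just a start pointer plus a length. A sketch for 32-bit Linux using the same OFFSET_INTO_DATA and NUM_CHARS constants; it assumes OFFSET_INTO_DATA + NUM_CHARS does not exceed the 6-byte length of msg, and nothing in the code checks this:

```nasm
; Print a substring with one write() system call (32-bit Linux, int 0x80 ABI):
; the substring is just a pointer (msg + offset) plus a length.
section .data
msg  db '29ak49'

OFFSET_INTO_DATA equ 1              ; assumption: OFFSET_INTO_DATA + NUM_CHARS <= 6
NUM_CHARS        equ 3

section .text
global _start
_start:
    mov eax, 4                      ; sys_write
    mov ebx, 1                      ; fd 1 = stdout
    mov ecx, msg + OFFSET_INTO_DATA ; left truncate: start of the substring
    mov edx, NUM_CHARS              ; right truncate: number of bytes to write
    int 0x80                        ; prints "9ak"

    mov eax, 1                      ; sys_exit
    xor ebx, ebx                    ; exit status 0
    int 0x80
```

Assembled and linked the same way, it prints `9ak` (with no trailing newline), the same output as the character-by-character loop, but with one system call instead of NUM_CHARS.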

answered Sep 18 '16 at 13:50 by Stewart Smith

Thanks! I haven't tested it, and I will need to modify it a bit. – LavaMaster107 Sep 18 '16 at 14:07
No problem. It is intended as a proof of concept. – Stewart Smith Sep 18 '16 at 14:14
I need multiple characters, which needs a redesign. (of the code). – LavaMaster107 Sep 18 '16 at 14:48
@LavaMaster107 What do you mean by multiple characters? This implementation will give you `NUM_CHARS` characters, one per iteration in the `eax` register. You can manipulate the character after the line that says `;; do something with the single character here`. – Stewart Smith Sep 18 '16 at 15:32
By my understanding, that will pick up 1 and skip 1, if the value was 2, instead of 1, not that. I also need to left truncate. – LavaMaster107 Sep 18 '16 at 16:54
I don't understand your concern. Can you elaborate? – Stewart Smith Sep 18 '16 at 23:05
If you change that value, it will skip others and only inspect a few. – LavaMaster107 Sep 19 '16 at 18:07
I have implemented it as a 32bit linux application, and it works fine for me. I will post the full implementation so you can test. – Stewart Smith Sep 20 '16 at 05:04
Minor improvement: Instead of `jz .end / jmp .loop`, just write `jnz .loop` so you exit with a fall-through when the branch is not-taken. That's more efficient, and the usual idiom for looping. – Peter Cordes Sep 20 '16 at 05:32
Also, `dec ecx` already sets ZF according to the result. So `dec ecx / jnz` is idiomatic for looping ([see this answer](http://stackoverflow.com/a/39554309/224132)). `mov eax, [ebx]` loads 4 bytes. Maybe you mean `movzx eax, byte [ebx]`. You can also use `inc ebx` instead of `add ebx, 1`. – Peter Cordes Sep 20 '16 at 05:35
Also worth mentioning that you can right-truncate by storing a zero byte to mark the new end of the string, assuming it's an implicit-length C-style string that you're allowed to modify. – Peter Cordes Sep 20 '16 at 05:38
In your full example, your print function clobbers ebx, which you're using as a loop counter. Why not just call `write()` once, with start and length determined by the new length you want to truncate to? That would be a good illustration of how easy it is to do with explicit-length strings, when you can assume that the new length doesn't go off the end of the string. – Peter Cordes Sep 20 '16 at 05:42
Oh nvm, you used pusha/popa. That's horrible, at least for efficiency, although calling write() one character at a time dwarfs even that. I guess it's an ok example of looping over part of a string, but a bad example for other things. – Peter Cordes Sep 20 '16 at 05:53
BTW, the Linux `int 0x80` system call ABI preserves all registers except EAX, so you could set up your loop to use ESI and EDI for counters, or something. Or increment a pointer in ECX up to a limit in another register. Then you can have a syscall in the loop without a lot of register shuffling. – Peter Cordes Sep 20 '16 at 06:05
@PeterCordes Thank you for the advice, I will edit my post to accommodate it. In the implementation example, I was attempting to keep the algorithm the same, so there is unnecessary register use. The goal of the implementation was to show that the solution does what it is intended to do, and so it is not meant to be 'useful'. – Stewart Smith Sep 20 '16 at 17:41
Fair enough, it's easy to end up with weird code when you're just trying to illustrate something :P I'm glad at least some of my code-review comments were helpful. – Peter Cordes Sep 20 '16 at 18:16
