
This program must find all occurrences of string 2 in string 1.
It works fine with all the strings I have tried, except with
s1="Ciao Cia Cio Ociao ciao Ocio CiCiao CieCiaCiu CiAo eeCCia"
s2="Cia"
In this case the correct result would be: 0 5 31 39 54
Instead, it prints 0 5 39.
I don't understand why; the logic seems to be the same as for
s1="Sette scettici sceicchi sciocchi con la sciatica a Shanghai"
s2="icchi"
with which the program works correctly.
I can't find the error!
The code:

#include <stdio.h>

void main()
{
    #define MAX_LEN 100

        // Input
    char s1[] = "Ciao Cia Cio Ociao ciao Ocio CiCiao CieCiaCiu CiAo eeCCia";
    unsigned int lengthS1 = sizeof(s1) - 1;
    char s2[] = "Cia";
    unsigned int lengthS2 = sizeof(s2) - 1;
    // Output
    unsigned int positions[MAX_LEN];
    unsigned int positionsLen;

    // Assembler block
    __asm
    {
        MOV ECX, 0
        MOV EAX, 0
        DEC lengthS1
        DEC lengthS2
        MOV EBX, lengthS1
        CMP EBX, 0
        JZ fine
        MOV positionsLen, 0
        XOR EBX, EBX
        XOR EDX, EDX




    uno: CMP ECX, lengthS1
    JG fine
    CMP EAX, lengthS2
    JNG restart
    XOR EAX, EAX


    restart : MOV BH, s1[ECX]
    CMP BH, s2[EAX]
    JE due
    JNE tre


    due : XOR EBX, EBX
    CMP EAX, 0
    JNE duedue
    MOV positions[EDX * 4], ECX
    INC ECX
    INC EAX
    JMP uno


    duedue : CMP EAX, lengthS2
    JNE duetre
    INC ECX
    INC EDX
    INC positionsLen
    XOR EAX, EAX
    JMP uno


    duetre : INC EAX
    INC ECX
    JMP uno


    tre : XOR EBX, EBX
    XOR EAX, EAX
    INC ECX
    JMP uno




fine:
    }

    // Print to screen
    {
        unsigned int i;
        for (i = 0; i < positionsLen; i++)
            printf("Substring at position=%u\n", positions[i]);
    }
}

Please help.

Ryan Zhang
Dan5
  • Have you tried running your code line-by-line in a debugger while monitoring the values of all variables (and CPU registers), in order to determine in which line your program stops behaving as intended? If you did not try this, then you may want to read this: [What is a debugger and how can it help me diagnose problems?](https://stackoverflow.com/q/25385173/12149471) You may also want to read this: [How to debug small programs?](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/) – Andreas Wenzel Aug 04 '22 at 20:58
  • Okay, you want to learn assembly language, so you did not write a C version of the algorithm first. If you had, you could convert the C version step by step to assembly, and you would see where your error sneaks in. Extra benefit: your former C version could become the comments in your assembly program. – BitTickler Aug 04 '22 at 21:15
  • Is that Microsoft's C? If so, running it through the VS debugger might get a little [ugly](https://developercommunity.visualstudio.com/t/inline-asm-writes-incorrect-line-number/10074759). Anything with labels has a problem where the code doesn't line up correctly. – David Wohlferd Aug 05 '22 at 01:40

1 Answer


The trickier programming gets, the more systematic and thoughtful your approach should be. If you have programmed x86 assembly for a decade, you will be able to skip a few of the steps I outline below. But especially as a beginner, you are well advised not to expect that you can just hack away in assembly with confidence and without safety nets.

The code below is just a best guess (I did not compile, run, or debug the C code). It is there to give the idea.

  • Make a plan for your implementation
    So you will have 2 nested loops, comparing the characters and then collecting matches.
  • Implement the "assembly" in low level C, which already resembles the end product.
    C is nearly an assembly language itself...
  • Write tests, then debug and analyze your "pseudo assembly" C version.
  • Translate the C lines step by step into assembly lines, "promoting" the C lines to comments.

This is my first shot at doing that: the initial C version, which might or might not work. It is still faster and easier to write (with the assembly code in mind), and easier to debug and step through. Once it works, it is time to "translate".

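#include <stdint.h>
#include <stddef.h>
#include <string.h>

size_t substring_positions(const char *s, const char* sub_string, size_t* positions, size_t positions_capacity) {
  size_t positions_index = 0;
  size_t i = 0;
  size_t j = 0;
  size_t s_len = strlen(s);
  size_t sub_len = strlen(sub_string);
  size_t i_max = 0;

  /* Nothing to find or nowhere to store it; also avoids size_t underflow below. */
  if (sub_len == 0 || sub_len > s_len || positions_capacity == 0)
    goto end;
  i_max = s_len - sub_len;  /* last start index at which a match still fits */

 loop0:
  if (i > i_max)
    goto end;
  j = 0;
 loop1:
  if (j == sub_len)         /* every character of sub_string compared equal */
    goto match;
  if (s[i+j] == sub_string[j])
    goto go_on;
  i++;                      /* mismatch: retry from the next start index */
  goto loop0;
 go_on:
  j++;
  goto loop1;
 match:
  positions[positions_index] = i;
  positions_index++;
  i++;                      /* move past this match start, or we would find it again */
  if (positions_index < positions_capacity)
    goto loop0;
  goto end;

 end:
  return positions_index;
}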

As you can see, I did not use "higher level language features" for this function (does C even have such things?! :)). And now, you can start to "assemble". If RAX (or EAX in 32-bit code such as the question's inline assembly) is supposed to hold your i variable, you could replace size_t i = 0; with XOR RAX,RAX. And so on.
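Before that translation, step 3 of the plan calls for tests on the C version. One possible way to get them cheaply is to compare against a reference built on the standard `strstr`. The sketch below is only an illustration: the names `find_fn`, `reference_positions`, and `agrees_with_reference` are hypothetical, and the fixed scratch size of 64 is an arbitrary assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature shared by the function under test. */
typedef size_t (*find_fn)(const char *s, const char *sub,
                          size_t *positions, size_t capacity);

/* Reference built on the standard strstr(); overlapping matches are reported. */
size_t reference_positions(const char *s, const char *sub,
                           size_t *positions, size_t capacity) {
  size_t count = 0;
  const char *p = s;
  if (*sub == '\0')
    return 0;                 /* assumption: an empty pattern yields no matches */
  while (count < capacity && (p = strstr(p, sub)) != NULL) {
    positions[count++] = (size_t)(p - s);
    p++;                      /* resume one character after this match start */
  }
  return count;
}

/* Returns 1 when candidate and reference agree on count and positions. */
int agrees_with_reference(find_fn candidate, const char *s, const char *sub) {
  size_t expected[64], actual[64];
  size_t n_expected = reference_positions(s, sub, expected, 64);
  size_t n_actual = candidate(s, sub, actual, 64);
  if (n_expected != n_actual)
    return 0;
  return memcmp(expected, actual, n_expected * sizeof expected[0]) == 0;
}
```

Calling `agrees_with_reference(substring_positions, s1, s2)` with the two string pairs from the question gives a quick pass/fail signal before any assembly is written. For the first pair, the reference reports the positions 0, 5, 31, 39, and 54 that the question expects.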

With that approach, other people have a chance to read the assembly code, and with the comments (the former C code) you state the intent of your instructions.
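The same idea also works in the other direction: transliterate the existing assembly loop into C and step through it. The sketch below is one reading of the question's loop, not a verified equivalent. The function name `asm_like_scan` and the capacity parameter are hypothetical, and the initial length checks are simplified. The comments name the assembly labels and registers each part is meant to mirror.

```c
#include <stddef.h>
#include <string.h>

/* C transliteration (a sketch) of the question's assembly loop. */
size_t asm_like_scan(const char *s, const char *sub,
                     size_t *positions, size_t capacity) {
  size_t s_len = strlen(s), sub_len = strlen(sub);
  size_t i = 0, j = 0, count = 0;          /* ECX, EAX, EDX */
  if (s_len == 0 || sub_len == 0)
    return 0;
  while (i < s_len && count < capacity) {  /* uno (capacity guard is an addition) */
    if (j >= sub_len)
      j = 0;
    if (s[i] == sub[j]) {                  /* restart -> due */
      if (j == 0) {
        positions[count] = i;              /* tentative match start */
        j++;
      } else if (j == sub_len - 1) {       /* duedue: whole pattern matched */
        count++;
        j = 0;
      } else {                             /* duetre */
        j++;
      }
    } else {                               /* tre */
      j = 0;                               /* s[i] is NOT retried against sub[0] */
    }
    i++;                                   /* every path advances the text index */
  }
  return count;
}
```

On the question's first input this sketch reports 0, 5, and 39, the same output the question describes. In "CiCiao" the second `C` is consumed as the failed third character of a partial match and is never tested as a new start, and the same happens to the second `C` in "CCia". That suggests the mismatch branch needs to go back to the character after the tentative start (or at least re-test the current character against the first pattern character) instead of always advancing.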

BitTickler
  • C has `do{}while()` and `if()break;`, which are useful for expressing typical assembly idioms: a standard loop with the conditional branch at the bottom, and breaking out of a loop early. They translate directly to `cmp eax,ecx`/`jcc top_of_outer_loop` or whatever, or `dec ecx`/`jnz`. And of course if()break translates directly to `cmp`/`jcc` to a later label. It's often simplest to think in terms of structured programming at first, before optimizing, and using C structured programming (non-goto) constructs is a nice way to do that. – Peter Cordes Aug 06 '22 at 00:35
  • The upside of using gotos manually is that you can lay out your branches manually in the C, e.g. instead of an if(){stuff} inside the loop, you have the match-found special case outside the loop, rather than as something that has to be jumped over every iteration like the standard translation to asm for an `if()` other than an `if()break`. – Peter Cordes Aug 06 '22 at 00:38
  • Of course, there are many ways to skin a cat. I opted for this style because I can already see all future assembly labels and the flow control verbatim. – BitTickler Aug 06 '22 at 06:08
  • For beginners to asm, using more familiar structured-programming concepts to write a working program is usually a good approach. Then massage the C logic to be more asm-like / easier to translate to asm, and yeah you can do a step of C with only `goto` as part of that translation instead of going straight from do{}while (or inefficient while(){}) to asm. Your answer is fine, although I might have shown a structured version of it, too. (If you or I had extra time to write both). Also I wanted to suggest that future readers might want to start with more normal / easier to write C. – Peter Cordes Aug 06 '22 at 06:16
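The structured style suggested in these comments could look roughly like the sketch below. It is an illustration, not code from the answer: the function name `substring_positions_structured` is hypothetical, and returning 0 for an empty pattern is an assumption. Each `do{}while()` keeps its conditional branch at the bottom, and each `if()break;` maps to a forward `cmp`/`jcc`.

```c
#include <stddef.h>
#include <string.h>

size_t substring_positions_structured(const char *s, const char *sub,
                                      size_t *positions, size_t capacity) {
  size_t s_len = strlen(s), sub_len = strlen(sub);
  size_t count = 0;
  /* Guards let both loops be do{}while without underflow or empty ranges. */
  if (sub_len == 0 || sub_len > s_len || capacity == 0)
    return 0;
  size_t i = 0;
  size_t i_max = s_len - sub_len;  /* last start index where a match fits */
  do {
    size_t j = 0;
    do {
      if (s[i + j] != sub[j])
        break;                     /* cmp / jne to just past the inner loop */
      j++;
    } while (j < sub_len);         /* cmp / jb back to the inner loop top */
    if (j == sub_len) {            /* inner loop ran to completion: a match */
      positions[count++] = i;
      if (count == capacity)
        break;                     /* output array is full */
    }
    i++;
  } while (i <= i_max);            /* cmp / jbe back to the outer loop top */
  return count;
}
```

Because the inner index restarts at 0 for every start position `i`, overlapping and adjacent candidates such as "CiCia" are each tested in full.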