2

I am trying to find the most efficient way to find tags in a given char array. These "tags" are a sequence of chars located randomly within a char array.

Here is an example: given a char array: {'a','s','s','1','m','s','g','e','x','x','r','s','1',...}. the tag "ss1" indicates the beginning of a message which contains every char until a sequence of "exx" is found, which a tag for the end of the message, and it keeps searching the array for the next sequence of "s1". In this example, the message here is "msg".

my initial design was (pseudo code)

while(array[i] != '\0')
    if(array[i] == 's' && array[i+1] == 's' && array[i+2] == '1'  )
        int j = i+3;
            if(array[j] != '\0' && array[j] == 'e' && array[j+1] == 'x' && array[j+2] == 'x' )
                i += 3;
            else
                print(array[j]);
    else i++; //next char

may be a little flawed, but you get the idea. Is there a better way? i thought about strstr but since I'm dealing with a char array here and still looping even after deciphering a message, I thought it might be difficult to implement.

i_use_the_internet
  • 604
  • 1
  • 9
  • 28
  • can tags be nested ? something like this is acceptable ? msg_start_.....innerloopmessage...msg_eng_.... – Ravi Sankar Raju May 28 '16 at 23:54
  • no @RaviSankarRaju – i_use_the_internet May 28 '16 at 23:54
  • Well for starters, you're indexing past the end of the array. What if there's a single character left? So `array[i]` is `'x'` for instance, and `array[i+1]` is `'\0'`. So you check to see if `array[i]` is `'\0'`, which it isn't. You then proceed to examine *three* entries, when you only know that one is available. That is probably an access violation. It will *probably* work, but it is not guaranteed, and it is very sloppy coding. Never, ever try to read past the end of an array. Even worse, you then go even further with `j`. – Tom Karzes May 28 '16 at 23:55
  • thanks for the catch @TomKarzes I intend to safeguard against those kind of access, I just wanted to show what i was thinking of designing and am looking for a better solution – i_use_the_internet May 28 '16 at 23:57
  • did you try indexOf? http://stackoverflow.com/questions/4824/string-indexof-function-in-c – Proxytype May 28 '16 at 23:58
  • Here's a much better way to do this: Calculate the length before starting your search, and then after that just use the length without having to check for `'\0'`. That will be more efficient as well. For instance, you can immediately check if `i + 2` is less than `length`, without having to scan ahead for `'\0'`. – Tom Karzes May 28 '16 at 23:59
  • @Proxytype strcspn is more like String.IndexOfAny() and does not mimic indexOf() like java does, wouldn't that be risky? – i_use_the_internet May 29 '16 at 00:02
  • you can use indexOf, strstr, strchr you named it, – Proxytype May 29 '16 at 00:05
  • @Proxytype so how will that work for the "ss%d" tag? where an int is appended to the ss indicating message number? – i_use_the_internet May 29 '16 at 00:10
  • Better to use `strncmp` instead of `array[i+1] == 's' && array[i+2] == '1'` – lost_in_the_source May 29 '16 at 00:12
  • indexof will tell you the first char that been found check if the second char is also "s" and next char is number... – Proxytype May 29 '16 at 00:13
  • @Proxytype better yet I can do an indexof "ss" and check if the next is a number. I can do that right? – i_use_the_internet May 29 '16 at 00:17
  • yeap, you can do it... – Proxytype May 29 '16 at 00:18
  • It looks like you need `strstr()` instead. If you are 100% positive there will not be embedded `'\0'`s in the array then you could add one at the end and use `strstr()` to parse the "*string*". – Iharob Al Asimi May 29 '16 at 01:50
  • Can the message inside the tag may have more than three characters ? – Nishant May 29 '16 at 07:25
  • 2
    `Is there a better way? ` Construct a state machine. – wildplasser May 29 '16 at 10:28
  • @Nishant yes it can – i_use_the_internet May 30 '16 at 06:08
  • @i_use_the_internet Take a look at my answer. Ask me if you need any clarifications. – Nishant May 30 '16 at 08:07

1 Answers1

1

Try to maintain a state denoting how much of the tag start and end you have found. Something like this: (This code will work even if the message within the tag is of arbitrary length)

int state = 0;
int found = 0;
int i = 0,j;
int msgStartIndex;
int msgEndIndex;
while(array[i]){
    if((array[i] == 's' && state == 0) || (array[i] == 's' && state == 1) || (array[i] == '1' && state == 2) ){
        state++;
        if(!found && state == 3){
            msgStartIndex = i+1;
            found = 1;
        }
    }
    else if(!found && (array[i] = 's' && state == 2))
        state = 2;
    else if(!found)
        state = 0;
    if((array[i] == 'e' && state == 3) || (array[i] == 'x' && state == 2) || (array[i] == 'x' && state == 1) ){
        state--;
        if(found && state == 0){
            found = 0;
            msgEndIndex = i-3;
            for(j=msgStartIndex; j < msgEndIndex+1; j++)
                printf("%c",array[j]);
            printf("\n");
        }
    }
    else if(found && (array[i] == 'e') && (state == 2 || state == 1))
        state = 2;
    else if(found)
        state = 3;
    i++;
}

Updated answer for start tag st1 and end tag ex1

int state = 0;
int found = 0;
int i=0,j;
int msgStartIndex;
int msgEndIndex;
while(array[i]){
    if((array[i] == 's' && state == 0) || (array[i] == 't' && state == 1) || (array[i] == '1' && state == 2) ){
        state++;
        if(!found && state == 3){
            msgStartIndex = i+1;
            found = 1;
        }
    }
    else if(!found && (array[i] = 's' && (state == 1 || state == 2)))
        state = 1;
    else if(!found)
        state = 0;
    if((array[i] == 'e' && state == 3) || (array[i] == 'x' && state == 2) || (array[i] == '1' && state == 1) ){
        state--;
        if(found && state == 0){
            found = 0;
            msgEndIndex = i-3;
            for(j=msgStartIndex; j < msgEndIndex+1; j++)
                printf("%c",array[j]);
            printf("\n");
        }
    }
    else if(found && (array[i] == 'e') && (state == 2 || state == 1))
        state = 2;
    else if(found)
        state = 3;
    i++;
Nishant
  • 2,571
  • 1
  • 17
  • 29
  • what happens when the closing tag is ex1? how does that affect the code? – i_use_the_internet Jun 03 '16 at 12:02
  • @i_use_the_internet just change `if((array[i] == 'e' && state == 3) || (array[i] == 'x' && state == 2) || (array[i] == 'x' && state == 1) )` to `if((array[i] == 'e' && state == 3) || (array[i] == 'x' && state == 2) || (array[i] == '1' && state == 1) )` – Nishant Jun 03 '16 at 12:28
  • it causes a segmentation fault for array {'a','s','t','1','m','s','g','e','x','1','r','s','t','1','s','t','1','s','s','g','e','x','1','z'}; starting tag st1 ending tag ex1 – i_use_the_internet Jun 03 '16 at 14:49
  • I am out on the weekend. Will get back to you on maonday IST – Nishant Jun 04 '16 at 07:55
  • @i_use_the_internet Sorry, I forgot to initialize `i`, hence the segmentation fault. Try it now. Also it seems like in the example the start tag is `st1` instead of `ss1`. You will have to change the logic if you want start tag as `st1` – Nishant Jun 07 '16 at 11:08