0

serious problem because i'm finishing my Thesis for Diploma and i can't figure out this last on problem to finish with my project...

I create 2 images that I'm explain my problem...

If anybody can really help I'll appreciate it. Thanks...

enter image description here

enter image description here

Nakos Kotsanis
  • 180
  • 1
  • 15
  • I can't say that the two images help to make your problem clear. I reckon there's a very elementary problem buried in all these descriptions. – M Oehm Sep 25 '17 at 18:25
  • if you can help me with some idea... ask me if you dont understand something.... Thanks anyway for your consideration – Nakos Kotsanis Sep 25 '17 at 18:28
  • I haven't read through the entire images, but to me this looks like the [longest common substring-problem](https://stackoverflow.com/questions/10248728/how-to-find-longest-common-substring-using-c) or at least a related problem –  Sep 25 '17 at 18:53
  • its like longest common substring problem but i have to set the length of same elements (4 for example) and now i dont have spesific words but huge arrays with 0 & 1 only – Nakos Kotsanis Sep 25 '17 at 19:03
  • For example longest common substring problem find wich chars appear at the same time in both arrays.... i want to see if 4 elements of array 2 in the row (0 or 1) are somewhere in array 1 and save that location of the match – Nakos Kotsanis Sep 25 '17 at 19:26
  • Do you need to find only one possible substring? If your arrays can have only two different values, there will only be 16 possible substrings of length 4, so for long arrays there will be many matches. – M Oehm Sep 25 '17 at 19:28
  • yes of course i can find a lot of matches...Actually the arrays that i will check will have length around 15000(array1) and 5000 length(array2) and the value of same elements in a row that i am searching to match will be around of 75~150 elements...but if i find out a solution for a smaller example i can easily make it work for these huge lengths that i have in my project... – Nakos Kotsanis Sep 26 '17 at 00:09
  • Now you've used four lines to not actually answer my question: Are you interested in all matches or just in one? Or do you just want to know which of the 16 possible substrings are common to both strings? – M Oehm Sep 26 '17 at 05:44
  • (You really should try to communicate clearer. I suggest you scrap that pretty, but confusing image and re-state you problem at heart in the question: I want to find all substrings that are common to two longer strings that are made up of 0s and 1s only. The arrays size is about 15,000.) – M Oehm Sep 26 '17 at 05:46

1 Answers1

0

In your case, you can make use of the fact that your arrays can only contain two possible values and that the length of the substrings is short.

You are looking for substrings of length 4 that are made up of 0s and 1s only. There are 16 such substrings:

        0000    0001    0010    0011
        0100    0101    0110    0111
        1000    1001    1010    1011
        1100    1101    1110    1111

These are the binary numbers from 0 to 15. Create an array of length 16 that stores the position of the first occurrence of the corresponding pattern in the first array. Then create a sliding window that visits every 4-element substring and that determines the pattern id:

        0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0      pos[3] == 0
        <--- 3 -->

        0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0      pos[6] == 1
           <--- 6 -->

        0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0      pos[12] == 2
              <-- 12 -->

        0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0      pos[9] == 3
                 <--- 9 -->

Now do the same for the second array. If the position of the same pattern in the first array is set, you have a match.

The code below does that and shows the first match. (First in respect to the pattern position in the second array.) The pos array stores the end of the substring, though.

#include <stdlib.h>
#include <stdio.h>

enum {M = 12, N = 12, K = 4};

int find(const int *a, int m, const int *b, int n, int k, int *pa, int *pb)
{
    int k2 = 1 << k;
    int kmask = k2 - 1;

    int apos[k2];
    int bpos[k2];

    unsigned int apat = 0u;
    unsigned int bpat = 0u;

    int i;

    *pa = *pb = -1;

    if (m < k || n < k) return 0;

    for (i = 0; i < k2; i++) apos[i] = -1;
    for (i = 0; i < k2; i++) bpos[i] = -1;

    for (i = 0; i < k; i++) {
        apat = (apat << 1) | a[i];
        bpat = (bpat << 1) | b[i];
    }

    for (i = k; i < m; i++) {
        if (apos[apat] == -1) apos[apat] = i;
        apat = ((apat << 1) & kmask) | a[i];
    }

    for (i = k; i < n; i++) {
        if (apos[bpat] != -1) {
            *pa = apos[bpat] - k;
            *pb = i - k;

            return 1;
        }

        bpat = ((bpat << 1) & kmask) | b[i];
    }

    return 0;
}

int main(void)
{
    int a[M] = {0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0};
    int b[N] = {1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0};

    int ia, ib;

    if (find(a, M, b, N, K, &ia, &ib)) {
        printf("[a: %d, b: %d]\n", ia, ib);
    } else {
        puts("No match.");
    }

    return 0;
}

This base code can be modified according to what you are looking for:

  • If you just want a single match, you could traverse the two arrays up the the length of the shorter one simultanoeously, so that you don't have to traverse the whole array if you have an early match.

  • If you want to know which patterns are common, traverse the arrays and keep track of the respective positions, apos and bpos. If a position is present in both position arrays, the corresponding pattern is common to both arrays.

  • If you need to find all matches, you must store all positions corresponding to a pattern. For example, the pattern 1000 or 8 occurs at positions 0 and 4 in the second array. You must store both positions. Then all combinations of postions of a certain pattern are matches.

The code below prints all possible matches. It implements a kind of linked list where head keps the first occurrence of each pattern and next the next occurrence of the same pattern at a certain position.

#include <stdlib.h>
#include <stdio.h>

enum {M = 12, N = 12, K = 4};

void list(const int *a, int m, int k, int *head, int *next)
{
    int k2 = 1 << k;
    int i;

    unsigned int pat = 0u;

    for (i = 0; i < k2; i++) head[i] = -1;

    for (i = m - k + 1; i < m; i++) pat = (pat << 1) | a[i];
    pat <<= 1;

    i = m - k + 1;
    while (i--) {
        pat = (pat | (a[i] << k)) >> 1;

        next[i] = head[pat];
        head[pat] = i;        
    }
}

int main(void)
{
    int a[M] = {0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0};
    int b[N] = {1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0};

    int ahead[1 << K];
    int bhead[1 << K];    

    int anext[M - K + 1];    
    int bnext[N - K + 1];
    int i;

    list(a, M, K, ahead, anext);
    list(b, N, K, bhead, bnext);

    for (i = 0; i < (1 << K); i++) {
        int pa = ahead[i];

        while (pa != -1) {
            int pb = bhead[i];

            while (pb != -1) {
                printf("a: %d, b: %d\n", pa, pb);
                pb = bnext[pb];
            }

            pa = anext[pa];
        }
    }

    return 0;
}   
M Oehm
  • 28,726
  • 3
  • 31
  • 42
  • Is it possible to change the line "enum {M = 12, N = 12, K = 4};" I will read an array every time and the size isn't any time the same... So can i change it with some way like this: enum {M = length1, N = length2, K = 4}; Thanks – Nakos Kotsanis Sep 26 '17 at 16:59
  • The enum constants are just for defining the arrays in the example calling code in `main`. The actual functions don't use them. It should be easy to call the functions with arrays or variable length. – M Oehm Sep 27 '17 at 07:21