I have two files, each containing data captured from a different stream. I want to search the files for patterns so that, when I collect more data from the streams later, I can tell which data belongs to which stream (based on the patterns found earlier).
An example of the data in a file: b0 82 91 a2 c3 89 b0 82 4a e3....(more bytes)... Although I have shown only a few bytes here, the pattern "b0 82" occurs twice. The output should show each pattern and the number of times it occurs. Similarly, there can be patterns of 3 bytes or more.
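To make this first kind of pattern concrete, here is a minimal sketch in Python (chosen only for illustration) that counts every overlapping run of n bytes and reports those seen more than once. The names `count_ngrams` and `repeated_patterns` and the 2 to 4 byte length range are my own assumptions, and `bytes.hex(" ")` needs Python 3.8 or later.

```python
from collections import Counter

def count_ngrams(data: bytes, n: int) -> Counter:
    """Count every run of n consecutive bytes, using overlapping windows."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def repeated_patterns(data: bytes, min_len: int = 2, max_len: int = 4):
    """Return (pattern_as_hex, count) for each pattern seen more than once."""
    found = []
    for n in range(min_len, max_len + 1):
        for pattern, count in count_ngrams(data, n).items():
            if count > 1:
                found.append((pattern.hex(" "), count))
    return found

# The ten bytes from the example above (the elided bytes are left out).
sample = bytes.fromhex("b0 82 91 a2 c3 89 b0 82 4a e3")
print(repeated_patterns(sample))  # [('b0 82', 2)]
```

On a real capture the same function would be given the whole file contents, for example the result of `open(path, "rb").read()`.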
Another example: aa 00 a7 2f 7b 4c ....(more bytes).....aa 01 a7.........(more bytes)......aa 05 a7..... I think this can also be considered a 3-byte pattern, where two bytes (aa and a7) are fixed and the middle one varies from 00 to 05.
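This second kind of pattern, two fixed bytes around a varying one, could be checked with a sketch like the following. It assumes the fixed bytes are already known; `middle_bytes` is a hypothetical helper name, and the filler bytes between the three occurrences in `sample` are made up for the demonstration because the real ones are elided above.

```python
from collections import Counter

def middle_bytes(data: bytes, first: int, last: int) -> Counter:
    """Count the values of the middle byte in every 3-byte run `first ?? last`."""
    return Counter(
        data[i + 1]
        for i in range(len(data) - 2)
        if data[i] == first and data[i + 2] == last
    )

# Illustrative data only: "2f 7b 4c" is from the example, "10 20" is invented filler.
sample = bytes.fromhex("aa 00 a7 2f 7b 4c aa 01 a7 10 20 aa 05 a7")
seen = middle_bytes(sample, 0xAA, 0xA7)
print(sorted(seen))  # [0, 1, 5]
```

If the fixed bytes are not known in advance, the same loop could be run over every (first, last) pair, keeping the pairs that occur often with only a few distinct middle values.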
These are the two examples I could think of, though there may be more kinds of patterns, including hidden ones that are not obvious at first glance. The whole idea is that any pattern will do as long as it helps to distinguish between the two streams at a later stage. I hope the problem is clearer now. Please let me know the following:
How can we do this type of pattern finding?
Are there any tools or libraries that can help with this?
Which language or tool should I use for fast and efficient development?
Can the field of data mining help with this? If yes, how should I go about it?