
I have strings that follow recurring patterns, like these:

/socket.io/1/xhr-polling/993bcoZK7UkiqsNYpbja?t=1407502307019
/socket.io/1/xhr-polling/993bcoZK7UkiqsNYpbja?t=2222222222222
/socket.io/1/xhr-polling/993bco56465456465465a?t=333333333333

And also things like these:

/api/bucket/53e4ce6584df65130e7ead66/data/metadata
/api/bucket/465456456456465456445456/data/metadata
/api/bucket/898989898989898989898989/data/metadata

And many more like these.

What would be the best way to find the patterns in these strings and aggregate them?

For example, producing some sort of JSON like this:

{
   "pattern": "/api/bucket/*/data/metadata",
   "routes": ["/api/bucket/53e4ce6584df65130e7ead66/data/metadata",
              "/api/bucket/465456456456465456445456/data/metadata",
              "/api/bucket/898989898989898989898989/data/metadata"]
}
nhahtdh
Unitech
  • Better now? I want to derive some sort of regex from strings that match a common pattern. – Unitech Aug 08 '14 at 20:44
  • I found something like this for Node.js: regex-trie, which can generate a regex from a set of strings. Now I am trying to find an easy way to measure how closely the strings match each other, so I can then pass the similar ones to regex-trie. – Unitech Aug 09 '14 at 14:58

1 Answer


OK, I found the solution. The Aho-Corasick matching algorithm referenced in the answer is not suited to this problem; the Levenshtein distance algorithm is. I found this module for Node.js that does it: https://github.com/jefarmstrong/sortzzy
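
To illustrate the idea, here is a minimal sketch in plain JavaScript that does not use sortzzy at all. It groups routes whose normalized Levenshtein similarity to the first member of a group reaches a threshold, then replaces the path segments that differ with "*". The function names (levenshtein, similarity, derivePattern, aggregate), the 0.5 threshold, and the choice to drop the query string from the pattern are all assumptions made for this example, not part of any library.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 means identical, 0 means nothing in common.
function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Hypothetical helper: keep the path segments shared by every route and
// replace the others with "*". The query string is dropped (assumption),
// and segments beyond the length of the first route are ignored.
function derivePattern(routes) {
  const split = routes.map((r) => r.split("?")[0].split("/"));
  return split[0]
    .map((seg, i) => (split.every((s) => s[i] === seg) ? seg : "*"))
    .join("/");
}

// Greedy grouping: a route joins the first group whose first member is
// similar enough, otherwise it starts a new group. The 0.5 threshold is
// an assumption that happens to separate the sample routes.
function aggregate(routes, threshold = 0.5) {
  const groups = [];
  for (const route of routes) {
    const group = groups.find((g) => similarity(g[0], route) >= threshold);
    if (group) group.push(route);
    else groups.push([route]);
  }
  return groups.map((g) => ({ pattern: derivePattern(g), routes: g }));
}

const result = aggregate([
  "/socket.io/1/xhr-polling/993bcoZK7UkiqsNYpbja?t=1407502307019",
  "/socket.io/1/xhr-polling/993bcoZK7UkiqsNYpbja?t=2222222222222",
  "/socket.io/1/xhr-polling/993bco56465456465465a?t=333333333333",
  "/api/bucket/53e4ce6584df65130e7ead66/data/metadata",
  "/api/bucket/465456456456465456445456/data/metadata",
  "/api/bucket/898989898989898989898989/data/metadata",
]);
console.log(JSON.stringify(result, null, 2));
// Patterns found: "/socket.io/1/xhr-polling/*" and "/api/bucket/*/data/metadata"
```

The greedy grouping depends on input order and on the threshold, so for real traffic the threshold would probably need tuning, and comparing segment by segment instead of character by character may give more stable groups.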

Unitech