I'm trying to create a script that looks through a list of strings files and reports on the sub-strings that are most common between them.
For example:
- Hello, I am string one. I like apples and oranges. We are all strings here.
- Hello, I am string two. I like apples and oranges. We are all strings here.
- Hello, I am string three. I like apples and oranges. We are all strings here.
- Hello, I am string four. I like apples and oranges. I like to express my individuality.
I'd like the script to tell me what are the common elements between the strings, above a certain threshold (for example, 5 characters).
Ideally I'd be told
- "I like apples and oranges" occurs in all files
- "Hello, I am string" occurs in all files
- "We are all strings here" occurs in three of the files.
If functions exist to do this in technologies I'm familiar with - SQL, Javascript, PHP, Ruby or Bash -I'll be extremely happy...
Many thanks,
Jack