I have a file that contains a set of a few thousand unique words/terms. It looks like:
high school teacher
high school student
library
pencil stand
college professor
college graduate
I need a list of all repeated patterns. In this case, the result would be:
high
school
high school
college
Is there any way to achieve this in Unix/Vim?
Additional elaboration on the requirements:
Q. Do the repeats have to be on a single line, or can they be split over several lines?
- Ideally, each pattern should be on a new line.
Q. Are the patterns all word sequences (one or more words)?
- Yes, they are all word sequences.
Q. Does spacing matter within a line? Capitalization? Punctuation?
- Spaces and punctuation are all counted as part of the pattern. We can ignore capitalisation
i.e.
School
==School
!=school
this pat.tern
==this pat.tern
!=this pattern
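
To make the requirement above concrete, here is a minimal Python sketch of one possible reading of it (not a Unix/Vim solution, and not a definitive implementation). It assumes that a pattern is any run of one or more consecutive words within a line, that a pattern counts as repeated when it occurs at least twice in the whole input, and that comparison is case-sensitive by default, following the School/school example. The function name `repeated_patterns` and the `ignore_case` switch are hypothetical names introduced for illustration.

```python
import re
from collections import Counter


def repeated_patterns(lines, ignore_case=False):
    """Return word sequences that occur more than once, in first-seen order."""
    counts = Counter()
    order = []  # patterns in the order they are first seen
    for line in lines:
        # ignore_case is an assumed option; the default compares exactly.
        text = line.lower() if ignore_case else line
        # Character spans of each word; slicing the line between spans keeps
        # the original spacing and punctuation inside each pattern.
        spans = [m.span() for m in re.finditer(r"\S+", text)]
        for n in range(1, len(spans) + 1):       # pattern length in words
            for i in range(len(spans) - n + 1):  # index of the first word
                pattern = text[spans[i][0]:spans[i + n - 1][1]]
                if pattern not in counts:
                    order.append(pattern)
                counts[pattern] += 1
    return [p for p in order if counts[p] > 1]


sample = [
    "high school teacher",
    "high school student",
    "library",
    "pencil stand",
    "college professor",
    "college graduate",
]

# Prints one pattern per line: high, school, high school, college
print("\n".join(repeated_patterns(sample)))
```

Slicing the line between word boundaries, rather than splitting and re-joining the words, means internal spacing and punctuation remain part of each pattern, so `this pat.tern` and `this pattern` stay distinct. For a real file, the same function can be given the file's contents split into lines.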