I am stuck on how to approach this problem: there is a folder with 6000 text files, and I need to find phrases that repeat across these files and include them in a report. This goes beyond a regular
grep -Hl <phrase> Folder/*.txt
The problem is that I don't know the phrase to capture in advance. The tool is supposed to scan all the documents, take 5-word segments, and look through the rest of the documents to find matches.
If there is a way to achieve this using Python, I am all ears. I have thought about NLTK or machine learning, but I would need more details about them.
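
To make the idea concrete, here is a rough sketch of the matching I have in mind, using only the standard library. The function names (`five_word_segments`, `repeated_phrases`) and the `min_files` threshold are made up for illustration, and it assumes that words are split on non-word characters, that case is ignored, and that all segments fit in memory, which may not hold for 6000 large files.

```python
import re
from collections import defaultdict
from pathlib import Path


def five_word_segments(text, n=5):
    """Yield every run of n consecutive words, lowercased (hypothetical helper)."""
    words = re.findall(r"\w+", text.lower())
    for i in range(len(words) - n + 1):
        yield " ".join(words[i:i + n])


def repeated_phrases(folder, n=5, min_files=2):
    """Return {phrase: sorted file names} for phrases found in >= min_files files."""
    seen = defaultdict(set)  # phrase -> names of the files that contain it
    for path in sorted(Path(folder).glob("*.txt")):
        # Assumption: files are UTF-8; undecodable bytes are skipped.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for phrase in five_word_segments(text, n):
            seen[phrase].add(path.name)
    return {p: sorted(files) for p, files in seen.items() if len(files) >= min_files}
```

Calling `repeated_phrases("Folder")` would return a dictionary that could be written out as the report, one phrase per line together with the files it appears in. Segments do not cross file boundaries, and overlapping segments from a longer shared passage are reported separately.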