So, I am using a regular expression to search a set of text files from a corpus for the titles of newspaper articles.
This is what I use:
cat *.txt | grep -P '(^[A-ZÖÄÜÕŠŽ].*[^\.]$)' --colour
It finds lines that begin with a capital letter (A-Z or one of Ö, Ä, Ü, Õ, Š, Ž), followed by any characters, and that do not end with a dot. For these particular files, that works.
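For reference, here is the same pattern wrapped in a small shell function. The name `find_titles` and the sample lines are made up. The capture group is dropped and the dot is left unescaped, because inside a bracket expression `.` already means a literal dot. This is a sketch that assumes a grep with PCRE support, which `-P` requires (GNU grep usually has it).

```shell
# Hypothetical helper: print lines that start with a capital letter
# (A-Z or one of Ö Ä Ü Õ Š Ž) and whose last character is not a dot.
# Reads the files given as arguments, or standard input if there are none.
find_titles() {
  grep -P '^[A-ZÖÄÜÕŠŽ].*[^.]$' "$@"
}

# Made-up sample lines: only the first one qualifies as a title.
printf '%s\n' \
  'Kosmosepall teeb maailmareisi' \
  'See lause lõpeb punktiga .' \
  'väike algustäht' |
  find_titles
```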
The problem is that two files interfere with each other: the last line of one file, which ends with a dot, gets joined to the first line of the next file, so I get output like this:
Kõik Kataria jüngrid kinnitavad , et nende elu on pärast naeruklubiga liitumist oluliselt paranenud .Kosmosepall teeb maailmareisi 39 kilomeetri kõrgusel.
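A minimal reproduction of what seems to be happening, assuming (this is not confirmed above) that the first file does not end with a newline character. `cat` copies the bytes of each file as they are and adds nothing between files, so an unterminated last line and the next file's first line end up as one line. The directory, file names and contents are made up.

```shell
# Throwaway directory for the made-up example files.
demo_dir=$(mktemp -d)

# a.txt deliberately has NO newline after its last line.
printf 'Esimene pealkiri\nSee lause lõpeb punktiga .' > "$demo_dir/a.txt"
# b.txt is an ordinary file whose last line ends with a newline.
printf 'Kosmosepall teeb maailmareisi\n' > "$demo_dir/b.txt"

# The second output line is "See lause lõpeb punktiga .Kosmosepall teeb maailmareisi".
cat "$demo_dir/a.txt" "$demo_dir/b.txt"
```

To check a real file, `tail -c 1 file.txt | od -c` prints its last byte; if that is not `\n`, the file lacks a final newline.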
Is there a way to prevent this interference without modifying the files, or to change the regular expression so that the dot carried over from the end of the previous file is excluded? I should mention that I am a beginner. I tried to find solutions, but none of them were specific to my case.
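Two possible workarounds that leave the files untouched, sketched under the same assumption (a missing final newline); the function names are hypothetical. The first lets grep open the files itself instead of reading them through `cat`: grep ends each file's last line at the end of that file, and `-h` suppresses the `filename:` prefix grep otherwise adds when given several files. The second keeps a pipeline but passes the files through `awk 1`, which prints every line followed by a newline, including a final line that lacked one.

```shell
# Same pattern as in the question, without the unneeded capture group.
title_pattern='^[A-ZÖÄÜÕŠŽ].*[^.]$'

# Option 1 (hypothetical name): grep reads each file separately,
# so lines are never glued together across files.
titles_grep() {
  grep -hP --colour "$title_pattern" "$@"
}

# Option 2 (hypothetical name): awk re-emits every line with its own
# trailing newline before the text reaches grep.
titles_awk() {
  awk 1 "$@" | grep -P --colour "$title_pattern"
}
```

Usage would be `titles_grep *.txt` or `titles_awk *.txt`. Under the stated assumption, the regular expression itself does not need to change; only the way the files are fed to grep does.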