I'd like to compute word-frequency statistics for all the .txt
files in the current directory and its subdirectories.
In [39]: ls
about.txt distutils/ installing/ whatsnew/
bugs.txt extending/ library/ word.txt
c-api/ faq/ license.txt words_frequency.txt
contents.txt glossary.txt reference/
copyright.txt howto/ tutorial/
distributing/ install/ using
I first tried this command:
In [46]: !grep -Eoh '[a-zA-Z]+' *.txt | nl
The problem is that files in the subdirectories are not found:
In [45]: !echo *.txt
about.txt bugs.txt contents.txt copyright.txt glossary.txt license.txt word.txt words_frequency.txt
I then tried:
In [48]: ! echo */*.txt | grep "about.txt"
In [49]:
Again there is a problem: this pattern misses the files in the top-level directory (about.txt is not found), and it cannot traverse directories of arbitrary depth.
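To make the pattern semantics concrete, here is a minimal sketch using Python's standard glob module on a small, hypothetical directory tree created in a temporary directory (the layout and the helper names `make_tree` and `match` are illustrative only). `*.txt` matches only the top level, `*/*.txt` matches exactly one level down, and only `**/*.txt` with `recursive=True` reaches every depth, including the top level. The shell's `*.txt` and `*/*.txt` patterns behave the same way as the first two cases.

```python
import glob
import os
import tempfile

# Hypothetical layout: one file at each depth (0, 1 and 2 levels down).
LAYOUT = ["about.txt", "library/intro.txt", "library/sub/deep.txt"]


def make_tree(root):
    """Create the hypothetical LAYOUT under root, each file holding a little text."""
    for rel in LAYOUT:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("hello world\n")


def match(root, pattern, recursive=False):
    """Return paths under root matching pattern, relative to root and '/'-separated."""
    # glob.escape protects any wildcard characters that happen to be in root itself
    hits = glob.glob(os.path.join(glob.escape(root), pattern), recursive=recursive)
    return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in hits)


with tempfile.TemporaryDirectory() as root:
    make_tree(root)
    print(match(root, "*.txt"))                     # ['about.txt']
    print(match(root, "*/*.txt"))                   # ['library/intro.txt']
    print(match(root, "**/*.txt", recursive=True))  # all three files
```

Without `recursive=True`, `**` behaves like a plain `*`, so `**/*.txt` then matches the same files as `*/*.txt`.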
Interestingly, Python has a solution to this problem:
In [50]: files = glob.glob("**/*.txt", recursive=True)
In [54]: files.index('about.txt')
Out[54]: 4
It traverses directories recursively and finds all the .txt
files.
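For the original goal (word statistics), a minimal sketch that combines the recursive glob with `collections.Counter` might look like the following. It assumes UTF-8 text and reuses the `[a-zA-Z]+` pattern from the grep command above, so counting is case-sensitive; `word_frequency` is a hypothetical helper name, not an existing API.

```python
import collections
import glob
import os
import re

# Same word pattern as the grep -Eo call above (ASCII letters only, case-sensitive).
WORD_RE = re.compile(r"[a-zA-Z]+")


def word_frequency(pattern="**/*.txt"):
    """Count words in every file matching pattern, searching subdirectories recursively."""
    counts = collections.Counter()
    for path in glob.glob(pattern, recursive=True):
        if not os.path.isfile(path):  # skip any directory whose name ends in .txt
            continue
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                counts.update(WORD_RE.findall(line))
    return counts
```

Usage: `for word, n in word_frequency().most_common(20): print(n, word)`. Note that words_frequency.txt in the listing above would itself match `**/*.txt` and be counted, so you may want to skip it.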
However, Python is cumbersome for moving files around and transforming text compared with a shell one-liner like grep "pattern" *.txt.
How can I make a shell wildcard match recursively, the way `**` does in Python's glob?
As an alternative, the find
command helps:
find . -name '*.txt' -exec grep -Eoh '[a-zA-Z]+' {} + | nl
This works, but it is not as handy as a recursive wildcard would be.
The globstar
shell option could not be enabled on macOS:
$ shopt -s globstar
-bash: shopt: globstar: invalid shell option name
$ bash --version
GNU bash, version 4.4.19(1)-release (x86_64-apple-darwin17.3.0)