
I have a directory that contains thousands of .txt files, plus subfolders with even more files. I need to run a C++ parsing program on every text file in the main folder and its subfolders.

So how should I proceed?

EDIT: The linked question covers the directory traversal part, but my main issue is how to "pass each text file" in the directory to my program.

2 Answers


If you can't use the boost-type solutions in that linked question, there's sample old-fashioned code for recursing through a directory structure here.
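The linked sample is not reproduced here. As a rough sketch of what hand-written recursion can look like, assuming a POSIX system (an assumption, since the platform is not stated), something along these lines should work. The function name collect_txt is made up for illustration, and lstat() is used so that symbolic links to directories are not followed.

```cpp
#include <dirent.h>     // opendir, readdir, closedir (POSIX)
#include <sys/stat.h>   // lstat, S_ISDIR, S_ISREG (POSIX)
#include <string>
#include <vector>

// Hypothetical helper: appends every regular file under `dir` whose name
// ends in ".txt" to `out`. Returns false if `dir` itself cannot be opened.
bool collect_txt(const std::string& dir, std::vector<std::string>& out) {
    DIR* d = opendir(dir.c_str());
    if (!d) return false;
    while (struct dirent* e = readdir(d)) {
        const std::string name = e->d_name;
        if (name == "." || name == "..") continue;      // skip self and parent
        const std::string path = dir + "/" + name;
        struct stat st;
        if (lstat(path.c_str(), &st) != 0) continue;    // entry vanished or is unreadable
        if (S_ISDIR(st.st_mode)) {
            collect_txt(path, out);                     // recurse; unreadable subdirs are skipped
        } else if (S_ISREG(st.st_mode) && name.size() >= 4 &&
                   name.compare(name.size() - 4, 4, ".txt") == 0) {
            out.push_back(path);                        // a parser could process the file here instead
        }
    }
    closedir(d);
    return true;
}
```

Note that this keeps one directory handle open per nesting level, which is usually fine but could matter for extremely deep trees.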

Bob Moore

I am assuming you are on Linux or some other POSIX system.

You could use the find(1) command (as commented by Niels Keurentjes) to start a separate process for every file. Since a new process is started for each file, there is no significant limit on the number of files. Of course, starting a million processes takes some time, even for very short-lived processes: you might spend several milliseconds (or a few dozen) of startup time per process, on top of the processing time itself.
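On the program side, find hands each path over as a command-line argument. Here is a minimal sketch of how the parser might receive it; parse_file and parse_all are hypothetical names, and counting lines merely stands in for the real parsing work.

```cpp
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical stand-in for the real parsing logic: counts the lines in one
// file. Returns false if the file cannot be opened.
bool parse_file(const std::string& path, std::size_t& line_count) {
    std::ifstream in(path);
    if (!in) return false;
    line_count = 0;
    std::string line;
    while (std::getline(in, line)) ++line_count;
    return true;
}

// Handles every path given on the command line (argv[1] .. argv[argc-1]).
// Returns the number of files that could not be processed.
int parse_all(int argc, char* argv[]) {
    int failures = 0;
    for (int i = 1; i < argc; ++i) {
        std::size_t lines = 0;
        if (parse_file(argv[i], lines)) {
            std::cout << argv[i] << ": " << lines << " lines\n";
        } else {
            std::cerr << "cannot open " << argv[i] << '\n';
            ++failures;
        }
    }
    return failures;
}

// The program's main() would simply forward its arguments:
//   int main(int argc, char* argv[]) { return parse_all(argc, argv) == 0 ? 0 : 1; }
```

With -exec ./myParser {} \; the loop runs once per process. Because it accepts any number of paths, the same program should also work with the standard -exec ./myParser {} + form, which passes many paths per invocation and so starts far fewer processes.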

If processing each file is quick, you might want to avoid the overhead of starting a process per file. In that case, do the recursive file tree scan with the nftw(3) library function and give it your handling function, which can be very quick (a few microseconds per call) if you are careful and the processing is simple. AFAIK, it can handle very large file trees.
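A minimal sketch of the nftw(3) approach could look like the following. The names WalkStats, visit and walk_tree are invented for illustration, counting lines stands in for the real parsing, and the limit of 16 open descriptors is an arbitrary choice.

```cpp
#include <ftw.h>        // nftw, FTW_F, FTW_PHYS (POSIX)
#include <sys/stat.h>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>

// Totals gathered during the walk (hypothetical, for illustration only).
struct WalkStats { std::size_t files = 0; std::size_t lines = 0; };

// nftw() has no user-data parameter, so the callback updates a file-scope
// variable. This is a simplification and is not thread-safe.
static WalkStats g_stats;

static bool has_txt_suffix(const char* path) {
    const std::size_t len = std::strlen(path);
    return len >= 4 && std::strcmp(path + len - 4, ".txt") == 0;
}

// Called by nftw() once per entry in the tree.
static int visit(const char* path, const struct stat*, int type, struct FTW*) {
    if (type != FTW_F || !has_txt_suffix(path)) return 0;   // only regular .txt files
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) ++g_stats.lines;         // stand-in for real parsing
    ++g_stats.files;
    return 0;                                               // nonzero would stop the walk
}

// Walks `root` recursively; returns false if nftw() reports an error.
bool walk_tree(const char* root, WalkStats& out) {
    g_stats = WalkStats{};
    const int max_open_fds = 16;   // arbitrary cap on directories held open at once
    // FTW_PHYS: do not follow symbolic links.
    if (nftw(root, visit, max_open_fds, FTW_PHYS) != 0) return false;
    out = g_stats;
    return true;
}
```

Everything happens inside one process here, so the per-file cost is just the callback plus the file I/O.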

Alternatively, use find to output a file list, and have your program read that list and process each file path in turn. Or embed an interpreter (such as Guile or Lua) in your program, write a script to scan the directory, and have it call a function in your program for every file.

BTW, handling a large file tree of several million files should not be a problem; it should finish in a reasonable time (a few minutes or hours), and the bottleneck is likely to be disk I/O.
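For the file-list variant, the program only needs to read one path per line. The sketch below assumes the list arrives on a stream such as standard input (for example, find output piped into the program); process_file and process_list are hypothetical names, and opening the file stands in for the real work.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <string>

// Hypothetical per-file work: here it only checks that the file opens.
bool process_file(const std::string& path) {
    std::ifstream in(path);
    return static_cast<bool>(in);
}

// Reads one path per line from `list` and processes each in turn.
// Returns the number of paths processed successfully.
std::size_t process_list(std::istream& list) {
    std::size_t ok = 0;
    std::string path;
    while (std::getline(list, path)) {
        if (path.empty()) continue;        // ignore blank lines
        if (process_file(path)) ++ok;
    }
    return ok;
}

// main() would pass std::cin (from <iostream>) so the list can be piped in:
//   int main() { process_list(std::cin); return 0; }
```

One caveat of a newline-separated list is that file names which themselves contain newlines would be split incorrectly.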

PS. See also the Answers table of this page for Approximate timing for various operations on some PC

Basile Starynkevitch
  • Will the "find" method work for 550k text files? – Newbie_Programmer Dec 17 '15 at 15:24
  • Of course yes, even a billion text files, if you have enough time – Basile Starynkevitch Dec 18 '15 at 14:31
  • Thanks, it worked for me. For everyone who needs to run a program on multiple files: use the "line method" as seen in the comments, with a slight modification, i.e. find -name "*.txt" -exec ./myParser {} \; (notice the ./ part before myParser). You'll also need to use command-line arguments to receive the file that is passed to your parser/program. – Newbie_Programmer Dec 20 '15 at 16:36