I am starting to learn Perl because it is very useful for my research, but I cannot figure out how to extract a table from a text file.
I have a folder of text files named sequentially, like this:
1.txt
2.txt
3.txt
...
...
1000.txt
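Processing the files in numeric order takes a little care, because a plain string sort puts 10.txt before 2.txt. Here is a minimal sketch. It assumes all files sit directly in one folder and follow the N.txt pattern shown above. The helper name `numbered_txt_files` is hypothetical:

```perl
use strict;
use warnings;

# Return the names of the N.txt files in $dir, sorted by their numeric part,
# so that 2.txt comes before 10.txt (a plain string sort would not do this).
sub numbered_txt_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    # Keep only names made of digits followed by ".txt" (skips "." and "..").
    my @files = grep { /^\d+\.txt$/ } readdir($dh);
    closedir($dh);
    # Compare the leading numbers; the grep above guarantees they exist.
    return sort { ($a =~ /^(\d+)/)[0] <=> ($b =~ /^(\d+)/)[0] } @files;
}
```

A driver loop could then look like `for my $file (numbered_txt_files($folder)) { ... }`, where `$folder` is the path the files live in.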
An example of these files in .txt format can be found at the following link: http://www.sec.gov/Archives/edgar/data/1750/000104746909008102/0001047469-09-008102.txt
The .htm version of the same file can be found at the following link: http://www.sec.gov/Archives/edgar/data/1750/000104746909008102/a2194264zdef14a.htm
The table I am looking for is sometimes titled:
Non-Qualified Deferred Compensation Table
and in other files it has a slightly different title, such as:
Non Qualified Deferred Compensation Table
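Because the title varies, one case-insensitive regular expression is simpler than listing every spelling. The sketch below makes two assumptions. First, the variations are limited to the separator between "Non" and "Qualified" (hyphen, space, or none). Second, letter case may differ. The pattern and helper names are hypothetical. The same phrase may also appear in running text, so `title_line_indexes` returns every matching line rather than only the first. The caller can then pick the occurrence that is actually followed by the table's header row.

```perl
use strict;
use warnings;

# Hypothetical pattern for the title: "Non-Qualified", "Non Qualified" or
# "Nonqualified" followed by "Deferred Compensation", in any letter case.
# \s+ also tolerates extra spaces or a line break between the words.
my $TITLE_RE = qr/non[-\s]?qualified\s+deferred\s+compensation/i;

# 1 if the given text contains the table title, 0 otherwise.
sub is_table_title {
    my ($text) = @_;
    return $text =~ $TITLE_RE ? 1 : 0;
}

# Indexes of every line that mentions the title. The phrase may also occur in
# running text, so the caller still has to pick the right occurrence.
sub title_line_indexes {
    my @lines = @_;
    return grep { $lines[$_] =~ $TITLE_RE } 0 .. $#lines;
}
```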
The table's column headers contain these words (the exact wording can vary slightly from file to file):
- "Contributions"
- "Aggregate Earnings"
- "Aggregate Withdrawal/Distributions"
along with some other headers. These words appear in almost every deferred compensation table in my .txt files (for an example, open the .htm or .txt link above and search for "Non-Qualified Deferred Compensation Table"). Below the headers are dollar amounts for each manager, and the number of rows varies from file to file.
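One way to read the table from a plain-text file is to find a header row containing all three key phrases and then parse the lines below it. Here is a rough sketch. It assumes that columns are separated by two or more spaces and that each data row starts with the manager's name. It also assumes amounts look like "$12,345", "-" (none) or "(1,000)" (negative). Rows whose names wrap onto a second line would need extra handling. Headers split across several lines can be joined into one string before calling `looks_like_header`. The helper names are hypothetical. The real table may also have more amount columns than the three listed above, so `parse_row` returns all of them. If the .txt file turns out to contain the filing's HTML markup rather than plain-text columns, an HTML table parser such as the CPAN module HTML::TableExtract is likely a better fit than line-by-line regexes.

```perl
use strict;
use warnings;

# 1 if the text mentions all three key header phrases (case-insensitive).
# \s+ and \s* also match newlines, so several header lines can be joined
# into one string before calling this.
sub looks_like_header {
    my ($text) = @_;
    return ($text =~ /contributions/i
        && $text =~ /aggregate\s+earnings/i
        && $text =~ /aggregate\s+withdrawals?\s*\/\s*distributions?/i) ? 1 : 0;
}

# Split one data row into (name, amount, amount, ...), or return an empty
# list if the line does not look like a data row. Assumes columns are
# separated by two or more spaces and the first column is the name.
sub parse_row {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;                     # trim both ends
    my ($name, @cells) = split /\s{2,}/, $line;
    return unless defined $name && @cells;       # no amount columns
    my @amounts;
    for my $cell (@cells) {
        $cell =~ s/[\$,]//g;                     # drop "$" and thousands separators
        $cell =~ s/^\s+|\s+$//g;
        if ($cell eq '' || $cell =~ /^-+$/) {    # "-" or an empty cell: no amount
            push @amounts, 0;
            next;
        }
        # Accept "1234", "1234.56" or accounting-style "(1234)"; anything else
        # (e.g. a footnote or a word) means this is not a data row.
        return unless $cell =~ /^(?:\d+(?:\.\d+)?|\(\d+(?:\.\d+)?\))$/;
        $cell = "-$1" if $cell =~ /^\((.*)\)$/;  # "(1234)" is negative
        push @amounts, $cell;
    }
    return ($name, @amounts);
}
```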
Is there a way to write a Perl script that extracts the deferred compensation table from each file and produces a single .csv file containing all of the tables (headers and the numbers below them), with each row tagged with the name of the .txt file it came from?
Something like this in the output file:
File,Manager Name,Contributions,Aggregate Earnings,Aggregate Withdrawal/Distributions
1.txt,Manager1,00000,00000,00000
1.txt,Manager2,00000,00000,00000
1.txt,Manager3,00000,00000,00000
2.txt,Manager1,00000,00000,00000
2.txt,Manager2,00000,00000,00000
2.txt,Manager3,00000,00000,00000
3.txt,Manager1,00000,00000,00000
3.txt,Manager2,00000,00000,00000
3.txt,Manager3,00000,00000,00000
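Writing the output only requires prepending the file name to each parsed row and quoting any field that contains a comma. Without quoting, a manager name such as "Smith, John" would shift the columns. Below is a minimal sketch using only core Perl, with hypothetical helper names. The CPAN module Text::CSV handles CSV quoting more completely and is worth considering for real use.

```perl
use strict;
use warnings;

# Header row matching the layout in the question.
my @HEADER = ('File', 'Manager Name', 'Contributions',
              'Aggregate Earnings', 'Aggregate Withdrawal/Distributions');

# Quote one field the way CSV expects: wrap it in double quotes if it
# contains a comma, a double quote or a line break, and double any quotes.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    if ($value =~ /[",\r\n]/) {
        $value =~ s/"/""/g;
        $value = qq{"$value"};
    }
    return $value;
}

# Join the fields of one row into a CSV line ending in a newline.
sub csv_line {
    return join(',', map { csv_field($_) } @_) . "\n";
}

# Write the header plus all collected rows to $path. Each row is an array
# ref of [file, manager, contributions, earnings, withdrawals].
sub write_csv {
    my ($path, $rows) = @_;
    open(my $out, '>', $path) or die "Cannot write '$path': $!";
    print {$out} csv_line(@HEADER);
    print {$out} csv_line(@$_) for @$rows;
    close($out) or die "Cannot close '$path': $!";
}
```

For example, `write_csv('deferred_compensation.csv', \@rows)` with rows like `['1.txt', 'Manager1', 0, 0, 0]`; the output file name is only an illustration.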
I would be very grateful for any help. I am new to Perl and trying to learn it, but this task honestly seems very hard for me.