I have a huge text file (4 GB) in which each "line" has the following format:
[number] [number]_[number] [Text]
For example:
123 12_14 Text 1
1234 13_456 Text 2
33 12_12 Text 3
24 678_10 Text 4
My goal is to save this data as an Excel file, where each "line" of the text file
becomes a row in the Excel file. For the example above:
[A1] 123
[B1] 12_14
[C1] Text 1
[A2] 1234
[B2] 13_456
[C2] Text 2
[A3] 33
[B3] 12_12
[C3] Text 3
[A4] 24
[B4] 678_10
[C4] Text 4
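The mapping above amounts to splitting each "line" into three fields. A minimal sketch, assuming the fields are separated by whitespace and that only the [Text] part may itself contain spaces (`parse_line` is a hypothetical helper name, not part of any library):

```python
def parse_line(line):
    """Split one "line" into (number, number_number, text).

    Assumption: the first two fields never contain whitespace, so
    splitting at most twice leaves the free text intact.
    """
    number, pair, text = line.rstrip("\n").split(None, 2)
    return number, pair, text


print(parse_line("123 12_14 Text 1\n"))  # ('123', '12_14', 'Text 1')
```

A "line" with fewer than three fields would raise a `ValueError` here, so malformed input would need its own handling.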
My plan is to iterate over the text "lines", as advised here, split each "line" into its fields,
and save them to the cells of an Excel file.
Because of the size of the text file, I thought of creating many small Excel files, which together will contain the same data as the text file.
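The splitting idea could look roughly like the sketch below. It streams the text file one "line" at a time, so the 4 GB never has to fit in memory, and starts a new output file every `rows_per_file` rows. It writes CSV files with the standard `csv` module, which Excel can open, rather than native Excel workbooks. The function name `split_to_csv`, the `out_pattern` parameter, the UTF-8 encoding, and the default of 1,000,000 rows (chosen to stay under the 1,048,576-row limit of an Excel worksheet) are all assumptions made for illustration.

```python
import csv


def split_to_csv(src_path, out_pattern, rows_per_file=1_000_000):
    """Stream src_path and write it out as several CSV files.

    out_pattern is a format string such as "part_{}.csv" (hypothetical).
    Returns the list of paths that were written.
    """
    written = []
    out = None
    writer = None
    try:
        # Assumption: the input is UTF-8 text with well-formed "lines".
        with open(src_path, encoding="utf-8") as src:
            for i, line in enumerate(src):
                if i % rows_per_file == 0:
                    # Close the previous part and start the next one.
                    if out is not None:
                        out.close()
                    path = out_pattern.format(i // rows_per_file)
                    out = open(path, "w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    written.append(path)
                # Split into [number, number_number, text]; text keeps its spaces.
                writer.writerow(line.rstrip("\n").split(None, 2))
    finally:
        if out is not None:
            out.close()
    return written
```

For example, `split_to_csv("posts.txt", "part_{}.csv")` would produce `part_0.csv`, `part_1.csv`, and so on, where `posts.txt` is a placeholder file name.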
Then I need to analyze the small Excel files, mainly to find terms that were mentioned in the [Text]
cells and count the number of appearances, related to the [number]
cells (representing a post and the ID of a post).
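The counting step might be sketched as follows. This is only an illustration under several assumptions: the counts are keyed on the first [number] column, terms are matched as case-insensitive substrings of the [Text] cell, and `count_terms` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def count_terms(rows, terms):
    """Count how often each term appears in the text, per first-column number.

    rows is an iterable of (number, number_number, text) tuples.
    Returns a dict mapping number -> Counter of term -> appearances.
    """
    counts = defaultdict(Counter)
    for number, _pair, text in rows:
        lowered = text.lower()
        for term in terms:
            # Assumption: plain substring matching, ignoring case.
            found = lowered.count(term.lower())
            if found:
                counts[number][term] += found
    return counts
```

Because the result holds `Counter` objects, the counts from several small files can be added together per number (for example with `Counter.update`) before the totals are written to a final file.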
Finally, I need to sum up all this data in a single Excel file.
I'm looking for the best way to create and analyze the Excel files.
As mentioned here, the main libraries are xlrd and csv.