I have a large text file (400 MB) containing data in a format like so:
805625228 linked to 670103907:0.981545
805829325 linked to 901909901:0.981545
803485795 linked to 1030404117:0.981545
805865780 linked to 811300706:0.981545
ID linked to ID:Probability_of_link
...
The text file contains millions of such entries, and I have several such text files. As part of the analysis, I parse this data multiple times (each of the text files is in a different format). When parsing and working with the data in Python, my memory usage shoots up to 3 GB at times.
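For context, my current parsing looks roughly like this (the file name and the dictionary layout are simplified placeholders; the real code handles the other formats too). Everything ends up in one big in-memory dict, which is presumably where the 3 GB comes from:

    # Rough sketch of my current approach (simplified).
    links = {}

    with open("links_file_1.txt") as f:  # placeholder file name
        for line in f:
            left, _, rest = line.partition(" linked to ")
            right, _, prob = rest.partition(":")
            # key: source ID, value: list of (target ID, probability) pairs
            links.setdefault(int(left), []).append((int(right), float(prob)))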
What would be a better approach than dumping this data to text files? Could I store it in a JSON or SQL database instead, and how much of a performance boost would that give me? What kind of database would be best suited to this data?
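If SQL is the right direction, something like the sketch below (the table and column names are just placeholders I made up) is roughly what I have in mind, using SQLite so I don't have to run a separate server:

    import sqlite3

    conn = sqlite3.connect("links.db")  # placeholder database file
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "  source INTEGER, target INTEGER, probability REAL)"
    )

    def parse(path):
        """Yield (source, target, probability) tuples from one text file."""
        with open(path) as f:
            for line in f:
                left, _, rest = line.partition(" linked to ")
                right, _, prob = rest.partition(":")
                yield int(left), int(right), float(prob)

    # executemany accepts a generator, so the file is streamed rather than
    # loaded into memory all at once.
    conn.executemany("INSERT INTO links VALUES (?, ?, ?)", parse("links_file_1.txt"))
    conn.commit()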
FYI, all the data shown above was produced from structured .csv files containing millions of rows.