I have a project where I need to read data from a relatively large .txt file that contains 5 columns and about 25 million rows of comma-separated data, process the data, and then write the processed data to a new .txt file. My computer freezes when I try to process a file this large.
I've already written the function to process the data and it works on small input .txt files, so I just need to adjust it to work with the larger file.
Here's an abridged version of my code:
import csv
import sys

def process_data(input_file, output_file):
    prod_dict = {}
    with open(input_file, "r") as file:
        # some code that reads all data from input file into dictionary
    # some code that sorts dictionary into an array with desired row order
    # list comprehension code that puts array into desired output form
    with open(output_file, 'w') as myfile:
        wr = csv.writer(myfile)
        for i in final_array:
            wr.writerow(i)

def main():
    input_file = sys.argv[1]
    output_file = sys.argv[2]
    process_data(input_file, output_file)

if __name__ == '__main__':
    main()
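For context, here's roughly how I imagine a row-by-row version might look. This is just a sketch: `transform_row` is a hypothetical stand-in for my actual per-row logic, and I know the sorting step won't fit this pattern directly since it seems to need all of the rows at once.

    import csv
    import sys

    def transform_row(row):
        # hypothetical stand-in for the real per-row processing logic
        return row

    def process_data_streaming(input_file, output_file):
        # Read and write one row at a time so the whole 25M-row file
        # never has to be held in memory at once.
        with open(input_file, "r", newline="") as infile, \
             open(output_file, "w", newline="") as outfile:
            reader = csv.reader(infile)
            writer = csv.writer(outfile)
            for row in reader:
                writer.writerow(transform_row(row))

    if __name__ == '__main__':
        process_data_streaming(sys.argv[1], sys.argv[2])

The sort step is the part I'm unsure how to adapt, since it appears to require the full dataset rather than one row at a time.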