I want to split a (in real life: huge) file into multiple files based on, say, the value in the second column. I.e. in the example below I need the files `431.csv` and `rr1.csv`.
My main idea was to open a new connection for writing whenever one is not already open (a record of open connections is kept in the dict `files_dict`), and then to iterate through the dict and close them all at the end.
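The dict-of-open-connections idea can be sketched with `contextlib.ExitStack`, which closes every handle registered with it when the `with` block exits, even if an error is raised partway through the file. This is a minimal sketch that assumes plain comma-separated lines with no quoted commas; the function name `split_by_column` and its `col`, `out_dir` and `suffix` parameters are hypothetical.

```python
import contextlib
import os
import tempfile


def split_by_column(path, col=1, out_dir='.', suffix='.csv'):
    """Copy each line of `path` into <out_dir>/<value of column `col`><suffix>.

    Returns the output paths in the order their keys were first seen.
    """
    handles = {}  # column value -> open file object
    with contextlib.ExitStack() as stack, open(path) as fi:
        for line in fi:
            key = line.rstrip('\n').split(',')[col]
            if key not in handles:
                # Open lazily the first time a key appears; the ExitStack
                # closes every handle it was given when the block exits.
                out_path = os.path.join(out_dir, key + suffix)
                handles[key] = stack.enter_context(open(out_path, 'w'))
            handles[key].write(line)
    return [os.path.join(out_dir, key + suffix) for key in handles]


# Usage on the toy data from the question, in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'data_in')
    with open(src, 'w') as f:
        f.write('123,431,t\n43,rr1,3\n13,rr1,43\n123,rr1,4\n')
    outputs = split_by_column(src, out_dir=tmp)
    contents = {}
    for p in outputs:
        with open(p) as f:
            contents[os.path.basename(p)] = f.read()

print(contents)
# {'431.csv': '123,431,t\n', 'rr1.csv': '43,rr1,3\n13,rr1,43\n123,rr1,4\n'}
```

Compared with closing the files in a loop at the end, the `ExitStack` variant still cleans up if, for example, a malformed line raises an `IndexError` halfway through a huge file.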
I am stuck on how to refer to these connections for each line.
In real life, the number and values of these file names (the second column) are not known beforehand.
I found some inspiration here:

- write multiple files at a time
- python inserting variable string as file name
- How can I split a text file into multiple text files using python?
Content of the toy data in `data_in`:

```
123,431,t
43,rr1,3
13,rr1,43
123,rr1,4
```
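Since `data_in` is comma-separated, a plain `line.split(',')[1]` picks the wrong field as soon as an earlier field is quoted and contains a comma (for `"1,23",431,t` it returns `23"`). A sketch using the standard `csv` module avoids that; the function name `split_csv_by_column`, its parameters, and the quoted first field in the demo data are made up for illustration.

```python
import csv
import os
import tempfile


def split_csv_by_column(path, col=1, out_dir='.'):
    """Write each row of the CSV file `path` to <out_dir>/<row[col]>.csv.

    The csv module parses quoted fields that contain commas, which a plain
    line.split(',') would cut in the wrong place. Returns the sorted keys.
    """
    files = {}    # key -> open output file
    writers = {}  # key -> csv.writer wrapping that file
    try:
        with open(path, newline='') as fi:
            for row in csv.reader(fi):
                if not row:
                    continue  # skip blank lines
                key = row[col]
                if key not in writers:
                    out_path = os.path.join(out_dir, key + '.csv')
                    files[key] = open(out_path, 'w', newline='')
                    writers[key] = csv.writer(files[key], lineterminator='\n')
                writers[key].writerow(row)
    finally:
        for fo in files.values():
            fo.close()
    return sorted(files)


# The toy data, except that the first field of the first row is a quoted
# value containing a comma (made up to show the difference).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'data_in')
    with open(src, 'w', newline='') as f:
        f.write('"1,23",431,t\n43,rr1,3\n13,rr1,43\n123,rr1,4\n')
    keys = split_csv_by_column(src, out_dir=tmp)
    contents = {}
    for k in keys:
        with open(os.path.join(tmp, k + '.csv'), newline='') as f:
            contents[k] = f.read()

print(keys)  # ['431', 'rr1']
```

`lineterminator='\n'` is set because `csv.writer` otherwise ends rows with `\r\n`; drop it if that is the line ending you want.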
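My naive attempt so far:

```python
data_in = 'data_in'  # path to the input file (name assumed from the example)

files_dict = dict()  # second-column value -> open file object
with open(data_in) as fi:
    for line in fi:
        x = line.split(',')[1]
        if x not in files_dict:
            # First time this value appears: open x.csv and keep the
            # file object itself (not just its name) in the dict.
            files_dict[x] = open(x + '.csv', 'w')
        # Look up the already-open file for this value and write to it.
        files_dict[x].write(line)

# Close every file that was opened.
for fo in files_dict.values():
    fo.close()
```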
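Because the number of distinct values is not known beforehand, a huge file could need more simultaneously open files than the operating system allows per process (on Linux/macOS, `ulimit -n` shows the limit). A sketch under that assumption keeps at most `max_open` handles in an `OrderedDict` used as an LRU cache, closes the least recently used one when the cap is reached, and reopens a file in append mode if its key shows up again. The function name `split_by_column_capped`, its parameters, and the demo data are hypothetical.

```python
import collections
import os
import tempfile


def split_by_column_capped(path, col=1, out_dir='.', suffix='.csv', max_open=2):
    """Split `path` by column `col`, keeping at most `max_open` files open.

    When the cap is reached, the least recently used handle is closed; if its
    key shows up again, the file is reopened in append mode.
    Returns the sorted list of keys seen.
    """
    open_files = collections.OrderedDict()  # key -> handle, least recent first
    seen = set()  # keys whose output file has already been created
    try:
        with open(path) as fi:
            for line in fi:
                key = line.rstrip('\n').split(',')[col]
                fo = open_files.get(key)
                if fo is None:
                    if len(open_files) >= max_open:
                        # Evict the least recently used handle.
                        _, oldest = open_files.popitem(last=False)
                        oldest.close()
                    # 'w' truncates any old file on first sight; 'a' keeps
                    # the lines written before the handle was evicted.
                    mode = 'a' if key in seen else 'w'
                    fo = open(os.path.join(out_dir, key + suffix), mode)
                    open_files[key] = fo
                    seen.add(key)
                else:
                    open_files.move_to_end(key)  # mark as most recently used
                fo.write(line)
    finally:
        for fo in open_files.values():
            fo.close()
    return sorted(seen)


# Made-up data where keys alternate, run with a cap of one open file
# so that handles are forced to be closed and reopened.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'data_in')
    with open(src, 'w') as f:
        f.write('1,a,x\n2,b,x\n3,a,y\n4,c,z\n5,b,w\n')
    keys = split_by_column_capped(src, out_dir=tmp, max_open=1)
    contents = {}
    for k in keys:
        with open(os.path.join(tmp, k + '.csv')) as f:
            contents[k] = f.read()

print(keys)      # ['a', 'b', 'c']
print(contents)  # {'a': '1,a,x\n3,a,y\n', 'b': '2,b,x\n5,b,w\n', 'c': '4,c,z\n'}
```

The trade-off is extra open/close calls when keys are scattered through the file; if the rows happen to be grouped by key, at most one reopen per key is needed.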