I have a large file (1.6 gigs) with millions of rows that has columns delimited with:
    [||]
I have tried to use the csv module, but it only accepts a single character as a delimiter.
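This is roughly what I tried (reconstructed from memory), and it raises a TypeError saying the delimiter must be a 1-character string:

    import csv

    with open('test.txt', 'r', encoding='UTF-16') as f:
        reader = csv.reader(f, delimiter='[||]')  # TypeError: "delimiter" must be a 1-character string

So here is what I have instead: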
    fileHandle = open('test.txt', 'r', encoding='UTF-16')
    thelist = []
    for line in fileHandle:
        # rstrip so the last column doesn't keep the trailing newline
        fields = line.rstrip('\n').split('[||]')
        therow = {
            'dea_reg_nbr': fields[0],
            'bus_actvty_cd': fields[1],
            'drug_schd': fields[3],
            # 50 more columns like this
        }
        thelist.append(therow)
    fileHandle.close()
    # now I have thelist, which is what I want
And boom, now I have a list of dictionaries and it works. I want a list because I care about order, and dictionaries because that is what the downstream code expects. This just feels like I should be taking advantage of something more efficient; I don't think it scales well to over a million rows and this much data. So, my question is as follows:
What would be a more efficient way of taking a multi-character-delimited text file (UTF-16 encoded) and creating a list of dictionaries?
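One idea I had was to stream the rows with a generator instead of building everything eagerly. This is just a sketch (iter_rows is a name I made up, COLUMNS only shows the first few of my real column names, and it ignores that my real mapping skips some field indexes), and I'm not sure it actually buys me anything:

    COLUMNS = ('dea_reg_nbr', 'bus_actvty_cd', 'drug_schd')  # ...plus the ~50 other names

    def iter_rows(path):
        # Yield one dict per line so the raw lines never pile up in memory.
        with open(path, 'r', encoding='UTF-16') as f:
            for line in f:
                yield dict(zip(COLUMNS, line.rstrip('\n').split('[||]')))

    thelist = list(iter_rows('test.txt'))  # or consume it lazily downstream

But I don't know whether that is meaningfully better or just the same work in different clothes.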
Any thoughts would be appreciated!