I've a file with 7946479 records, i want to read the file line by line and insert into the database(sqlite). My first approach was open the file read the records line by line and insert into the database at the same time, since it dealing with huge amount of data it taking very long time.I want to change this naive approach so when i searched on internet i saw this [python-csv-to-sqlite][1] in this they have the data in a csv file but the file i have is dat format but i like the answer to that problem so now i am trying to do it like in the solution.
https://stackoverflow.com/questions/5942402/python-csv-to-sqlite
The approach they using is like first they splitting the whole file into chunks then doing the database transaction instead of writing each record one at a time.
So i started writing a code for splitting my file into chunks Here is my code,
file = r'files/jan.dat'
test_file = r'random_test.txt'
def chunks(file_obj, size=10000):
counter = 0
file_chunks = []
temp_chunks = []
for line in file_obj:
if line == '\n':
continue
if counter != size:
temp_chunks.append(line)
counter += 1
else:
file_chunks.append(temp_chunks)
temp_chunks = []
counter = 0
file_obj.close()
if len(temp_chunks) != 0:
file_chunks.append(temp_chunks)
yield file_chunks
if __name__ == '__main__':
split_files = chunks(open(test_file))
for chunk in split_files:
print(len(chunk))
the output is 795, but what i wanted is to split the whole file into chunks of size 10000
i can't figure out what is going wrong here, i can't share my whole file here so for testing can use this code to generate a file with 7946479 lines
TEXT = 'Hello world'
FILE_LENGTH = 7946479
counter = 0
with open(r'random_test.txt', 'w') as f:
for _ in range(FILE_LENGTH):
f.write(f"{TEXT}\n")
this is how my original file looks like (the file format is dat)
lat lon day mon t2m rh2m sf ws
5 60 1 1 299.215 94.737 209.706 5.213
5 60.25 1 1 299.25 94.728 208.868 5.137
5 60.5 1 1 299.295 94.695 207.53 5.032
5 60.75 1 1 299.353 94.623 206.18 4.945
5 61 1 1 299.417 94.522 204.907 4.833
5 61.25 1 1 299.447 94.503 204.219 4.757
5 61.5 1 1 299.448 94.525 203.933 4.68
5 61.75 1 1 299.443 94.569 204.487 4.584
5 62 1 1 299.44 94.617 204.067 4.464