Given the following csv file :
01;blue;brown;black
02;glass;rock;paper
03;pigeon;squirel;shark
My goal is to replace the (unique) line whose first field is '02'.
I wrote this piece of code:
    import csv

    # 'r' is enough here: we only read the input file.
    with open("csv", 'r', newline='', encoding='utf-8') as csvfile, open('csvout', 'w', newline='', encoding='utf-8') as out:
        reader = csv.reader(csvfile, delimiter=';')
        writer = csv.writer(out, delimiter=';')
        for row in reader:
            if row[0] != '02':
                writer.writerow(row)
            else:
                writer.writerow(['02', 'A', 'B', 'C'])
But rewriting the whole CSV into another file doesn't seem to be the most efficient way to proceed, especially for large files:
- Once the match is found, we still continue reading to the end.
- We have to rewrite every line one by one.
- Writing a second file isn't very practical, nor is it storage-efficient.
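One common way to address the "second file" objection (a sketch, not necessarily what you need; the filename and helper name are made up for illustration) is to stream into a temporary file in the same directory and then atomically swap it over the original with os.replace, so no duplicate file is left behind:

```python
import csv
import os
import tempfile

def replace_row(src, key, new_row):
    # Stream rows from src into a temp file in the same directory,
    # substituting the matching row, then atomically replace src.
    dir_name = os.path.dirname(os.path.abspath(src))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, 'w', newline='', encoding='utf-8') as out, \
            open(src, newline='', encoding='utf-8') as f:
        writer = csv.writer(out, delimiter=';')
        for row in csv.reader(f, delimiter=';'):
            writer.writerow(new_row if row[0] == key else row)
    os.replace(tmp_path, src)  # atomic rename on POSIX and Windows
```

This still reads and rewrites every line, but the temp file only exists transiently and the original is never in a half-written state.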
I wrote a second piece of code which seems to address these problems:
    with open("csv", 'r+', newline='', encoding='utf-8') as csvfile:
        content = csvfile.readlines()
        for index, row in enumerate(content):
            fields = row.split(';')
            if fields[0] == '02':  # match on the first field, per the goal
                tochange = index
                break
        # Assumes the matching line exists; replaces the pop() + insert() pair.
        content[tochange] = '02;A;B;C\n'
        csvfile.seek(0)
        csvfile.truncate(0)  # erase old content
        csvfile.write("".join(content))
Do you agree that the second solution is more efficient? Do you have any improvements, or a better way to proceed?
EDIT: The number of characters in the line can vary.
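For completeness, a true in-place overwrite (seek to the record and write, touching nothing else) only works when every record occupies the same number of bytes, which is where the padding mentioned below comes in. A minimal sketch, assuming ASCII data and a record width you choose up front (RECORD_WIDTH and the helper names are illustrative, not a standard API):

```python
RECORD_WIDTH = 32  # fixed record size in bytes, newline included

def write_padded(path, rows):
    # Pad each line with spaces so record N starts at byte N * RECORD_WIDTH.
    # A reader must rstrip() the last field to drop the padding.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        for row in rows:
            f.write(';'.join(row).ljust(RECORD_WIDTH - 1) + '\n')

def overwrite_record(path, index, row):
    # Binary mode so seek() is a plain byte offset; assumes ASCII fields.
    with open(path, 'r+b') as f:
        f.seek(index * RECORD_WIDTH)
        f.write((';'.join(row).ljust(RECORD_WIDTH - 1) + '\n').encode('utf-8'))
```

The trade-off is wasted space per record and a hard cap on line length, which is why variable-length lines rule this out without padding.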
EDIT 2: I'm apparently obliged to read and rewrite everything if I don't want to use padding. A possible solution would be a database-like approach; I will consider it for the future.
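The database-like route mentioned above can be sketched with the standard-library sqlite3 module (the file name, table name, and column names here are made up for illustration): updating one record then touches only that row, not the whole file.

```python
import sqlite3

# Illustrative schema: one table row per CSV line, keyed by the first column.
con = sqlite3.connect("records.db")
con.execute("CREATE TABLE IF NOT EXISTS records (key TEXT PRIMARY KEY, a TEXT, b TEXT, c TEXT)")
con.executemany(
    "INSERT OR REPLACE INTO records VALUES (?, ?, ?, ?)",
    [("01", "blue", "brown", "black"),
     ("02", "glass", "rock", "paper"),
     ("03", "pigeon", "squirel", "shark")],
)
con.commit()

# Replace the single matching record in place.
con.execute("UPDATE records SET a = ?, b = ?, c = ? WHERE key = ?", ("A", "B", "C", "02"))
con.commit()
con.close()
```

SQLite handles the seeking, page rewriting, and crash safety internally, which is essentially what the padding approach reinvents by hand.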
If I had to choose between those 2 solutions, which one would be the best performance-wise?