
I came across this topic: "Save results to csv file with Python".

All I needed was to write CSV changes to a file. BUT this code dropped some of my rows, an unreasonable number of them (instead of five, as I understood the code).

Could you please explain why they use collections for such a simple task? And why is a counter used here?

The answer there says "Use csv.writer:" and gives this code:

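import csv
import collections

# Python 3: open CSV files in text mode with newline='' ('rb' only works in Python 2)
with open('thefile.csv', newline='') as f:
    data = list(csv.reader(f))

# Count how many times each first-column value occurs
counter = collections.defaultdict(int)
for row in data:
    counter[row[0]] += 1

# Write only the rows whose first-column value occurs at least 4 times;
# all other rows are skipped. The 'with' block closes (and flushes) the file.
with open("/path/to/my/csv/file", 'w', newline='') as out:
    writer = csv.writer(out)
    for row in data:
        if counter[row[0]] >= 4:
            writer.writerow(row)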
Bogdan Mind

2 Answers


I'm not sure what the original code is for, but I think it reads the input and builds a dictionary that counts how often each value in the first column occurs. It then uses that dictionary to decide which rows to write: only rows whose first-column value occurs at least 4 times are kept, and all the others are dropped. This is the line that does it:

if counter[row[0]] >= 4:
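A quick way to check which rows survive that test is to run the same counting and filtering on a few rows in memory. The sample data below is made up for illustration; the counting and the `>= 4` condition mirror the original code:

```python
import collections

# Hypothetical sample rows: "a" appears 4 times, "b" twice, "c" once.
data = [
    ["a", "1"], ["b", "2"], ["a", "3"], ["c", "4"],
    ["a", "5"], ["b", "6"], ["a", "7"],
]

# Count how many rows share each first-column value.
counter = collections.defaultdict(int)
for row in data:
    counter[row[0]] += 1

# Same condition as the original code: keep a row only if its
# first-column value occurs at least 4 times.
kept = [row for row in data if counter[row[0]] >= 4]
dropped = [row for row in data if counter[row[0]] < 4]

print(kept)     # [['a', '1'], ['a', '3'], ['a', '5'], ['a', '7']]
print(dropped)  # [['b', '2'], ['c', '4'], ['b', '6']]
```

Every row whose first value is rarer than 4 occurrences disappears, regardless of its position in the file.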

The main thing is that it's an answer from 11 years ago. Back then, libraries like pandas weren't as commonly used for handling CSV files in Python. Nowadays it's easier to do something like this:

import pandas as pd

# Read semicolon-separated data with comma decimals, skipping the first 4 rows
df_in = pd.read_csv("oldfile.csv", sep=";", decimal=",", skiprows=4)

# Write comma-separated data with point decimals; index=False avoids an extra index column
df_in.to_csv("newfile.csv", sep=",", decimal=".", index=False)
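The same kind of conversion is also possible with only the standard csv module. Below is a rough sketch, assuming the same input layout as the pandas snippet (semicolons, comma decimals, 4 leading rows to skip). `convert_rows` is a hypothetical helper name, and the decimal handling only rewrites fields that look like plain numbers such as `3,14`, which is a simplification of what pandas does:

```python
import csv
import io
import itertools
import re

# Matches a plain decimal-comma number such as "3,14" or "-0,5"
# (assumption: the input has no thousands separators).
DECIMAL_COMMA = re.compile(r"-?\d+,\d+")

def convert_rows(src, dst, skip=4):
    """Hypothetical helper: read ';'-separated rows from src, drop the
    first `skip` rows, and write ','-separated rows with '.' decimals."""
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst)  # default dialect: comma-separated
    for row in itertools.islice(reader, skip, None):
        writer.writerow(
            [f.replace(",", ".") if DECIMAL_COMMA.fullmatch(f) else f for f in row]
        )

# Usage with in-memory files; with real files, open both with newline="".
src = io.StringIO("h1\nh2\nh3\nh4\nname;value\nx;3,14\ny;-0,5\n")
dst = io.StringIO()
convert_rows(src, dst)
print(dst.getvalue())
```

It is more code than the pandas version, but it needs no third-party dependency and keeps every step explicit.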
BdR
  • I'm just at the beginning of my Python journey. Should I learn pandas right away or stick to the csv module for now? I just finished my code using csv. – Bogdan Mind Aug 31 '21 at 10:55
  • @BogdanMind Are you new to programming in general, or just to Python? If you are new to programming in general, I'd recommend sticking to the simple stuff for the time being. Pandas is very powerful, but also quite big and complicated, and sometimes requires thinking in a different way from "straight" Python programming. But I'm sure you'll find differing opinions on this. :) – Ture Pålsson Aug 31 '21 at 11:06
  • Thank you so much for your kind and supportive answer! Yes, I am new to programming in general. – Bogdan Mind Aug 31 '21 at 11:24

This line

counter = collections.defaultdict(int)

creates a defaultdict where the values are integers, and the default value is zero. Then this bit

for row in data:
    counter[row[0]] += 1

scans the input and counts how many times each value in the first field occurs in the file. Finally, the code writes out only those rows whose first-field value occurs at least 4 times. This could all be made a bit shorter by using collections.Counter instead, but I don't know whether that was in the standard library ten years ago.
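For reference, `collections.Counter` was added in Python 2.7 and 3.1, so it is available in any current Python. Here is a minimal sketch of the same filter written with it; the helper name `filter_frequent` and the in-memory sample data are made up for illustration:

```python
import collections
import csv
import io

def filter_frequent(rows, min_count=4):
    """Hypothetical helper: keep rows whose first field occurs at least
    `min_count` times (like the original, assumes no empty rows)."""
    rows = list(rows)
    # Counter tallies the first field of every row in a single pass.
    counts = collections.Counter(row[0] for row in rows)
    return [row for row in rows if counts[row[0]] >= min_count]

# Usage with made-up CSV text instead of a real file.
src = io.StringIO("a,1\nb,2\na,3\na,4\na,5\nb,6\n")
kept = filter_frequent(csv.reader(src))
print(kept)
```

Counter replaces the explicit `defaultdict(int)` loop; the filtering condition stays exactly the same.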

Ture Pålsson
  • I'd point out that libraries - both for reading and writing CSV files - handle the various CSV dialects nicely. This is why I use them in any code I write. (Both [filterCSV](https://github.com/MartinPacker/filterCSV) and [mdpre](https://github.com/MartinPacker/mdpre) repos of mine do precisely this.) Yes, of course you could manually write CSV but why bother? (For reading you have less control over the format so I wouldn't even attempt it.) – Martin Packer Aug 31 '21 at 10:49
  • Thank you so much for the clarification, but why would you possibly need this “at least 4 times” condition? – Bogdan Mind Aug 31 '21 at 10:53
  • That's just what the program does. Why its author wanted it to do that is impossible to know. It is worth noting that this is, as far as I can tell, totally irrelevant to the actual question in the post you linked to. The asker of that question did not do a very good job of coming up with a *minimal* example. – Ture Pålsson Aug 31 '21 at 10:59
  • Got it. Thank you very much for the explanation! It was such a great experience asking a question here that I already asked another one. Glad I signed up. – Bogdan Mind Aug 31 '21 at 13:03
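To illustrate Martin Packer's point about CSV dialects, here is a small sketch comparing a naive comma join with `csv.writer` on a row whose fields contain a comma and double quotes (the row itself is made up):

```python
import csv
import io

row = ["Smith, John", 'He said "hi"', "42"]

# Naive approach: the embedded comma turns 3 fields into 4.
naive = ",".join(row)

# csv.writer's default dialect quotes fields containing the delimiter
# or the quote character, and doubles any embedded quotes.
buf = io.StringIO()
csv.writer(buf).writerow(row)
quoted = buf.getvalue()

# Reading the csv.writer output back recovers the original fields.
roundtrip = next(csv.reader(io.StringIO(quoted)))

print(naive)
print(quoted)
print(roundtrip == row)
```

The naive string cannot be split back into the original three fields, while the quoted output round-trips cleanly through `csv.reader`.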