-1

I have multiple large CSV files. How can I read part of each file and write 10% of the rows to another file?

user91
  • There are too many ways to approach this problem. The answers would just turn into a straw-poll for which one people liked. The best thing is to do some research on the topic yourself, find two or three, _analyze_ them, determine if they work for you or not, and _try them out_. Come to us when you have a specific question about something you have attempted to do: [mcve]. – Patrick Artner Apr 19 '19 at 14:37
  • See, for example, https://stackoverflow.com/questions/22258491/read-a-small-random-sample-from-a-big-csv-file-into-a-python-data-frame – Patrick Artner Apr 19 '19 at 14:39
  • Thank you for the advice! I'll follow this approach. – user91 Apr 19 '19 at 14:41
  • Do you care which 10%? Or will any random 10% do? – John Gordon Apr 19 '19 at 14:47
  • I would prefer either the first or the last 10% of the file, but I believe any random 10% will work, too. – user91 Apr 19 '19 at 14:51

1 Answer

1

This works for me:

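import csv

# newline="" lets the csv module handle line endings itself.
with open("in.csv", newline="") as infile, open("out.csv", "w", newline="") as outfile:
    outcsv = csv.writer(outfile)
    for i, row in enumerate(csv.reader(infile)):
        # Keep rows 0, 10, 20, ...: every tenth row, starting with the first line.
        if i % 10 == 0:
            outcsv.writerow(row)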
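
The loop above keeps every tenth row rather than a contiguous block. Since the asker mentioned preferring the first 10% of the file, here is a minimal sketch of that variant. It assumes each file has a header row, and the function name `write_first_fraction` and its parameters are made up for illustration. It reads the file twice (once to count rows, once to copy) so that the whole file never has to sit in memory.

```python
import csv
from itertools import islice


def write_first_fraction(in_path, out_path, fraction=0.1):
    """Copy the header plus the first `fraction` of data rows to out_path.

    Hypothetical helper; assumes the first line of the file is a header.
    Returns the number of data rows written.
    """
    # First pass: count the data rows (header excluded) without storing them.
    with open(in_path, newline="") as infile:
        reader = csv.reader(infile)
        next(reader, None)  # skip the header
        total = sum(1 for _ in reader)

    keep = int(total * fraction)

    # Second pass: copy the header and the first `keep` data rows.
    with open(in_path, newline="") as infile, \
            open(out_path, "w", newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)
        writer.writerows(islice(reader, keep))
    return keep
```

For several files, the same function could be called once per input path, with a distinct output path for each.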
brunns
  • Thanks! This works for the writing part. But I need to read the rest of the file in pandas. This is what I added: `else: pd.read_csv(infile)`. But this error arises: `ValueError: Mixing iteration and read methods would lose data`. How can I fix it? – user91 Apr 19 '19 at 16:44
  • That wasn't mentioned in the question. I suspect `csv.reader` and `pd.read_csv` are going to get in one another's way if you do that, since both would be reading from the same open file. – brunns Apr 19 '19 at 16:53
  • What does "i" refer to? – user91 Apr 19 '19 at 17:40
  • It stands for index: the row number that `enumerate` yields, starting at 0. – brunns Apr 19 '19 at 18:36
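
One way to avoid mixing `csv.reader` and `pd.read_csv` on the same file handle, as discussed in the comments above, is to let pandas read the file once by path and do the split on the resulting DataFrame. This is only a sketch under assumptions: the file has a header row and fits in memory, and the function name `split_every_tenth` is invented for illustration. It also picks every tenth data row, so the selected rows are not exactly the same ones the `csv` loop in the answer picks.

```python
import pandas as pd


def split_every_tenth(in_path, out_path):
    """Write every tenth data row to out_path and return the other rows.

    Hypothetical helper; assumes the CSV has a header row and fits in memory.
    """
    df = pd.read_csv(in_path)

    # Data rows at positions 0, 10, 20, ... form the roughly 10% sample.
    sample = df.iloc[::10]
    sample.to_csv(out_path, index=False)

    # Everything not in the sample stays available as a DataFrame.
    return df.drop(sample.index)
```

The returned DataFrame holds the remaining rows, so no second read of the file is needed.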