I have multiple large CSV files. How can I read part of each file and write 10% of the data/rows to another file?
-
There are too many ways to approach this problem. The answers would just turn into a straw-poll for which one people liked. The best thing is to do some research on the topic yourself, find two or three, _analyze_ them, determine if they work for you or not, and _try them out_. Come to us when you have a specific question about something you have attempted to do: [mcve]. – Patrick Artner Apr 19 '19 at 14:37
-
See e.g. https://stackoverflow.com/questions/22258491/read-a-small-random-sample-from-a-big-csv-file-into-a-python-data-frame – Patrick Artner Apr 19 '19 at 14:39
-
Thank you for the advice! I'll follow this approach. – user91 Apr 19 '19 at 14:41
-
Do you care which 10%? Or any random 10% will do? – John Gordon Apr 19 '19 at 14:47
-
I'd prefer either the first or the last 10% of the file. But I believe any random 10% will work, too. – user91 Apr 19 '19 at 14:51
1 Answer
1
This works for me:
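    import csv

    # newline="" lets the csv module handle line endings itself
    with open("in.csv", newline="") as infile, open("out.csv", "w", newline="") as outfile:
        outcsv = csv.writer(outfile)
        for i, row in enumerate(csv.reader(infile)):
            # keep every 10th row (rows 0, 10, 20, ...), i.e. about 10% of the file
            if i % 10 == 0:
                outcsv.writerow(row)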
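
The asker mentioned in the comments that the first 10% of each file would be preferable. A minimal sketch of that variant follows, using only the standard library. The function name `write_first_percent` and its parameters are hypothetical, and it assumes the first row of each file is a header that should be copied to the output.

```python
import csv


def write_first_percent(in_path, out_path, percent=10):
    """Copy the header plus the first `percent` percent of data rows."""
    # First pass: count the data rows without loading the file into memory.
    with open(in_path, newline="") as infile:
        reader = csv.reader(infile)
        header = next(reader, None)  # assumption: the first row is a header
        total = sum(1 for _ in reader)

    # Integer arithmetic avoids floating-point rounding surprises.
    keep = total * percent // 100

    # Second pass: stream only the rows that should be kept.
    with open(in_path, newline="") as infile, \
         open(out_path, "w", newline="") as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        next(reader, None)  # skip the header in the input
        if header is not None:
            writer.writerow(header)
        for i, row in enumerate(reader):
            if i >= keep:
                break
            writer.writerow(row)
    return keep
```

Reading each file twice keeps memory use flat at the cost of extra I/O. For several files, the function could be called once per input path in a loop.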

brunns
-
Thanks! This works for the writing part. But I need to read the rest of the file in pandas. This is what I added: `else: pd.read_csv(infile)`. But this error arises: `ValueError: Mixing iteration and read methods would lose data`. How can I fix it? – user91 Apr 19 '19 at 16:44
-
That wasn't mentioned in the question. I suspect the `csv.reader` and `pd.read_csv` are going to get in one another's way if you do that. – brunns Apr 19 '19 at 16:53
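
One way to avoid two readers sharing the same file handle is to split the input into two files in a single pass, so that each consumer later opens its own file. The sketch below assumes that approach is acceptable and that the first row is a header. The function name `split_every_nth` and the output paths are hypothetical.

```python
import csv


def split_every_nth(in_path, sample_path, rest_path, n=10):
    """Write every n-th data row to sample_path and all others to rest_path."""
    sampled = remaining = 0
    with open(in_path, newline="") as infile, \
         open(sample_path, "w", newline="") as sample_file, \
         open(rest_path, "w", newline="") as rest_file:
        reader = csv.reader(infile)
        sample_writer = csv.writer(sample_file)
        rest_writer = csv.writer(rest_file)

        header = next(reader, None)  # assumption: the first row is a header
        if header is not None:
            # Both outputs get the header so each is a valid CSV on its own.
            sample_writer.writerow(header)
            rest_writer.writerow(header)

        for i, row in enumerate(reader):
            if i % n == 0:
                sample_writer.writerow(row)
                sampled += 1
            else:
                rest_writer.writerow(row)
                remaining += 1
    return sampled, remaining
```

The remainder file can then be loaded separately by path, for example with `pd.read_csv("rest.csv")`, so pandas never touches a handle that `csv.reader` has already been iterating over.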