Shuffle the rows of a csv file with python

Question

I am looking for a way to read a CSV file in Python, shuffle all of its rows randomly, and write the result to a new CSV file. I am not sure how to get started. Does anyone have an idea?

bruno desthuilliers · Answer 1 · 2019-09-03T08:29:48.193

1

Read a csv file: use the stdlib csv module.

Shuffle a list: use the stdlib random module.

Write a csv file: use the stdlib csv module.

Note that some CSV formats (Excel's among others) allow newlines within "cells", so it's safer to use the csv module than to split the file on line breaks. If you're 101% confident you'll never have to deal with such a CSV format and need to speed up the code as much as possible, you could just read the file line by line, but it's not really safe.

Also note that this will read the whole file into memory, so beware of huge CSV files.
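The three steps above can be sketched roughly as follows. This is only a minimal illustration, not the answerer's own code: the function name `shuffle_csv` and the `has_header` and `seed` parameters are made up for the example, and it assumes the file fits in memory.

```python
import csv
import random


def shuffle_csv(src_path, dst_path, has_header=True, seed=None):
    """Write a copy of src_path to dst_path with its data rows in random order.

    Hypothetical helper for illustration; the names are not from the thread.
    """
    rng = random.Random(seed)  # a seed is only useful for reproducible runs

    # newline="" lets the csv module handle line endings itself, which
    # matters for quoted cells that contain newlines.
    with open(src_path, newline="") as src:
        rows = list(csv.reader(src))  # reads the whole file into memory

    # Keep the header (if any) in place and shuffle only the data rows.
    header = rows[:1] if has_header else []
    body = rows[1:] if has_header else rows
    rng.shuffle(body)  # in-place shuffle

    with open(dst_path, "w", newline="") as dst:
        csv.writer(dst).writerows(header + body)
```

A call would look like `shuffle_csv("input.csv", "shuffled.csv")`, where both file names are placeholders. Because the rows go through `csv.reader` and `csv.writer`, a quoted cell containing a newline stays a single cell.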

edited Sep 03 '19 at 08:29

answered Sep 03 '19 at 08:24

bruno desthuilliers


@BlueSheepToken cf my edit - and my comment on your answer ;-) – bruno desthuilliers Sep 03 '19 at 08:30
Yes, thank you :). You were right that we do not need a duplicate answer, so I deleted mine and upvoted yours! – BlueSheepToken Sep 03 '19 at 08:32
Hey, thanks. I tried to code it myself like this: import csv import random with open('path1', 'r+') as r, open('path2', 'r+') as w: data = r.readlines() header, rows = data[0], data[1:] random.shuffle(rows) rows = '\n'.join([row.strip() for row in rows]) w.write(header + rows) It works, but somehow the format is not correct: all table columns end up in a single column. I am not sure whether that is what you meant in your note about the cells, and I am not sure what to do. Thanks – Sendan21 Sep 03 '19 at 19:32
@Sendan21 code in comments is unreadable - either edit your question (keeping the existing one - just add to it), or post a new one. And please read this : [ask] – bruno desthuilliers Sep 04 '19 at 06:35

score 0 · Answer 2 · answered Sep 03 '19 at 08:20

0

You can use pandas:

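import pandas as pd

# CSV_PATH and NEW_CSV_PATH are placeholders for your input and output file paths
df = pd.read_csv(CSV_PATH)
# frac=1 samples every row, i.e. returns all rows in random order
x = df.sample(frac=1)
# index=False keeps the DataFrame index out of the output file
x.to_csv(NEW_CSV_PATH, index=False)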

Edit: index=False in the last line avoids writing the DataFrame's index as an extra column. pandas creates a default integer index when you load a CSV, and to_csv writes it out unless told otherwise.

Regarding df.sample():

The frac keyword argument specifies the fraction of rows to return in the random sample, so frac=1 means return all rows (in random order).

answered Sep 03 '19 at 08:20

Chris Post


Unless the OP has a real use for pandas, I wouldn't suggest taking on such a huge dependency for something that can be done just as easily with stdlib modules only. pandas is meant for computations on tabular data, so using it only to read a CSV is really overkill. – bruno desthuilliers Sep 03 '19 at 08:32
True, it also depends how much he will use this function. Is it something that needs to go into a module that needs to be efficient and fast, or does he "just" want to shuffle the csv file ? – Chris Post Sep 03 '19 at 08:37
Panda uses the stdlib's csv reader so wrt/ perfs, it's just some added overhead... – bruno desthuilliers Sep 03 '19 at 09:21

score 0 · Answer 3 · answered Sep 03 '19 at 08:23

0

You can read the CSV into a list with the csv module, shuffle that list (either into a new list or in place), and write the result out as a new CSV file. If you know the number of lines in the CSV beforehand, you can also place the rows directly at random positions in a pre-sized list.
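As a small sketch of the two ways to shuffle once the rows are in a list (the sample rows here are made-up stand-ins for what `csv.reader` would return):

```python
import random

# Stand-in for rows read from a CSV file with csv.reader.
rows = [["1", "a"], ["2", "b"], ["3", "c"]]

# Option 1: build a new shuffled list and leave the original untouched.
shuffled = random.sample(rows, k=len(rows))

# Option 2: shuffle the existing list in place (random.shuffle returns None).
random.shuffle(rows)
```

Either list can then be passed to `csv.writer(...).writerows(...)` to produce the new file.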

answered Sep 03 '19 at 08:23

AndrewQ

