I am searching for a way to import a CSV file in Python, shuffle all of its rows randomly, and write a new CSV file containing the shuffled rows. I am not sure how to get started. Does anyone have an idea?
3 Answers
Read a CSV file: use the stdlib csv module.
Shuffle a list: use the stdlib random module.
Write a CSV file: use the stdlib csv module.
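A minimal sketch that combines the three steps above. It assumes the first row is a header that should stay on top and that the file is UTF-8. The function name shuffle_csv and its parameters are hypothetical, not part of the answer.

```python
import csv
import random


def shuffle_csv(src_path, dst_path, has_header=True, seed=None):
    """Copy src_path to dst_path with the data rows in random order.

    Hypothetical helper. Assumption: when has_header is True, the first
    row is a header; it stays on top and only the other rows are shuffled.
    """
    # newline="" lets the csv module handle line endings itself,
    # including newlines embedded inside quoted cells.
    with open(src_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))  # the whole file is loaded into memory

    header = rows[:1] if has_header else []
    body = rows[1:] if has_header else rows

    rng = random.Random(seed)  # optional seed for reproducible output
    rng.shuffle(body)          # shuffles the data rows in place

    with open(dst_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(header + body)
```

Usage (file names are placeholders): shuffle_csv("input.csv", "shuffled.csv"). Using csv.writer for the output also means the result keeps proper delimiters and quoting, instead of joining raw strings by hand.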
Note that some CSV formats (Excel's, among others) allow newlines within quoted "cells", so it's safer to use the csv module. If you're 101% confident you'll never have to deal with such a CSV format and need to make the code as fast as possible, you could read the file line by line directly, but it's not really safe.
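A small self-contained check of why the csv module is safer (the sample data is made up): a quoted cell containing a newline is split in two by plain line splitting, but csv.reader keeps it as one field.

```python
import csv
import io

# Made-up CSV: the data row "1" has a quoted cell containing a newline,
# as Excel can produce for multi-line cells.
raw = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

# Naive approach: one physical line per row. The quoted cell is cut in
# two, so a shuffle could separate its halves.
naive_rows = raw.splitlines()

# csv.reader understands quoting and keeps the cell in one record.
csv_rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_rows))  # 4 physical lines
print(len(csv_rows))    # 3 records: header + 2 data rows
print(csv_rows[1])      # ['1', 'first line\nsecond line']
```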
Also note that this reads the whole file into memory, so beware of huge CSV files.

- @BlueSheepToken cf. my edit, and my comment on your answer ;-) – bruno desthuilliers Sep 03 '19 at 08:30
- Yes, thank you :) As you were right that we don't need a duplicate answer, I deleted mine and upvoted yours! – BlueSheepToken Sep 03 '19 at 08:32
- Hey, thanks. I tried to code it myself like this: import csv import random with open('path1', 'r+') as r, open('path2', 'r+') as w: data = r.readlines() header, rows = data[0], data[1:] random.shuffle(rows) rows = '\n'.join([row.strip() for row in rows]) w.write(header + rows) It works, but somehow the format is not correct: all table columns end up in the same one. I am not sure whether that is what you meant by what you wrote about the cells, and I am not sure what to do. Thanks – Sendan21 Sep 03 '19 at 19:32
- @Sendan21 code in comments is unreadable. Either edit your question (keeping the existing text and just adding to it), or post a new one. And please read the help center page "How do I ask a good question?" – bruno desthuilliers Sep 04 '19 at 06:35
You can use pandas:
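import pandas as pd

CSV_PATH = "input.csv"         # placeholder: path of the file to shuffle
NEW_CSV_PATH = "shuffled.csv"  # placeholder: path of the output file

df = pd.read_csv(CSV_PATH)
x = df.sample(frac=1)  # frac=1: return every row, in random order
x.to_csv(NEW_CSV_PATH, index=False)  # don't write the index as a column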
Edit: index=False in the last line stops pandas from writing the DataFrame's index (the row labels it creates when it loads the CSV) as an extra, unnamed first column.
Regarding df.sample() (from the linked source):
The frac keyword argument specifies the fraction of rows to return in the random sample, so frac=1 means return all rows (in random order).

- Unless the OP has a real use for pandas, I wouldn't suggest adding such a huge dependency for something that can be done just as easily with the stdlib's modules alone. pandas is meant for computations on tabular data, so using it only to read a CSV is really overkill. – bruno desthuilliers Sep 03 '19 at 08:32
- True, it also depends on how much he will use this function. Does it need to go into a module that has to be efficient and fast, or does he "just" want to shuffle the CSV file? – Chris Post Sep 03 '19 at 08:37
- pandas uses the stdlib's csv reader, so performance-wise it's just some added overhead... – bruno desthuilliers Sep 03 '19 at 09:21
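You can read the CSV with the csv module into a list, shuffle it (in place, or into a new list), and write it back out as a new CSV file. Alternatively, if you know the number of lines in the CSV in advance, you can shuffle while reading by placing each row directly at a random position in a preallocated list.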
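The "shuffle directly into the list" variant can be sketched as follows, under the assumption that the row count n is known beforehand (for example from an earlier pass that counted the rows). The helper name scatter_shuffle is hypothetical. Drawing a random permutation of target slots and writing row i into slot i gives every ordering the same probability, just like random.shuffle.

```python
import csv
import io
import random


def scatter_shuffle(rows, n, rng=random):
    """Put each of the n rows yielded by `rows` at a random slot of a list.

    Hypothetical helper: assumes the caller already knows n, the exact
    number of rows (for example from an earlier line count).
    """
    slots = list(range(n))
    rng.shuffle(slots)        # random permutation of target positions
    result = [None] * n       # preallocated output list
    count = 0
    for row in rows:
        if count == n:
            raise ValueError("got more rows than the expected n")
        result[slots[count]] = row  # row number `count` lands in its slot
        count += 1
    if count != n:
        raise ValueError("got fewer rows than the expected n")
    return result


if __name__ == "__main__":
    # Made-up data: a header plus 3 rows, so n = 3 is known up front.
    data = "name,score\nann,3\nbob,5\ncid,4\n"
    reader = csv.reader(io.StringIO(data))
    header = next(reader)  # keep the header out of the shuffle
    print(header, scatter_shuffle(reader, 3))
```

For the first variant (shuffling into a new list), random.sample(rows, k=len(rows)) returns a shuffled copy and leaves the original list untouched, whereas random.shuffle(rows) reorders it in place.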
