2

I have a csv file with 2 columns:

1 A
2 B
3 C
4 D

My aim is to use Python to open the file, read it, shuffle each of the two columns independently (so that, for example, 1 ends up on the same line as C, 2 with D, etc.), and then save the newly randomized columns to a different csv file.

I read some more stuff about writer, but am unsure how to use these functions yet.

The only problem is that I need to keep the column headers intact; they can't be randomized. The code I have so far is as follows:

import csv
import random

with open ("my_file") as f:
    l = list(csv.reader(f))

random.shuffle(l)

with open("random.csv", "W") as f:
    csv.writer(f).writerows(f)
mkrieger1
HBS
  • You've already broken the problem down nicely into individual steps. Try translating each step to code, and if you get stuck on a particular step, show us the code you've got and explain exactly where you're stuck. – Marius Mar 26 '15 at 00:41
  • Take a look at these two links: https://docs.python.org/2/library/functions.html#open , http://stackoverflow.com/questions/976882/shuffling-a-list-of-objects-in-python – dgsleeps Mar 26 '15 at 00:45
  • Well, I found some code that someone else used and tried to adjust it to my needs, but it didn't go very well. Now I'm trying to do it on my own, and all I've come up with so far is this (will edit my question above) – HBS Mar 26 '15 at 00:53

4 Answers

3

You can read the rows as a list, extract the two columns, shuffle each one, zip the columns back together, and finally write the result to a new csv file:

import csv
import random

with open("input.csv") as f:
    r = csv.reader(f)
    header, l = next(r), list(r)

a = [x[0] for x in l]
random.shuffle(a)

b = [x[1] for x in l]
random.shuffle(b)

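# Python 3: open in text mode with newline="" so no blank lines appear between rows,
# and wrap zip() in list() because it returns an iterator
with open("random.csv", "w", newline="") as f:
    csv.writer(f).writerows([header] + list(zip(a, b)))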
JuniorCompressor
  • I don't think this does what OP is asking, they want to pair up the elements randomly, not just shuffle the order of the lines. – Marius Mar 26 '15 at 01:02
  • just out of curiosity, what's the *practical* difference in python between shuffling and randomizing? – HBS Mar 26 '15 at 01:18
  • hi @JuniorCompressor - thanks for the post. This code returns: "IndexError: list index out of range" – HBS Mar 26 '15 at 01:29
  • Sorry not sure what delimiter is. I have a total of 13 lines (including the headers), and 2 columns. – HBS Mar 26 '15 at 01:35
  • Great! Last question: the new csv has an empty line between every line. Is there a way to make sure that all lines are filled, without empty lines in between? – HBS Mar 26 '15 at 01:47
  • I really like how you split the header from the data: `header, rows = next(r), list(r)`. I've been reading the whole list and splitting the headers manually in all of my code, even though I knew the file readers were iterators and I could use `next` on iterators. Many lines of future code saved! Answer upvoted! – Haleemur Ali Mar 26 '15 at 02:00
  • It's always good to get upvoted from someone that you saved a lot of time – JuniorCompressor Mar 26 '15 at 02:13
  • @JuniorCompressor - How can I close the file? I need to use it later and I keep getting an error. I tried f.close() but that doesn't seem to work. – HBS Mar 26 '15 at 06:02
  • Using `with` automatically closes the file – JuniorCompressor Mar 26 '15 at 07:03
  • @HBS It's like the difference between swimming and the backstroke. Shuffling is just one type of "randomizing" (which could also be things like picking random numbers). Randomizing a list usually means shuffling. – PyRulez Mar 26 '15 at 11:36
  • @JuniorCompressor - so the problem is that I'm trying to open the file again and to "write" in it. The code that follows is: 'import csv import operator sample=open('random'.csv, "wb") csv1=csv.reader(sample, delimiter=',') sort=sorted(csv1, key=operator.itemgetter(0)). The error that I'm getting is: IOError: File is not open for reading – HBS Mar 26 '15 at 13:30
0

HBS, the problem with your code is that it attempts to shuffle the row order, and not the columns individually.

You can read each column into separate lists, and then apply the shuffle, then combine the two lists together to form a list of rows before writing them to the output file.

To maintain the headers, after you have read the input file, pop the first element off the resulting list and then recombine after shuffling.

Here's the code to illustrate the steps:

import random
import csv

# read the data into lists
with open('input.csv', 'r') as myfile:
    csvreader = csv.reader(myfile, delimiter=' ')
    list1 = []
    list2 = []
    for row in csvreader:
        a, b = row
        list1.append(a)
        list2.append(b)

# pop the first element (headers)
title1, title2 = list1.pop(0), list2.pop(0)

# shuffle the list
random.shuffle(list1)
random.shuffle(list2)

# add the titles back: 
list1 = [title1] + list1
list2 = [title2] + list2

# write rows to output file
with open('output.csv', 'w', newline='') as oput:
    output_rows = list(zip(list1, list2))
    csvwriter = csv.writer(oput, delimiter=' ')
    csvwriter.writerows(output_rows)
JuniorCompressor
Haleemur Ali
0

Maybe don't use the csv module. How about:

Create two empty lists, one to hold the numbers and one to hold the letters.

Open the file.

    For each line in the file:
        Split the line
        Add the number to the numbers list
        Add the letter to the letters list

Shuffle the numbers list.

Take one item from each list, in sequence, and write them to a file.

    Repeat

The built-in function zip should help with that last bit.
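
The steps above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the function name `shuffle_pairs`, its parameters, and the optional `seed` are hypothetical, and it assumes the file is whitespace-separated with a header on the first line (the steps above do not mention the header, but the question requires keeping it).

```python
import random


def shuffle_pairs(in_path, out_path, seed=None):
    """Shuffle the first column of a two-column, whitespace-separated file.

    Assumption: the first line is a header and is copied through unchanged.
    """
    rng = random.Random(seed)  # a seed only makes the result repeatable
    numbers, letters = [], []

    with open(in_path) as src:
        header = next(src, "").rstrip("\n")
        for line in src:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            number, letter = parts
            numbers.append(number)
            letters.append(letter)

    # Shuffling one list is enough to break the original pairing.
    rng.shuffle(numbers)

    with open(out_path, "w") as dst:
        dst.write(header + "\n")
        # zip takes one item from each list, in sequence.
        for number, letter in zip(numbers, letters):
            dst.write(f"{number} {letter}\n")

    return list(zip(numbers, letters))
```

Calling `shuffle_pairs("input.csv", "random.csv")` would write the re-paired rows to `random.csv`; both file names here are placeholders.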

wwii
0

Have a look at the source code of csvshuf:

reader = csv.reader(args.infile, delimiter=args.delimiter, quotechar=args.quotechar)

"""Get the first row and use it as column headers"""
headers = next(reader)

"""Create a matrix of lists of columns"""
table = []
for c in range(len(headers)):
    table.append([])
for row in reader:
    for c in range(len(headers)):
        table[c].append(row[c])

cols = args.columns

for c in cols:
    args.shuffle(table[c - 1])

"""Transpose the matrix"""
table = zip(*table)

writer = csv.writer(sys.stdout, delimiter=args.output_delimiter)
writer.writerow(headers)
for row in table:
    writer.writerow(row)
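
The excerpt above depends on csvshuf's parsed command-line `args`, so it does not run on its own. A self-contained sketch of the same idea (build one list per column, shuffle the chosen columns, transpose back) might look like this. The function name `shuffle_columns`, its parameters, and the use of in-memory strings are assumptions made for illustration and are not part of csvshuf.

```python
import csv
import io
import random


def shuffle_columns(text, columns, delimiter=",", seed=None):
    """Shuffle the given 1-based columns independently, keeping the header row."""
    rng = random.Random(seed)  # a seed only makes the result repeatable
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)

    # The first row is the header and is never shuffled.
    headers = next(reader)
    rows = [row for row in reader if row]  # ignore empty lines

    # One list per column; assumes every row has as many fields as the header.
    table = [[row[c] for row in rows] for c in range(len(headers))]

    for c in columns:
        rng.shuffle(table[c - 1])

    # Transpose the columns back into rows and write them out.
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(zip(*table))
    return out.getvalue()
```

For the data in the question, something like `shuffle_columns(open("input.csv").read(), columns=[1, 2], delimiter=" ")` would shuffle both columns while leaving the header in place; the file name and delimiter are placeholders to adjust.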
Pere