
I have a csv with 12288+1 columns, and I want to reduce it to 4096+1 columns.

Within these 12288+1 columns, the values repeat in groups of three, and the last value is a bit, 0 or 1.

I need to keep that last value, and take just 1 value from each repeated group of three.

My original csv has 300 rows, or lines, whatever. I don't know how to process the other rows; my script only handles the first row/line.

from original csv 3,3,3,5,5,5,7,7,7,10,10,10 ... 20,20,20,50,50,50,1

want final csv 3,5,7,10 ... 20,50,1

import csv

count, num = 0, 0
a = ''
with open('data.csv','rb') as filecsv:
    reader = csv.reader(filecsv)
    for row in reader:
        # count is never reset here, so only the first row is processed
        while count < 12290:
            a = a + str(row[count]) + ','
            count = count + 3
            num = num + 1
print num
print a

The prints are just to get an idea of the output.

Thanks for any help

  • Is this always groups of 3? Will there be groups of 2 (or 4) that you'll want to keep more than one of the same values? Will the same value appear more than once, and if so will you keep both values? – Rejected Apr 23 '14 at 19:26
  • I'm having a little hard time understanding the problem. You want to get the first 12990 values from a row, remove duplicates and then reduce *that* down to 4097 values? – msvalkon Apr 23 '14 at 19:30
  • Sorry, my explanation is very poor. Basically, my original csv always has sequences of 3 repeated elements, and I need just 1 of each. The last value, at position 12289, is a bit, 1 or 0; I need that too. Each sequence of 3 elements is an RGB color that I converted to gray, so the three values are now identical, and I want to discard 2 and keep just 1. I have a csv with 300 rows of this (300 pictures) by 12288 columns (64x64 pixels in RGB), and now I want to produce a csv with 4096 columns (64x64 pixels in grayscale) + 1 column for my 0/1 bit – MarkAngel11 Apr 23 '14 at 19:44

3 Answers


If you don't mind using a library, Pandas will be able to do this for you nicely.

You can read a csv with pandas.read_csv. The usecols parameter specifies which columns you want to keep, so you can use it to ignore the repeated columns.

import pandas

columns = list(range(1, 12288, 3))  # one index from each group of three
columns.append(12288)               # the trailing 0/1 bit column
data = pandas.read_csv('data.csv', usecols=columns)
data.to_csv('new_data.csv')
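One pitfall, assuming the layout described in the question: pixel data usually has no header row, so you likely also want `header=None` (otherwise `read_csv` consumes the first picture as column names) and `index=False` on output so pandas doesn't prepend an index column. A sketch that builds a small demo file in the question's layout first (the file names are placeholders):

```python
import csv

import pandas as pd

# Build a demo file in the question's layout: 4096 groups of three
# identical values plus a trailing 0/1 bit, i.e. 12289 columns per row.
with open('data.csv', 'w', newline='') as f:
    w = csv.writer(f)
    for bit in (0, 1):
        w.writerow([v for v in range(4096) for _ in range(3)] + [bit])

# Keep one column per group of three, plus the final bit column.
columns = list(range(0, 12288, 3)) + [12288]
data = pd.read_csv('data.csv', header=None, usecols=columns)
data.to_csv('new_data.csv', header=False, index=False)
```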
eboswort

If they are always groups of three, just throw 2 away.

Group into groups of 3 like so:

>>> row=range(9)
>>> [row[i:i+3] for i in range(0,len(row),3)]
[[0, 1, 2], [3, 4, 5], [6, 7, 8]]

However, this will give you a group of fewer than 3 elements at the end if the length of row is not a multiple of 3:

>>> row=range(11)
>>> [row[i:i+3] for i in range(0,len(row),3)]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
                                    ^  ^   only two elements...

If the number of elements may not be a multiple of 3, use zip; it will drop incomplete r,g,b groups:

>>> row=range(11)
>>> zip(*[iter(row)]*3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
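A side note if you run this on Python 3 (the snippets here are Python 2): zip returns an iterator there, so wrap it in list() to see the groups:

```python
row = list(range(11))          # range is also lazy in Python 3
groups = list(zip(*[iter(row)] * 3))
print(groups)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
```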

Then unpack into r,g,b components:

import csv

with open('data.csv','rb') as filecsv:
    reader = csv.reader(filecsv)
    for row in reader:
        for r, g, b in [row[i:i+3] for i in range(0,len(row),3)]:
            pass  # use r or g or b, ignore the other two

If you are getting a ValueError, your data has a number of elements that is not a multiple of 3 (or csv is not parsing the data correctly). Try using zip as shown:

import csv

with open('data.csv','rb') as filecsv:
    reader = csv.reader(filecsv)
    for row in reader:
        for r, g, b in zip(*[iter(row)]*3):
            pass  # use r or g or b, ignore the other two

(not tested...)
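Putting the pieces together, a complete sketch in Python 3 style (file names data.csv and output.csv are placeholders; the first block just builds a tiny demo input in the question's layout, standing in for the 12288+1 real columns). Note that csv.writer's writerow emits the line break itself:

```python
import csv

# Demo input: two rows, each four groups of three identical values
# plus a trailing 0/1 bit.
with open('data.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow([3, 3, 3, 5, 5, 5, 7, 7, 7, 10, 10, 10, 1])
    w.writerow([2, 2, 2, 4, 4, 4, 6, 6, 6, 8, 8, 8, 0])

with open('data.csv', newline='') as src, \
     open('output.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        # one value per (r, g, b) group; the bit is sliced off first
        reduced = [r for r, g, b in zip(*[iter(row[:-1])] * 3)]
        reduced.append(row[-1])   # keep the 0/1 bit
        writer.writerow(reduced)  # writerow adds the line break itself
```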

dawg
  • `Just error` does not help much. Is it a `ValueError` perchance? If so, use the zip method... – dawg Apr 23 '14 at 20:11
  • Oh, yes. The last group doesn't have 3 elements, just my single bit column, 0 or 1. – MarkAngel11 Apr 23 '14 at 20:15
  • @MarkAngel11: related to the answer: [What is the most “pythonic” way to iterate over a list in chunks?](http://stackoverflow.com/q/434287/4279) – jfs Apr 23 '14 at 21:19
  • @dawg: I did this. Just one problem for now: when I write the r or g or b to the new csv, I don't get a line break ('\n'). All elements end up in a single row... – MarkAngel11 Apr 24 '14 at 02:50
  • @MarkAngel11: Just add the `\n` where appropriate, probably in the loop over `row`, after the inner loop containing `r, g, b`. It sounds like you are adding the `\n` IN the loop with `r, g, b`, so that it is added after every element... – dawg Apr 24 '14 at 04:25

To remove consecutive duplicates, you could use the itertools.groupby function:

#!/usr/bin/env python
import csv
from itertools import groupby
from operator import itemgetter

with open('data.csv', 'rb') as file, open('output.csv', 'wb') as output_file:
    writer = csv.writer(output_file)
    for row in csv.reader(file):
        writer.writerow(map(itemgetter(0), groupby(row)))

It reads the input csv file and writes it to the output csv file with consecutive duplicates removed.

If there could be an adjacent duplicate of the 0/1 bit at the very end of the row, then remove duplicates only in row[:-1] (all but the last column) and append the last bit row[-1] to the result to preserve it:

from itertools import islice

no_dups = map(itemgetter(0), groupby(islice(row, len(row)-1)))
no_dups.append(row[-1])
writer.writerow(no_dups)
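Applied to a short version of the question's sample row, the approach looks like this (written for Python 3, where map is lazy and needs a list() around it). One caveat: groupby merges any run of equal values, including runs that span pixel boundaries, so if two adjacent pixels happen to have the same gray value you would lose a column; index-based slicing avoids that.

```python
from itertools import groupby
from operator import itemgetter

row = ['3', '3', '3', '5', '5', '5', '7', '7', '7', '1']
no_dups = list(map(itemgetter(0), groupby(row[:-1])))  # collapse equal runs
no_dups.append(row[-1])                                # re-append the bit
print(no_dups)  # ['3', '5', '7', '1']
```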
jfs