
I have a csv with 12288+1 columns, and I want to reduce it to 4096+1 columns.

Within these 12288+1 columns, the values repeat in groups of three, and the last value is a bit, 0 or 1.

I need to keep that last value, and take just 1 value from each repeated group of three.

My original csv has 300 rows, or lines, whatever. I don't know how to process the other rows; my script only handles the first row/line.

from original csv 3,3,3,5,5,5,7,7,7,10,10,10 ... 20,20,20,50,50,50,1

want final csv 3,5,7,10 ... 20,50,1

import csv

count, num = 0, 0
a = ''
with open('data.csv','rb') as filecsv:
    reader = csv.reader(filecsv)
    for row in reader:
        # count is never reset here, so only the first row is processed
        while count < 12290:
            a = a + str(row[count]) + ','
            count = count + 3
            num = num + 1
print num
print a

The prints are just to get an idea of the output.

Thanks for any help

  • Is this always groups of 3? Will there be groups of 2 (or 4) that you'll want to keep more than one of the same values? Will the same value appear more than once, and if so will you keep both values? – Rejected Apr 23 '14 at 19:26
  • I'm having a little hard time understanding the problem. You want to get the first 12990 values from a row, remove duplicates and then reduce *that* down to 4097 values? – msvalkon Apr 23 '14 at 19:30
  • Sorry, my explanation is very poor. Basically, my original csv always has sequences of 3 repeated elements, and I need just 1 of each. The last value, at position 12289, is a bit, 1 or 0; I need that too. Each sequence of 3 elements is an RGB color that I converted to gray, so the three values are now identical, and I want to discard 2 and keep just 1. I have a csv with 300 rows of this (300 pictures) by 12288 columns (64x64 pixels in RGB), and now I want to produce a csv with 4096 columns (64x64 pixels in grayscale) + 1 column for my 0/1 bit – MarkAngel11 Apr 23 '14 at 19:44

3 Answers


If you don't mind using a library, Pandas will be able to do this for you nicely.

You can read a csv with pandas.read_csv. The usecols parameter specifies which columns you want to keep, so you can use it to ignore the repeated columns.

import pandas

columns = list(range(1, 12288, 3))  # one index from each group of three
columns.append(12288)               # the trailing 0/1 bit column
data = pandas.read_csv('data.csv', usecols=columns)
data.to_csv('new_data.csv')
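One pitfall, assuming the layout described in the question: pixel data usually has no header row, so you likely also want `header=None` (otherwise `read_csv` consumes the first picture as column names) and `index=False` on output so pandas doesn't prepend an index column. A sketch that builds a small demo file in the question's layout first (the file names are placeholders):

```python
import csv

import pandas as pd

# Build a demo file in the question's layout: 4096 groups of three
# identical values plus a trailing 0/1 bit, i.e. 12289 columns per row.
with open('data.csv', 'w', newline='') as f:
    w = csv.writer(f)
    for bit in (0, 1):
        w.writerow([v for v in range(4096) for _ in range(3)] + [bit])

# Keep one column per group of three, plus the final bit column.
columns = list(range(0, 12288, 3)) + [12288]
data = pd.read_csv('data.csv', header=None, usecols=columns)
data.to_csv('new_data.csv', header=False, index=False)
```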
eboswort

If they are always groups of three, just throw 2 away.

Group into groups of 3 like so:

>>> row=range(9)
>>> [row[i:i+3] for i in range(0,len(row),3)]
[[0, 1, 2], [3, 4, 5], [6, 7, 8]]

However, this will give you a group of fewer than 3 elements at the end if the length of row is not a multiple of 3:

>>> row=range(11)
>>> [row[i:i+3] for i in range(0,len(row),3)]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
                                    ^  ^   only two elements...

If the number of elements may not be a multiple of 3, use zip; it will drop incomplete r,g,b groups:

>>> row=range(11)
>>> zip(*[iter(row)]*3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
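A side note if you run this on Python 3 (the snippets here are Python 2): zip returns an iterator there, so wrap it in list() to see the groups:

```python
row = list(range(11))          # range is also lazy in Python 3
groups = list(zip(*[iter(row)] * 3))
print(groups)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
```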

Then unpack into r,g,b components:

import csv

with open('data.csv','rb') as filecsv:
    reader = csv.reader(filecsv)
    for row in reader:
        for r, g, b in [row[i:i+3] for i in range(0,len(row),3)]:
            pass  # use r or g or b, ignore the other two

If you are getting a ValueError, your data has a number of elements that is not a multiple of 3 (or csv is not parsing the data correctly). Try using zip as shown:

import csv

with open('data.csv','rb') as filecsv:
    reader = csv.reader(filecsv)
    for row in reader:
        for r, g, b in zip(*[iter(row)]*3):
            pass  # use r or g or b, ignore the other two

(not tested...)
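Putting the pieces together, a complete sketch in Python 3 style (file names data.csv and output.csv are placeholders; the first block just builds a tiny demo input in the question's layout, standing in for the 12288+1 real columns). Note that csv.writer's writerow emits the line break itself:

```python
import csv

# Demo input: two rows, each four groups of three identical values
# plus a trailing 0/1 bit.
with open('data.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow([3, 3, 3, 5, 5, 5, 7, 7, 7, 10, 10, 10, 1])
    w.writerow([2, 2, 2, 4, 4, 4, 6, 6, 6, 8, 8, 8, 0])

with open('data.csv', newline='') as src, \
     open('output.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        # one value per (r, g, b) group; the bit is sliced off first
        reduced = [r for r, g, b in zip(*[iter(row[:-1])] * 3)]
        reduced.append(row[-1])   # keep the 0/1 bit
        writer.writerow(reduced)  # writerow adds the line break itself
```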

dawg
  • `Just error` does not help much. Is it a `ValueError` perchance? If so, use the zip method... – dawg Apr 23 '14 at 20:11
  • Oh, yes. The last group doesn't have 3 elements, just my single bit column, 0 or 1. – MarkAngel11 Apr 23 '14 at 20:15
  • @MarkAngel11: related to the answer: [What is the most “pythonic” way to iterate over a list in chunks?](http://stackoverflow.com/q/434287/4279) – jfs Apr 23 '14 at 21:19
  • @dawg: I did this. Just one problem for now: when I write the r or g or b to the new csv, I don't get a line break ('\n'). All elements end up in a single row... – MarkAngel11 Apr 24 '14 at 02:50
  • @MarkAngel11: Just add the `\n` where appropriate, probably in the loop over `row`, after the inner loop containing `r, g, b`. It sounds like you are adding the `\n` IN the loop with `r, g, b`, so that it is added after every element... – dawg Apr 24 '14 at 04:25

To remove consecutive duplicates, you could use the itertools.groupby function:

#!/usr/bin/env python
import csv
from itertools import groupby
from operator import itemgetter

with open('data.csv', 'rb') as file, open('output.csv', 'wb') as output_file:
    writer = csv.writer(output_file)
    for row in csv.reader(file):
        writer.writerow(map(itemgetter(0), groupby(row)))

It reads the input csv file and writes it to the output csv file with consecutive duplicates removed.

If there could be an adjacent duplicate of the 0/1 bit at the very end of the row, then remove duplicates only in row[:-1] (all but the last column) and append the last bit row[-1] to the result to preserve it:

from itertools import islice

no_dups = map(itemgetter(0), groupby(islice(row, len(row)-1)))
no_dups.append(row[-1])
writer.writerow(no_dups)
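Applied to a short version of the question's sample row, the approach looks like this (written for Python 3, where map is lazy and needs a list() around it). One caveat: groupby merges any run of equal values, including runs that span pixel boundaries, so if two adjacent pixels happen to have the same gray value you would lose a column; index-based slicing avoids that.

```python
from itertools import groupby
from operator import itemgetter

row = ['3', '3', '3', '5', '5', '5', '7', '7', '7', '1']
no_dups = list(map(itemgetter(0), groupby(row[:-1])))  # collapse equal runs
no_dups.append(row[-1])                                # re-append the bit
print(no_dups)  # ['3', '5', '7', '1']
```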
jfs