I have a simple script that either removes the last n columns from a CSV file or keeps only the first n columns:

from sys import argv
import csv

if len(argv) == 4:
  script, inputFile, outputFile, n = argv
  n = int(n)  # single integer; a negative n drops the last |n| columns
else:
  script, inputFile, outputFile = argv
  n = 1

with open(inputFile,"r") as fin:
  with open(outputFile,"w") as fout:
    writer=csv.writer(fout)
    for row in csv.reader(fin):
      writer.writerow(row[:n])

Example usage (remove last two columns): removeKeepColumns.py sample.txt out.txt -2
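
To illustrate why a single integer n covers both cases, here is a small sketch of how Python slicing treats positive and negative values. The sample row is made up for illustration only.

```python
# Hypothetical sample row, for illustration only
row = ["a", "b", "c", "d", "e"]

keep_first_two = row[:2]   # positive n keeps the first n columns
drop_last_two = row[:-2]   # negative n drops the last |n| columns

print(keep_first_two)
print(drop_last_two)
```

A slice can only express a contiguous range, which is why it cannot pick out an arbitrary set of columns.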

How do I extend this to keep or remove a specific set of columns, e.g.:

  • remove columns 3,4,5
  • keep only columns 1,4,6

I can split the comma-separated input argument into a list, but I don't know how to pass that list to writerow(row[])

Links to scripts I used to create my example:

Tomas Greif
  • http://stackoverflow.com/questions/724856/picking-out-items-from-a-python-list-which-have-specific-indexes#724881 – Jasper Dec 07 '14 at 12:33
  • @Jasper I don't get it, would you mind expanding on your comment a little? – Tomas Greif Dec 07 '14 at 12:46
  • If I understand you correctly, you are trying to get a (non-contiguous) subsequence of columns from a CSV. The linked question tells you exactly how to do that. – Jasper Dec 07 '14 at 12:52

2 Answers


There is already an accepted answer, but here is my solution using pyexcel:

>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="your_file.csv")
>>> sheet.column.select([1,4,5]) # the column indices to keep
>>> sheet.save_as("your_filtered_file.csv")
>>> exit()

Here are more details on filtering

chfw

Elaborating on my comment (Picking out items from a Python list which have specific indexes):

from sys import argv
import csv

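if len(argv) != 4:
  # Without this check, cols would be undefined further down
  raise SystemExit("usage: " + argv[0] + " inputFile outputFile cols")
script, inputFile, outputFile, cols_str = argv
# Zero-based column indices to keep, e.g. "0,3,5"
cols = [int(i) for i in cols_str.split(",")]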

with open(inputFile,"r") as fin:
  with open(outputFile,"w") as fout:
    writer=csv.writer(fout)
    for row in csv.reader(fin):
      sublist = [row[x] for x in cols]
      writer.writerow(sublist)

This should (untested) keep all the columns that are given as a comma-separated list in the third argument. To remove the given columns instead,

sublist = [value for i, value in enumerate(row) if i not in cols]

should do the trick.
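
Putting the keep and remove cases together, a minimal sketch could look like the following. The names select_columns and filter_csv are hypothetical helpers introduced here, and the indices are assumed to be zero-based as in the script above. The 1-based column numbers from the question would need 1 subtracted first.

```python
import csv
import io


def select_columns(row, cols, remove=False):
    """Keep (or, with remove=True, drop) the zero-based indices in cols."""
    if remove:
        drop = set(cols)
        return [value for i, value in enumerate(row) if i not in drop]
    return [row[i] for i in cols]


def filter_csv(fin, fout, cols, remove=False):
    """Copy CSV rows from fin to fout, keeping or dropping the given columns."""
    writer = csv.writer(fout)
    for row in csv.reader(fin):
        writer.writerow(select_columns(row, cols, remove))


# In-memory demo with made-up data; real files would be opened with open()
src = io.StringIO("a,b,c,d\n1,2,3,4\n")
kept = io.StringIO()
filter_csv(src, kept, [0, 3])
print(kept.getvalue())
```

Keeping follows the order given in cols, so "3,0" would swap the two columns, while removing always preserves the original column order.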

Jasper