Remove or keep specific columns in csv file

Question

I have a simple script that either removes the last n columns from a CSV file or keeps only the first n columns:

from sys import argv
import csv

if len(argv) == 4:
  script, inputFile, outputFile, n = argv
  n = int(n)  # a list cannot be used in row[:n]; e.g. -2 drops the last two columns
else:
  script, inputFile, outputFile = argv
  n = 1

with open(inputFile,"r") as fin:
  with open(outputFile,"w") as fout:
    writer=csv.writer(fout)
    for row in csv.reader(fin):
      writer.writerow(row[:n])

Example usage (remove last two columns): removeKeepColumns.py sample.txt out.txt -2

How do I extend this to keep or remove a specific set of columns, e.g.:

remove columns 3,4,5
keep only columns 1,4,6

I can split the comma-separated input argument into a list, but I don't know how to pass it to writerow(row[]).

Links to scripts I used to create my example:

http://stackoverflow.com/questions/724856/picking-out-items-from-a-python-list-which-have-specific-indexes#724881 — Jasper, Dec 07 '14 at 12:33
@Jasper I don't get it, would you mind expanding on your comment a little? — Tomas Greif, Dec 07 '14 at 12:46
If I understand you correctly, you are trying to get a (non-continuous) subsequence from a CSV. The linked question tells you exactly how to do that. — Jasper, Dec 07 '14 at 12:52

chfw · Answer 1 · 2016-10-24T17:15:16.840

4

There is already an accepted answer, but here's my solution:

>>> import pyexcel as pe
>>> sheet = pe.get_sheet(file_name="your_file.csv")
>>> sheet.column.select([1,4,5]) # the column indices to keep
>>> sheet.save_as("your_filtered_file.csv")
>>> exit()

Here are more details on filtering

edited Oct 24 '16 at 17:15

answered Dec 07 '14 at 22:59


score 2 · Accepted Answer · edited May 23 '17 at 10:32

Elaborating on my comment (Picking out items from a python list which have specific indexes)

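from sys import argv, exit
import csv

# Expect: script inputFile outputFile cols, where cols is e.g. 0,3,5 (0-based indices)
if len(argv) != 4:
  exit("usage: %s inputFile outputFile cols" % argv[0])

script, inputFile, outputFile, cols_str = argv
cols = [int(i) for i in cols_str.split(",")]

with open(inputFile,"r") as fin:
  with open(outputFile,"w") as fout:
    writer=csv.writer(fout)
    for row in csv.reader(fin):
      # keep only the requested columns, in the order given
      sublist = [row[x] for x in cols]
      writer.writerow(sublist)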

This should (untested) keep all the columns that are given as a comma-separated list in the third command-line argument (indices are 0-based). To remove the given columns,

sublist = [value for i, value in enumerate(row) if i not in cols]

should do the trick.
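
For reference, here is a minimal sketch that puts both variants behind one function. The names select_columns and filter_csv and the keep flag are made up for illustration and are not part of the csv module. It assumes Python 3 and 0-based column indices, and it silently skips indices that a row does not have, which is a design choice rather than something the question asked for.

```python
import csv
import io


def select_columns(rows, cols, keep=True):
    """Yield each row with the columns in `cols` kept (keep=True) or removed.

    `cols` holds 0-based indices; indices a row does not have are skipped.
    """
    unwanted = set(cols)
    for row in rows:
        if keep:
            # Keep the requested columns in the order they were given.
            yield [row[i] for i in cols if 0 <= i < len(row)]
        else:
            # Keep everything except the listed columns.
            yield [value for i, value in enumerate(row) if i not in unwanted]


def filter_csv(fin, fout, cols, keep=True):
    """Read CSV rows from `fin` and write the filtered rows to `fout`."""
    writer = csv.writer(fout)
    writer.writerows(select_columns(csv.reader(fin), cols, keep))


# Small in-memory demonstration instead of real files.
source = "a,b,c,d\n1,2,3,4\n"

kept = io.StringIO()
filter_csv(io.StringIO(source), kept, [0, 2], keep=True)

removed = io.StringIO()
filter_csv(io.StringIO(source), removed, [0, 2], keep=False)
```

With real files you would pass the objects returned by open(inputFile, newline="") and open(outputFile, "w", newline="") instead of the StringIO buffers. If the columns are given 1-based on the command line, subtract 1 from each index before calling filter_csv.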
