selecting only relevant columns within a file - python

Question

Let's say I have a file 'test.csv' containing the following headers and data:

h1  c1  h2  h3  c2
1   0   2   3   1
3   0   2   1   0
0   1   2   3   3

What is the best option in python to only select and save the columns of interest and discard all others?

Assuming I'm only interested in saving the h columns, I thought of something along these lines:

f = open('test.csv')
s = save('new_test.csv', data = f, saveColumns=['h1','h2','h3'])´

n = load('new_test.csv')
print n

h1  h2  h3
1   2   3
3   2   1
0   2   3

This [post](http://stackoverflow.com/questions/16503560/read-specific-columns-from-csv-file-with-python-csv?answertab=active#tab-top) accomplishes something very similar. — Alex, Aug 08 '13 at 18:53
If no lumpy, I'd make a dictionary for each heading, and load the values as a list for the dictionary. Then at the end, just print out the valid entries. You will need a list of the dictionaries at the start to keep the column order. — Jiminion, Aug 08 '13 at 18:54
You can pass the data to python with awk and only give it the column you want. — Rocky Pulley, Aug 08 '13 at 18:55

score 1 · Answer 1 · answered Aug 08 '13 at 19:02

1

f = open("test.csv")
header = {i: x for i, x in enumerate(f.readline().split())}
columns = ('h1','h2','h3')
for l in f:
    print [x for i, x in enumerate(l.split()) if header[i] in columns]

answered Aug 08 '13 at 19:02

Roman Pekar

107,110
28
195
197

score 1 · Accepted Answer · answered Sep 07 '13 at 14:22

1

I found a very straightforward way of doing this:

import pandas as pd
selectColumns = ['h1','h2','h3']
table = pd.read_csv('test.csv')
tableNew = table[selectColumns]
pd.to_csv('tableNew')

answered Sep 07 '13 at 14:22

HappyPy

9,839
13
46
68

score 0 · Answer 3 · answered Aug 08 '13 at 18:58

0

>>> d=csv.DictReader(open("some.csv"),delimiter="\t")
>>> fields = ["h1","h2","h3"]
>>> new_rows = [[row[f] for f in fields] for row in d]
>>> d=csv.DictWriter(open("new_csv.csv","w"),fields,delimiter="\t")
>>> d.writerows(new_rows)

answered Aug 08 '13 at 18:58

Joran Beasley

110,522
12
160
179

selecting only relevant columns within a file - python

3 Answers3