1

I have a variable that contains a string of:

fruit_wanted = 'banana,apple'

I also have a CSV file:

fruit,'orange','grape','banana','mango','apple','strawberry'
number,1,2,3,4,5,6
value,3,2,2,4,2,1
price,3,2,1,2,3,4

How do I delete every column whose 'fruit' is not listed in the 'fruit_wanted' variable?

The output file should look like this:

fruit,'banana','apple'
number,3,5
value,2,2
price,1,3

Thank you.

user1546610
  • You should have either googled or searched Stack Overflow before posting. [Other question with proper answer](http://stackoverflow.com/questions/7588934/deleting-columns-in-a-csv-with-python) – hrr Nov 28 '12 at 21:43
  • Your csv file is sideways. This would be trivial if your csv had headers on first line `fruit,number,value,price` and then each line represented one fruit. – Steven Rumbalski Nov 28 '12 at 21:43
  • @StevenRumbalski: He may not have any control over that. It's useful to know how to deal with sideways CSV files (without having to read the whole thing in so you can `zip` it to transpose). – abarnert Nov 28 '12 at 21:45

2 Answers

8

Read the csv file using the DictReader() class, and ignore the columns you don't want:

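import csv

# Keep the 'fruit' label column plus each wanted fruit, quoted as in the file.
fruit_wanted = ['fruit'] + ["'%s'" % f for f in fruit_wanted.split(',')]

# newline='' lets the csv module handle line endings itself.
with open(inputfile, newline='') as src, open(outputfile, 'w', newline='') as dst:
    # extrasaction='ignore' drops every column not named in fieldnames.
    outfile = csv.DictWriter(dst, fieldnames=fruit_wanted, extrasaction='ignore')
    outfile.writeheader()  # DictWriter does not write the header row by itself
    for row in csv.DictReader(src):
        outfile.writerow(row)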
Martijn Pieters
  • +1, except that you probably want to use a `csv.DictWriter` or `csv.Writer` rather than `print row`, or your output will be a dict's str representation rather than a comma-separated list in the right order… – abarnert Nov 28 '12 at 21:46
  • Actually, the author asked just for the 'outfile'. – alexvassel Nov 28 '12 at 21:48
  • @alexvassel: 'outfile', I see now. So I updated the answer (and included a correction, the first `fruit` column is also needed). – Martijn Pieters Nov 28 '12 at 21:48
  • This only worked for me after I replaced fields=fruit_wanted with fieldnames=fruit_wanted – Alexis Eggermont Oct 29 '14 at 07:36
0

Here's some pseudocode:

open the original CSV for input, and the new one for output
read the first row of the original CSV and figure out which columns you want to delete
write the modified first row to the output CSV
for each row in the input CSV:
    delete the columns you figured out before
    write the modified row to the output CSV
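
The steps above could be sketched in Python roughly as follows. This is only one possible reading of the pseudocode, not the answerer's own code: the function name `keep_columns` is made up for illustration, the first column is assumed to be the row label and is always kept, and the header cells are assumed to be wrapped in single quotes as in the question's sample. In-memory `io.StringIO` objects stand in for the real input and output files.

```python
import csv
import io


def keep_columns(src, dst, wanted):
    """Copy CSV rows from src to dst, keeping column 0 and the wanted columns.

    keep_columns is a hypothetical helper name; src and dst are open
    text-mode file objects, wanted is a set of unquoted fruit names.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")

    # Read the first row and work out which column indexes to keep.
    # Header cells look like 'banana', so strip the single quotes to compare.
    header = next(reader)
    keep = [i for i, name in enumerate(header)
            if i == 0 or name.strip("'") in wanted]
    writer.writerow([header[i] for i in keep])

    # Stream the remaining rows, writing only the kept columns.
    for row in reader:
        writer.writerow([row[i] for i in keep])


fruit_wanted = "banana,apple"
sample = io.StringIO(
    "fruit,'orange','grape','banana','mango','apple','strawberry'\n"
    "number,1,2,3,4,5,6\n"
    "value,3,2,2,4,2,1\n"
    "price,3,2,1,2,3,4\n"
)
result = io.StringIO()
keep_columns(sample, result, set(fruit_wanted.split(",")))
print(result.getvalue())
```

Because rows are processed one at a time, the whole file never has to be held in memory. Columns come out in the order they appear in the input file, not in the order given in `fruit_wanted`.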
abarnert