52

I'm writing a script to reduce a large .xlsx file with headers into a CSV, and then write a new CSV file with only the required columns based on the header names.

import pandas
import csv

df = pandas.read_csv('C:\\Python27\\Work\\spoofing.csv')

time = df["InviteTime (Oracle)"]
orignum = df["Orig Number"]
origip = df["Orig IP Address"]
destnum = df["Dest Number"]

df.to_csv('output.csv', header=[time,orignum,origip,destnum])

The error I'm getting is with that last bit of code, and it says

ValueError: Writing 102 cols but got 4 aliases

I'm sure I'm overlooking something stupid, but I've read over the to_csv documentation on the pandas website and I'm still at a loss. I know I'm misusing the to_csv parameters but I can't seem to get my head around the documentation.

Any help is appreciated, thanks!

Gino Mempin
  • 25,369
  • 29
  • 96
  • 135
Harrison Boles
  • 733
  • 2
  • 6
  • 9

2 Answers2

114

The way to select specific columns is this -

header = ["InviteTime (Oracle)", "Orig Number", "Orig IP Address", "Dest Number"]
df.to_csv('output.csv', columns = header)
Nikita Pestrov
  • 5,876
  • 4
  • 31
  • 66
user1827356
  • 6,764
  • 2
  • 21
  • 30
2
column_list=["column_name1", "column_name2", "column_name3", "column_name4"]

#filter the dataframe beforehand
ds[column_list].to_csv('output.csv',index=False)

#or use columns arg
ds.to_csv('output.csv', columns = column_list,index=False)

I provide index=False arg in order to write only column values