1

I am trying to print a Pandas dataframe's columns to separate *.csv files in Python 2.7.

Using this code, I get a dataframe with 4 columns and an index of dates:

import pandas as pd
import numpy as np

col_headers = list('ABCD')
dates = pd.date_range(dt.datetime.today().strftime("%m/%d/%Y"),periods=rows)
df2 = pd.DataFrame(np.random.randn(10, 4), index=dates, columns = col_headers)
df = df2.tz_localize('UTC') #this does not seem to be giving me hours/minutes/seconds

I then remove the index and set it to a separate column:

df['Date'] = df.index
col_headers.append('Date') #update the column keys

At this point, I just need to print all 5 columns of the dataframe to separate files. Here is what I have tried:

for ijk in range(0,len(col_headers)):
    df.to_csv('output' + str(ijk) + '.csv', columns = col_headers[ijk])

I get the following error message:

KeyError: "[['D', 'a', 't', 'e']] are not in ALL in the [columns]"

If I say:

for ijk in range(0,len(col_headers)-1):

then it works, but it does not print the 'Date' clumn. That is not what I want. I need to also print the date column.

Questions:

  • How do I get it to print the 'Dates' column to a *.csv file?
  • How do I get the time with hours, minutes and seconds? If the number of rows is changed from 10 to 5000, then will the seconds change from one row of the dataframe to the next?

EDIT: - Answer for Q2 (See here) ==> in the case of my particular code, see this:

dates = pd.date_range(dt.datetime.today().strftime("%m/%d/%Y %H:%M"),periods=rows)
Community
  • 1
  • 1
edesz
  • 11,756
  • 22
  • 75
  • 123

1 Answers1

1

I don't quite understand your logic but the following is a simpler method to do it:

for col in df:
    df[col].to_csv('output' + col + '.csv')

example:

In [41]:

for col in df2:
    print('output' + col + '.csv')
outputA.csv
outputB.csv
outputC.csv
outputD.csv
outputDate.csv
EdChum
  • 376,765
  • 198
  • 813
  • 562
  • EdChum: I'm presuming that you mean the logic in the for loop. Here's what I meant: the col_headers list contains all the column names. If I cycle through them, one at a time (through the loop), and output to a *.csv in each iteration, then I can manually specify which column to print in each iteration. To do that, I set up the loop counter to go from 0 (first list element) to range(0,len(col_headers)) which is the last loop element. Is there something in this counter specification that is blocking it from picking up the last list element (Date)? – edesz Nov 06 '14 at 14:50
  • 1
    the problem is you are generating a range from 0 to the len of the column list and using the index, which is a number, to index back into the df but those columns have names rather than an index so it'll fail. My answer is less verbose and a more intuitive way of achieving the same thing IMO – EdChum Nov 06 '14 at 15:00
  • EdChum: Yes, I see what you mean. Your method works - I just tested it and it answers my question. It definitely is simpler. However, what confuses me is why my method works with the "-1" (i.e. picking up column headers A, B, C, D) but does not work when I drop the "-1" (i.e. picking up column headers A, B, C, D, Date). This is the part that I just cannot understand??? – edesz Nov 06 '14 at 15:16
  • Yes, the columns do have names but, with my method, I am simply looping through the column names - this should work for all 5 columns, if it works for 4 columns. – edesz Nov 06 '14 at 15:23