
My .csv file is comma-separated, which is the default delimiter for read_csv.

This is working:

T1 = pd.DataFrame(pd.read_csv(loggerfile, header=2))  # header row contains column "1"

But as soon as I pass anything else to the DataFrame constructor besides the read_csv result (here, an index), all my values suddenly become NaN. Why does this happen, and how can I fix it?

datetimeIdx = pd.to_datetime(T1["1"])  # timestamp column
T2 = pd.DataFrame(pd.read_csv(loggerfile, header=2), index=datetimeIdx)
user2366975

1 Answer


It's not necessary to wrap read_csv in a DataFrame call, as it already returns a DataFrame.

If you want to change the index, you can either assign to the index directly or use set_index:

T1 = pd.read_csv(loggerfile, header=2)
T1.index = pd.DatetimeIndex(T1["1"])
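A minimal, self-contained sketch of the direct-assignment approach. The in-memory CSV (two preamble lines, then a header row with columns "1", "temp" and "hum") is a hypothetical stand-in for the question's loggerfile. Because the new index is assigned rather than aligned, the values are kept and column "1" stays in the frame.

```python
import io

import pandas as pd

# Hypothetical stand-in for `loggerfile`: two preamble lines, then a header
# row whose first column is named "1" (the timestamp column).
loggerfile = io.StringIO(
    "logger export,,\n"
    "site,A,\n"
    "1,temp,hum\n"
    "2014-03-25 10:00:00,20.5,40\n"
    "2014-03-25 10:10:00,20.7,41\n"
)

T1 = pd.read_csv(loggerfile, header=2)  # read_csv already returns a DataFrame
T1.index = pd.DatetimeIndex(T1["1"])    # replaces the labels, no alignment

print(T1)
```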

If you want to keep the column in the dataframe as a datetime (and not as a string):

T1 = pd.read_csv(loggerfile, header=2)
T1["1"] = pd.DatetimeIndex(T1["1"])
T2 = T1.set_index("1", drop=False)
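The same idea as a runnable sketch on hypothetical sample data. It converts the column with pd.to_datetime instead of pd.DatetimeIndex; either should leave a datetime column that, with drop=False, also serves as the index.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the question's `loggerfile`.
loggerfile = io.StringIO(
    "logger export,,\n"
    "site,A,\n"
    "1,temp,hum\n"
    "2014-03-25 10:00:00,20.5,40\n"
    "2014-03-25 10:10:00,20.7,41\n"
)

T1 = pd.read_csv(loggerfile, header=2)
T1["1"] = pd.to_datetime(T1["1"])   # convert the column itself to datetime
T2 = T1.set_index("1", drop=False)  # use it as the index but keep the column

print(T2.dtypes)
```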

But even better, you can do this directly in read_csv (assuming the column "1" is the first column):

pd.read_csv(loggerfile, header=2, index_col=0, parse_dates=True)
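A sketch of the read_csv-only variant on the same kind of hypothetical data. index_col=0 makes the first column the index, and parse_dates=True asks pandas to parse that index as dates, so no separate conversion step is needed.

```python
import io

import pandas as pd

# Hypothetical sample data; the timestamp column "1" is the first column.
loggerfile = io.StringIO(
    "logger export,,\n"
    "site,A,\n"
    "1,temp,hum\n"
    "2014-03-25 10:00:00,20.5,40\n"
    "2014-03-25 10:10:00,20.7,41\n"
)

# Header on row 2, first column as index, index parsed as datetimes.
T = pd.read_csv(loggerfile, header=2, index_col=0, parse_dates=True)

print(T)
```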

You get NaNs because calling DataFrame() with an existing DataFrame and an index argument performs a reindex with the provided index. None of the labels in datetimeIdx exist in the original index of T1 (the default integer index 0, 1, 2, ...), so every row of the result is filled with NaN.
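The NaN behaviour can be reproduced in a few lines. This sketch (hypothetical sample data again) builds the frame the way the question does and, for comparison, performs the reindex explicitly; both results come out entirely NaN because the timestamps never match T1's integer labels.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the question's `loggerfile`.
loggerfile = io.StringIO(
    "logger export,,\n"
    "site,A,\n"
    "1,temp,hum\n"
    "2014-03-25 10:00:00,20.5,40\n"
    "2014-03-25 10:10:00,20.7,41\n"
)

T1 = pd.read_csv(loggerfile, header=2)   # index is 0, 1
datetimeIdx = pd.to_datetime(T1["1"])    # timestamps, not 0, 1

# Constructor with an existing DataFrame plus index=: rows are looked up
# by label in T1, none are found, so every cell is NaN.
T2 = pd.DataFrame(T1, index=datetimeIdx)

# The equivalent explicit reindex gives the same all-NaN picture.
T3 = T1.reindex(datetimeIdx)

print(T2)
```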

joris
  • Is this answer possibly connected to another question of mine? http://stackoverflow.com/questions/22655667/dataframe-correlation-produces-nan-although-its-values-are-all-integers And is it possible to keep the index column as a copy in the dataframe? – user2366975 Mar 26 '14 at 08:55
  • My first suggestion (setting the index directly) will keep the column in the dataframe – joris Mar 26 '14 at 09:01
  • @user2366975 See that gray checkmark up and to the left? You might think about clicking it. – WestCoastProjects Sep 13 '21 at 20:38