I have a dataframe in the following format:

[image: dataframe format]

The last column of my dataframe is: [image: last column]

The hours, minutes, and seconds are completely irrelevant to my analysis, and I need to remove them. I found the following resource on Stack Overflow, but it did not help:

Removing the timestamp from a datetime in pandas dataframe

I tried

pd.DatetimeIndex(df3).dt.date

but I get the error: Buffer has wrong number of dimensions (expected 1, got 2)

I tried dropping the H:M:S from only one column, ['STDEV']:

df3['STDEV']=df3['STDEV'].strftime("%m/%d/%Y")

but I get the error: 'Series' object has no attribute 'strftime'

I also tried using apply():

df3.apply(pd.to_datetime(df3)).dt.date

which didn't work either.

Please let me know where I went wrong. Thanks in advance.

Devarshi Goswami

1 Answer

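First, your columns contain timedeltas, not datetimes, which is why the datetime-based approaches fail. You can extract the number of days for each column with Series.dt.days, applied column by column via DataFrame.apply: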

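import pandas as pd

# Sample data: day counts stored as strings, indexed by TCTN
df3 = pd.DataFrame({
        'TCTN':('101','102','103'),
         '0':"855 days,626 days,866 days".split(','),
         '1':"946 days,485 days,182 days".split(','),
         '2':"1242 days,1985 days,0 days".split(','),
         '3':"345 days,1864 days,361 days".split(',')
}).set_index('TCTN')
# iloc[:, 1:] skips the first column ('0'); the rest are converted to timedeltas
df3 = df3.iloc[:, 1:].apply(pd.to_timedelta)
print(df3)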
            1         2         3
TCTN                             
101  946 days 1242 days  345 days
102  485 days 1985 days 1864 days
103  182 days    0 days  361 days

# Keep only the integer day count of each timedelta column
df3 = df3.apply(lambda x: x.dt.days)
print(df3)
        1     2     3
TCTN                 
101   946  1242   345
102   485  1985  1864
103   182     0   361
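
The sample above has no hours, minutes, or seconds. As a rough sketch of the same idea on timedeltas that do carry a time component, the snippet below uses made-up values (the frame `df` and its contents are assumptions for illustration, not the asker's data). It shows two possible options: Series.dt.days to get plain integers, or Series.dt.floor('D') if you would rather keep the timedelta dtype and only truncate to whole days.

```python
import pandas as pd

# Hypothetical sample data: timedeltas that include hours, minutes and seconds
df = pd.DataFrame({
    'TCTN': ['101', '102', '103'],
    '0': ['855 days 04:12:30', '626 days 23:59:59', '866 days 00:00:01'],
    '1': ['946 days 10:00:00', '485 days 06:30:15', '182 days 18:45:00'],
}).set_index('TCTN').apply(pd.to_timedelta)

# Option 1: keep only the whole-day count as integers
days_as_int = df.apply(lambda col: col.dt.days)

# Option 2: keep the timedelta dtype but truncate to whole days
days_as_timedelta = df.apply(lambda col: col.dt.floor('D'))

print(days_as_int)
print(days_as_timedelta)
```

Which option fits depends on what comes next: integers are easier for arithmetic and plotting, while the floored timedeltas can still be added to dates.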
jezrael