
I am trying to use the `between_time` function. I have converted the string `TimeStamp` column to datetime:

dataset['TimeStamp'] = pd.to_datetime(dataset['TimeStamp'], format=format)

and I defined the start and end times of the search window:

start = datetime.time(9,40,0)

end = datetime.time(10,00,0)

Then I call `dataset['TimeStamp'].between_time(start, end)`.

This is the error I get:

TypeError: Index must be DatetimeIndex

How can I fix this? Thank you.
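A minimal sketch reproducing the error, using hypothetical sample data in a `TimeStamp` column (the asker's real file and format string are not shown). `between_time` filters on the index, not on the values, so calling it on a column whose index is still the default integer index fails even though the column itself holds datetimes:

```python
import datetime

import pandas as pd

# Hypothetical sample data standing in for the asker's date.txt
dataset = pd.DataFrame({
    "TimeStamp": ["2014-07-04 09:35:00", "2014-07-04 09:45:00", "2014-07-04 10:05:00"],
    "value": [1, 2, 3],
})
# Converting the column changes its dtype to datetime64, but not the index
dataset["TimeStamp"] = pd.to_datetime(dataset["TimeStamp"])

start = datetime.time(9, 40, 0)
end = datetime.time(10, 0, 0)

# The Series still carries the auto-generated integer index (0, 1, 2)
print(type(dataset["TimeStamp"].index))

error_message = None
try:
    # between_time looks at the *index*, which is not a DatetimeIndex here
    dataset["TimeStamp"].between_time(start, end)
except TypeError as exc:
    error_message = str(exc)

print("TypeError:", error_message)
```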

EdChum
lserlohn
  • So is your index a DatetimeIndex, as the error suggests? – EdChum Jul 04 '14 at 14:51
  • I don't know; how do I check that? Do I need to create an index for the DataFrame? – lserlohn Jul 04 '14 at 15:14
  • DataFrames always have an index. You can check its type with `type(df.index)`; if you never set one, it is most likely an auto-generated integer index. – EdChum Jul 04 '14 at 15:15
  • I read the data like this: `dataset = pd.read_csv('date.txt', header=0, delimiter=' ')`. What should I add? I see there is an `index` attribute, but I don't know how to use it. Thanks. – lserlohn Jul 04 '14 at 15:18
  • In that case pandas auto-generates the index. You can always set it after loading the data, so in your case `dataset.set_index(keys='TimeStamp', inplace=True)` should work. – EdChum Jul 04 '14 at 15:22
  • If that works, could someone turn it into an answer? – holdenweb Jul 04 '14 at 20:34
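A sketch of the fix suggested in the comments, with hypothetical data in place of `date.txt`: check the index type, move `TimeStamp` into the index with `set_index`, then call `between_time`. After `set_index`, `TimeStamp` is no longer a column, so the call goes on the DataFrame itself rather than on `dataset['TimeStamp']`:

```python
import datetime

import pandas as pd

# Hypothetical sample data; the asker's date.txt is not shown
dataset = pd.DataFrame({
    "TimeStamp": pd.to_datetime([
        "2014-07-04 09:35:00",
        "2014-07-04 09:45:00",
        "2014-07-04 10:00:00",
        "2014-07-04 10:05:00",
    ]),
    "value": [1, 2, 3, 4],
})

# Before set_index: the auto-generated integer index
print(type(dataset.index))

# Move the datetime column into the index, as suggested in the comments
dataset.set_index(keys="TimeStamp", inplace=True)
print(type(dataset.index))  # now a DatetimeIndex

start = datetime.time(9, 40, 0)
end = datetime.time(10, 0, 0)

# 'TimeStamp' is now the index, so filter the whole DataFrame
result = dataset.between_time(start, end)
print(result)
```

Both endpoints are included by default, so the 10:00 row is kept; recent pandas versions expose an `inclusive` parameter to change that.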

1 Answer


Here is an example based on the information in the comments:

import pandas as pd
import io
import datetime

data = '''time --- value
1984-12-12 14:08:00 --- 1
1984-12-12 14:25:00 --- 2
1984-12-12 14:47:00 --- 4
1984-12-12 16:37:00 --- 3
1984-12-12 16:37:00 --- 9
1984-12-12 16:37:00 --- 5
1984-12-12 17:52:00 --- 3
1984-12-12 17:52:00 --- 7
1984-12-12 19:29:00 --- 2'''

#------------------------------------------------

# a separator longer than one character needs the python parser engine
df = pd.read_csv(io.StringIO(data), sep=' --- ', engine='python')

# convert the string column to datetime
df['time'] = pd.to_datetime(df['time'])

print("\nDataFrame:\n", df)

print('\nIndex:', type(df.index))

#------------------------------------------------

# between_time() filters on the index, so make 'time' the index
df.set_index(keys='time', inplace=True)

print("\nDataFrame:\n", df)

print('\nIndex:', type(df.index))

#------------------------------------------------

start = datetime.time(14, 50, 0)
end = datetime.time(18, 0, 0)

print("\nResult:\n", df['value'].between_time(start, end))

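Results (from the original run on an older pandas version; newer versions print different index class paths, e.g. a `RangeIndex` for the default index):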

DataFrame:
                 time  value
0 1984-12-12 14:08:00      1
1 1984-12-12 14:25:00      2
2 1984-12-12 14:47:00      4
3 1984-12-12 16:37:00      3
4 1984-12-12 16:37:00      9
5 1984-12-12 16:37:00      5
6 1984-12-12 17:52:00      3
7 1984-12-12 17:52:00      7
8 1984-12-12 19:29:00      2

Index: <class 'pandas.core.index.Int64Index'>

DataFrame:
                     value
time                      
1984-12-12 14:08:00      1
1984-12-12 14:25:00      2
1984-12-12 14:47:00      4
1984-12-12 16:37:00      3
1984-12-12 16:37:00      9
1984-12-12 16:37:00      5
1984-12-12 17:52:00      3
1984-12-12 17:52:00      7
1984-12-12 19:29:00      2

Index: <class 'pandas.tseries.index.DatetimeIndex'>

Result:
time
1984-12-12 16:37:00    3
1984-12-12 16:37:00    9
1984-12-12 16:37:00    5
1984-12-12 17:52:00    3
1984-12-12 17:52:00    7
Name: value, dtype: int64
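If you would rather keep `time` as an ordinary column, two alternatives (not part of the answer above, sketched here with the same sample values) avoid changing the index: `DatetimeIndex.indexer_between_time` returns the matching row positions, and `.dt.time` lets you compare the time-of-day part directly:

```python
import datetime

import pandas as pd

# Same sample values as in the answer, built directly instead of via read_csv
df = pd.DataFrame({
    "time": pd.to_datetime([
        "1984-12-12 14:08:00", "1984-12-12 14:25:00", "1984-12-12 14:47:00",
        "1984-12-12 16:37:00", "1984-12-12 16:37:00", "1984-12-12 16:37:00",
        "1984-12-12 17:52:00", "1984-12-12 17:52:00", "1984-12-12 19:29:00",
    ]),
    "value": [1, 2, 4, 3, 9, 5, 3, 7, 2],
})

start = datetime.time(14, 50, 0)
end = datetime.time(18, 0, 0)

# Option 1: integer positions of the rows whose clock time is in [start, end]
positions = pd.DatetimeIndex(df["time"]).indexer_between_time(start, end)
subset_positions = df.iloc[positions]

# Option 2: boolean mask on the time-of-day component (inclusive on both ends)
mask = df["time"].dt.time.between(start, end)
subset_mask = df[mask]

print(subset_mask)
```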
furas
  • This gives me an odd result. I have a DataFrame with a 'time' column that I converted to datetime64[ns]. I used your code with start (1990-01-01) and end (1990-12-31) dates, and it returns rows with dates like 1957-09-02. The dates in my DataFrame span the 20th century (1900 to 1999). – imrek Sep 05 '15 at 18:46
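The comment above passes calendar dates to `between_time`, but `between_time` selects rows by time of day only and ignores the date. A sketch with hypothetical data: for a date range, slice a sorted `DatetimeIndex` with `.loc` instead:

```python
import datetime

import pandas as pd

# Hypothetical data spanning several decades, indexed by datetime (sorted)
df = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.to_datetime([
        "1957-09-02 00:00:00",
        "1990-03-15 12:00:00",
        "1990-11-30 00:00:00",
        "1995-06-01 08:30:00",
    ]),
)

# between_time compares only the clock time: every row at exactly 00:00
# matches, regardless of its year
midnight_rows = df.between_time(datetime.time(0, 0), datetime.time(0, 0))

# To select a calendar range, slice the DatetimeIndex with .loc
rows_1990 = df.loc["1990-01-01":"1990-12-31"]

print(midnight_rows)
print(rows_1990)
```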