
Consider the DataFrame data:

                 one  two  three  four
Ohio      2013-01-01    1      2     3
Colorado  2014-01-05    5      6     7
Utah      2015-05-06    9     10    11
New York  2016-10-11   13     14    15

I'd like to extract the rows whose year matches a given year, e.g., with something like data['one'][:][0:4] == '2013'. But the command data['one'][:][0:4] returns the whole column:

Ohio        2013-01-01
Colorado    2014-01-05
Utah        2015-05-06
New York    2016-10-11
Name: one, dtype: object

I thought this was the right approach because the command data['one'][0][0:4] returns

'2013'

Why the difference, and what's the correct way to do this?

bcf

3 Answers


Since column 'one' consists of dates, it's best to have pandas treat it as dates rather than as strings. You can use pd.to_datetime to do this:

df['one'] = pd.to_datetime(df['one'])

This allows you to filter on date properties without needing to worry about slicing strings. For example, you can check for year using Series.dt.year:

df['one'].dt.year == 2013

Combining this with loc allows you to get all rows where the year is 2013:

df.loc[df['one'].dt.year == 2013, :]
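Putting the steps above together, here is a minimal runnable sketch. The DataFrame is rebuilt by hand from the table in the question, so the construction code and the names mask and rows_2013 are my own assumptions, not part of the original post:

```python
import pandas as pd

# Rebuild the example data from the question (assumed construction;
# the question only shows the printed table).
df = pd.DataFrame(
    {
        "one": ["2013-01-01", "2014-01-05", "2015-05-06", "2016-10-11"],
        "two": [1, 5, 9, 13],
        "three": [2, 6, 10, 14],
        "four": [3, 7, 11, 15],
    },
    index=["Ohio", "Colorado", "Utah", "New York"],
)

# Convert the string column to datetime64 so the .dt accessor is available.
df["one"] = pd.to_datetime(df["one"])

# Boolean mask: True where the year component equals 2013.
mask = df["one"].dt.year == 2013

# Select the matching rows (all columns).
rows_2013 = df.loc[mask, :]
print(rows_2013)
```

With this data only the Ohio row matches, so rows_2013 should contain that single row.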
root

The condition you are looking for is

df['one'].str[0:4] == "2013"

Basically, you need to tell pandas to treat each value in the column as a string (that is what the .str accessor does), then operate on those strings.

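The way you have it written, df['one'][:] says "give me the column called 'one', then give me all of its rows ([:])". The following [0:4] then slices the first four rows of that column rather than the first four characters of each value, and since the column has only four rows you get the whole column back.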
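To make the difference concrete, here is a small sketch comparing the three kinds of slicing. The Series is rebuilt by hand from the question's column, and the variable names are only illustrative. I use .iloc[0] for the first element because plain [0] on a string-labelled index is deprecated in recent pandas versions:

```python
import pandas as pd

# The 'one' column from the question, kept as plain strings (assumed construction).
one = pd.Series(
    ["2013-01-01", "2014-01-05", "2015-05-06", "2016-10-11"],
    index=["Ohio", "Colorado", "Utah", "New York"],
    name="one",
)

# [:] keeps every row, then [0:4] keeps the first four ROWS.
# With only four rows, this is the whole column again.
whole_column = one[:][0:4]

# Picking one element first gives a Python str, so [0:4] slices CHARACTERS.
first_value = one.iloc[0][0:4]

# .str applies the character slice to every element of the column.
years = one.str[0:4]
is_2013 = years == "2013"

print(whole_column.equals(one))
print(first_value)
print(is_2013)
```

The is_2013 mask can then be used as df[is_2013] or df.loc[is_2013] to pull out the matching rows.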

mprat

query also works well on datetime columns:

In [13]: df.query('one == 2013')
Out[13]:
            one  two  three  four
Ohio 2013-01-01    1      2     3
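A variant that spells out the year comparison inside the query string is sketched below. This is my own illustration, not the exact call shown above: the DataFrame is rebuilt by hand, and I pass engine="python" as a cautious choice since I am not sure the bare 'one == 2013' form behaves the same across pandas versions:

```python
import pandas as pd

# Rebuild a trimmed version of the example data (assumed construction).
df = pd.DataFrame(
    {
        "one": ["2013-01-01", "2014-01-05", "2015-05-06", "2016-10-11"],
        "two": [1, 5, 9, 13],
    },
    index=["Ohio", "Colorado", "Utah", "New York"],
)
df["one"] = pd.to_datetime(df["one"])

# Compare the year component explicitly inside the query expression.
result = df.query("one.dt.year == 2013", engine="python")
print(result)
```

This should select the same Ohio row as the boolean-mask approach, just written as a query string.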
Zero