I have a question about list comprehension using timestamps as the condition.

I have the following dataframe:

    Period  PriceIndex  Inflation           year    month   date
0   1920/01 19.3        0.0                 1920    01      1920-01-01
1   1920/02 19.5        1.03093696588612    1920    02      1920-02-01
2   1920/03 19.7        1.020417017424169   1920    03      1920-03-01
3   1920/04 20.3        3.0002250303799105  1920    04      1920-04-01
4   1920/05 20.6        1.467018974779366   1920    05      1920-05-01
5   1920/06 20.9        1.4458083175229675  1920    06      1920-06-01
6   1920/07 20.8        -0.4796172263492604 1920    07      1920-07-01
7   1920/08 20.3        -2.433210065953073  1920    08      1920-08-01
8   1920/09 20.0        -1.4888612493750841 1920    09      1920-09-01
9   1920/10 19.9        -0.5012541823544048 1920    10      1920-10-01
10  1920/11 19.8        -0.5037794029957077 1920    11      1920-11-01
11  1920/12 19.4        -2.0408871631207415 1920    12      1920-12-01

I would like a list that contains True for every date on or after 1920-07-01 and False for every earlier date. So something like this:

0 bool False
1 bool False
2 bool False
3 bool False
4 bool False
5 bool True
6 bool True

To obtain this I use list comprehension:

v = [i>='1920-07-01 00:00:00' for i in df['date']]

However, I always get this type error: TypeError: '>=' not supported between instances of 'Timestamp' and 'str'.

Does anyone have an idea what I could do about this? Thank you in advance :)

1 Answer

In the expression i >= '1920-07-01 00:00:00', the type of i is Timestamp while '1920-07-01 00:00:00' is a literal str. The error very explicitly tells you that >= is not a supported operation between these two types. In other words you need to turn '1920-07-01 00:00:00' into a Timestamp object in order to compare the two.
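As a minimal sketch of that conversion (the names `cutoff` and `i` are only illustrative, and `i` stands in for a single value from the `date` column), building the Timestamp once makes the comparison valid:

```python
import pandas as pd

# Convert the string to a Timestamp once, so both sides have the same type.
cutoff = pd.Timestamp("1920-07-01 00:00:00")

# Illustrative stand-in for one element of df["date"].
i = pd.Timestamp("1920-08-01")

# Timestamp-to-Timestamp comparison is supported.
print(i >= cutoff)
```

`pd.to_datetime("1920-07-01 00:00:00")` should give the same Timestamp if you prefer that spelling.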

Using boolean masking is much more efficient than a comprehension, especially for larger DataFrames:

df["date"] >= pd.Timestamp("1920-07-01 00:00:00")

This operation will result in a Series object. You can then convert that to a list if that's the type of object you want.
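Here is a small self-contained sketch of the mask and the list conversion. The DataFrame below is a cut-down, made-up stand-in for the one in the question (only a `date` column with four rows), so treat it as an illustration rather than your exact data:

```python
import pandas as pd

# Assumed stand-in for the question's DataFrame: just a datetime "date" column.
df = pd.DataFrame(
    {"date": pd.to_datetime(["1920-05-01", "1920-06-01", "1920-07-01", "1920-08-01"])}
)

# Vectorized comparison: yields a boolean Series aligned with df's index.
mask = df["date"] >= pd.Timestamp("1920-07-01 00:00:00")

# Convert to a plain Python list of bools if a list is what you need.
v = mask.tolist()
print(v)  # [False, False, True, True]
```

The same `mask` can also be used directly to select rows, e.g. `df[mask]`.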


If for whatever reason you have your heart set on using a comprehension:

v = [d >= pd.Timestamp("1920-07-01 00:00:00") for d in df["date"]]
ddejohn