0

I have two python pandas dataframes, in simplified form they look like this:

DF1

+---------+------+
| Date 1  | Item |
+---------+------+
| 1991-08 |   A  |
| 1992-08 |   A  |
| 1997-02 |   B  |
| 1998-03 |   C  |
| 1999-02 |   D  | 
| 1999-02 |   D  |
+---------|------+

DF2

+---------+------+
| Date 2  | Item | 
+---------+------+
| 1993-08 |   A  | 
| 1993-09 |   B  | 
| 1997-01 |   C  |
| 1999-03 |   D  | 
| 2000-02 |   E  | 
| 2001-03 |   F  |
+---------|------+

I want to count how many element in Item column DF2 appeared in DF1 if the date in DF1 are less than the date in DF2

Desired Output

+---------+------+-------+
| Date 2  | Item | Count |
+---------+------+-------+
| 1993-08 |   A  |   2   |
| 1993-09 |   B  |   0   |
| 1997-01 |   C  |   0   |
| 1999-03 |   D  |   2   |
| 2000-02 |   E  |   0   |
| 2001-03 |   F  |   0   |
+---------+------+-------+

Appreciate any comment and feedback, thanks in advance

1 Answers1

3

Let's merge with a cartesian join and filter, then use value_counts and map back to your dataframe:

df_c = df1.merge(df2, on='Item')
df_c = df_c[df_c['Date 1'] < df_c['Date 2']]
df2['Count'] = df2['Item'].map(df_c['Item'].value_counts()).fillna(0)
print(df2)

Output:

    Date 2 Item  Count
0  1993-08    A    2.0
1  1993-09    B    0.0 # Note, I get no counts for B
2  1997-01    C    0.0
3  1999-03    D    2.0
4  2000-02    E    0.0
5  2001-03    F    0.0
Scott Boston
  • 147,308
  • 15
  • 139
  • 187