1

I have two csv files:

csv1:

Mobile_Number    Date    

503477334    2018-10-12
506002884    2018-10-12
501022162    2018-10-12
503487338    2018-10-13
506012887    2018-10-13
503427339    2018-10-14

csv2:

   Date       Mobile_Number

2018-10-01     503477334
2018-10-06     501022162
2018-10-08     506002884
2018-10-09     503487338
2018-10-13     506012887
2018-10-14     503427492

Now. I want an output like the following, where there is a new column which indicates the difference between the dates of occurrence of the number in csv1, if the number exists in csv2

csv1:

Mobile_Number    Date     Difference

503477334    2018-10-12     11
506002884    2018-10-12     4
501022162    2018-10-12     6
503487338    2018-10-13     4
506012887    2018-10-13     0
503427339    2018-10-14     NaN
dpacman
  • 3,683
  • 2
  • 20
  • 35

1 Answers1

3

Use Series.map with subtract with Date column and then convert timedeltas by Series.dt.days:

df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])

s1 = df2.drop_duplicates('Mobile_Number').set_index('Mobile_Number')['Date']
df1['Difference'] = df1['Date'].sub(df1['Mobile_Number'].map(s1)).dt.days
print (df1)
   Mobile_Number       Date  Difference
0      503477334 2018-10-12        11.0
1      506002884 2018-10-12         4.0
2      501022162 2018-10-12         6.0
3      503487338 2018-10-13         4.0
4      506012887 2018-10-13         0.0
5      503427339 2018-10-14         NaN
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • 1
    Thank you for the quick response. But I'm getting the following error "InvalidIndexError: Reindexing only valid with uniquely valued Index objects" – dpacman Dec 22 '19 at 13:46
  • 1
    @dPac - I think problem is duplicates in `df2['Mobile_Number']`, answer was edited for remove duplicates. – jezrael Dec 22 '19 at 13:49
  • I'm getting this error now, TypeError: unsupported operand type(s) for -: 'datetime.date' and 'float' – dpacman Dec 23 '19 at 07:21
  • If check `print (df1.dtypes)` and `print (df2.dtypes)` before `df1['Difference'] = df1['Date'].sub(df1['Mobile_Number'].map(s1)).dt.days` ? What is it? – jezrael Dec 23 '19 at 07:24
  • For df1 and df2: Mobile _Number - object, Date - datetime64[ns] – dpacman Dec 23 '19 at 07:28
  • @dPac - hmmm, it seems some data related issue, what is `print (df1['Mobile_Number'].map(s1).dtype)` ? – jezrael Dec 23 '19 at 07:32
  • https://stackoverflow.com/q/59456738/10933598 Can you help me with this please @jezrael? – dpacman Dec 23 '19 at 14:11