Consider two pd.Series(), t and T, where:

>>> print(t)

0       2006-04-17 00:00:00
1       2006-04-18 00:00:00
2       2006-04-19 00:00:00
3       2006-04-20 00:00:00
4       2006-04-21 00:00:00
               ...         
3522    2020-04-14 00:00:00
3523    2020-04-15 00:00:00
3524    2020-04-16 00:00:00
3525    2020-04-17 00:00:00
3526    2020-04-20 00:00:00
Name: tDate, Length: 3527, dtype: object

>>> print(T)

0     2004-09-15
1     2004-10-20
2     2004-11-17
3     2004-12-22
4     2005-01-19
         ...    
203   2021-08-18
204   2021-09-15
205   2021-10-20
206   2021-11-17
207   2021-12-22
Name: tDate, Length: 208, dtype: datetime64[ns]

I would like to get a matrix mat, a NumPy array of shape (len(t), len(T)), containing the difference in days between each element of t and every element of T.

I tried np.array([_ - t for _ in T]), but the result shows the differences as integer nanosecond counts (dtype timedelta64[ns]), while I need them as the number of days between the two dates (also as int). I would also prefer to do it in NumPy so that it is faster.

np.array([_ - t for _ in T])
array([[ -50025600000000000,  -50112000000000000,  -50198400000000000,
        ..., -491788800000000000, -491875200000000000,
        -492134400000000000],
       [ -47001600000000000,  -47088000000000000,  -47174400000000000,
        ..., -488764800000000000, -488851200000000000,
        -489110400000000000],
       [ -44582400000000000,  -44668800000000000,  -44755200000000000,
        ..., -486345600000000000, -486432000000000000,
        -486691200000000000],
       ...,
       [ 489456000000000000,  489369600000000000,  489283200000000000,
        ...,   47692800000000000,   47606400000000000,
          47347200000000000],
       [ 491875200000000000,  491788800000000000,  491702400000000000,
        ...,   50112000000000000,   50025600000000000,
          49766400000000000],
       [ 494899200000000000,  494812800000000000,  494726400000000000,
        ...,   53136000000000000,   53049600000000000,
          52790400000000000]], dtype='timedelta64[ns]')
deblue
  • You are looking for a cartesian product + some additional logic. This should help: https://stackoverflow.com/questions/53699012/performant-cartesian-product-cross-join-with-pandas – Shaido Oct 22 '20 at 08:19

1 Answer

I think you need to subtract with broadcasting and then cast the result to days:

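# if necessary, convert both Series to NumPy datetime64 arrays first
# (t has dtype object, so it may need pd.to_datetime before to_numpy)
#T = T.to_numpy()
#t = pd.to_datetime(t).to_numpy()

# T[:, None] has shape (len(T), 1), so the result has shape (len(T), len(t))
a = (T[:, None] - t).astype('timedelta64[D]')
print(a)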
[[ -579  -580  -581  -582  -583 -5690 -5691 -5692 -5693 -5696]
 [ -544  -545  -546  -547  -548 -5655 -5656 -5657 -5658 -5661]
 [ -516  -517  -518  -519  -520 -5627 -5628 -5629 -5630 -5633]
 [ -481  -482  -483  -484  -485 -5592 -5593 -5594 -5595 -5598]
 [ -453  -454  -455  -456  -457 -5564 -5565 -5566 -5567 -5570]
 [ 5602  5601  5600  5599  5598   491   490   489   488   485]
 [ 5630  5629  5628  5627  5626   519   518   517   516   513]
 [ 5665  5664  5663  5662  5661   554   553   552   551   548]
 [ 5693  5692  5691  5690  5689   582   581   580   579   576]
 [ 5728  5727  5726  5725  5724   617   616   615   614   611]]
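As a self-contained illustration of the broadcasting step, here is a minimal sketch. The three-element `t` and `T` are hypothetical stand-ins built from a few dates shown in the question, and the names `t_days`, `T_days`, `diff` and `days` are only for this example. It assumes both Series can be converted with `pd.to_datetime`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a few dates taken from the question's t and T.
t = pd.Series(pd.to_datetime(["2006-04-17", "2006-04-18", "2020-04-20"]))
T = pd.Series(pd.to_datetime(["2004-09-15", "2004-10-20", "2021-12-22"]))

# Convert to plain NumPy arrays at day resolution.
t_days = t.to_numpy().astype("datetime64[D]")
T_days = T.to_numpy().astype("datetime64[D]")

# T_days[:, None] has shape (len(T), 1); broadcasting against t_days
# (shape (len(t),)) yields a (len(T), len(t)) array of timedelta64[D].
diff = T_days[:, None] - t_days

# Cast to plain integers if a numeric day count is needed.
days = diff.astype(np.int64)
print(days)
```

Row i, column j of `days` holds `T[i] - t[j]` in whole days. If the (len(t), len(T)) layout from the question is wanted instead, `days.T` gives the same values with the axes swapped.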
jezrael
  • This results in an error for me: AssertionError: ((208, 1), (3527, 1)) – deblue Oct 22 '20 at 09:25
  • What do `print(t[0])` and `print(T[0])` show? Are the values converted to NumPy arrays? – jezrael Oct 22 '20 at 09:27
  • `print(t[0])` gives `[Timestamp('2006-04-17 00:00:00')]` and `print(T1[0])` gives `2004-09-15T00:00:00.000000000`. I guess both of them should be `numpy.datetime64`. I converted them and it works now. Thanks. – deblue Oct 22 '20 at 09:28
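
The conversion mentioned in the last comment could look roughly like the sketch below. The Series `t_obj` is a hypothetical stand-in for a column of `Timestamp` objects stored with dtype object; the exact conversion used by the asker is not shown in the thread, so this is only one plausible way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's t: Timestamps stored as dtype object.
t_obj = pd.Series(
    [pd.Timestamp("2006-04-17"), pd.Timestamp("2006-04-18")],
    dtype=object,
)
print(t_obj.dtype)  # object

# pd.to_datetime turns the values into a datetime64-backed Series,
# and to_numpy() then returns a NumPy datetime64 array.
t_arr = pd.to_datetime(t_obj).to_numpy()
print(t_arr.dtype.kind)  # "M" is NumPy's kind code for datetime64
```

Once both inputs are datetime64 arrays, the broadcast subtraction from the answer can be applied to them directly.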