I have the following code:

import numpy as np
import pandas as pd

# Build every (InvoiceDate, StockCode) combination
dates = data['InvoiceDate'].unique()
c = data['StockCode'].unique()
r = pd.DataFrame([[dates], [c]]).T
r.columns = ['InvoiceDate', 'StockCode']
r = r.explode(column='InvoiceDate')
r = r.explode(column='StockCode').reset_index(drop=True)
# Left-join the original data and treat missing quantities as 0
new_data = pd.merge(r, data, how='left', on=['InvoiceDate', 'StockCode'])
new_data['Quantity'] = new_data['Quantity'].fillna(0)
# Attempt to store the dates at day resolution
new_data['new_invoice'] = np.array(new_data['InvoiceDate'], dtype='datetime64[D]')
new_data.info()

I want the dtype of the "new_invoice" column to be 'datetime64[D]'. Unfortunately, this does not seem to work, because I get the following output:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1241350 entries, 0 to 1241349
Data columns (total 5 columns):
 #   Column       Non-Null Count    Dtype         
---  ------       --------------    -----         
 0   InvoiceDate  1241350 non-null  datetime64[ns]
 1   StockCode    1241350 non-null  object        
 2   Quantity     1241350 non-null  float64       
 3   UnitPrice    280451 non-null   float64       
 4   new_invoice  1241350 non-null  datetime64[ns]
dtypes: datetime64[ns](2), float64(2), object(1)
memory usage: 56.8+ MB

and

new_data['new_invoice'][0]

gives the following

Timestamp('2010-12-01 00:00:00')

What I want to see is

numpy.datetime64('2010-12-01')

Any ideas how to resolve it, please?

1 Answer

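As per this answer, you cannot store datetime64[D] in a pandas DataFrame: when you assign such an array to a column, pandas converts it back to a resolution it supports (datetime64[ns] in the output above). However, you can get a datetime64[D] NumPy array out of the column using the following: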

new_invoice_dates = new_data["new_invoice"].values.astype('datetime64[D]')

The resulting array has dtype datetime64[D], and its elements are numpy.datetime64 values rather than pandas Timestamps.
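For illustration, here is a minimal self-contained sketch of that conversion. The two-row `new_data` frame below is a made-up stand-in for the real data, not the asker's dataset, and the conversion is applied to `InvoiceDate` directly for brevity:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's new_data frame
new_data = pd.DataFrame(
    {"InvoiceDate": pd.to_datetime(["2010-12-01 08:26:00", "2010-12-02 09:15:00"])}
)

# Pull the values out as a NumPy array and truncate them to day resolution.
# The DataFrame column itself keeps its original pandas dtype.
new_invoice_dates = new_data["InvoiceDate"].values.astype("datetime64[D]")

first = new_invoice_dates[0]
print(new_invoice_dates.dtype)
print(repr(first))
```

Keep the result as a separate array. If you assign it back to a DataFrame column, pandas should convert it to one of its own supported resolutions again.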

NCoop