I have the following code:
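import numpy as np
import pandas as pd

# data is my DataFrame with InvoiceDate, StockCode, Quantity and UnitPrice columns
dates = data['InvoiceDate'].unique()
c = data['StockCode'].unique()
# build every (InvoiceDate, StockCode) combination
r = pd.DataFrame([[dates], [c]]).T
r.columns = ['InvoiceDate', 'StockCode']
r = r.explode(column='InvoiceDate')
r = r.explode(column='StockCode').reset_index(drop=True)
# left-join the real rows onto the full grid; missing quantities become 0
new_data = pd.merge(r, data, how='left', on=['InvoiceDate', 'StockCode'])
new_data['Quantity'] = new_data['Quantity'].fillna(0)
# try to store the dates at day resolution
new_data['new_invoice'] = np.array(new_data['InvoiceDate'], dtype='datetime64[D]')
new_data.info()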
I want the column "new_invoice" to have dtype 'datetime64[D]'. Unfortunately this doesn't seem to work, because new_data.info() gives the following output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1241350 entries, 0 to 1241349
Data columns (total 5 columns):
 #   Column       Non-Null Count    Dtype
---  ------       --------------    -----
 0   InvoiceDate  1241350 non-null  datetime64[ns]
 1   StockCode    1241350 non-null  object
 2   Quantity     1241350 non-null  float64
 3   UnitPrice    280451 non-null   float64
 4   new_invoice  1241350 non-null  datetime64[ns]
dtypes: datetime64[ns](2), float64(2), object(1)
memory usage: 56.8+ MB
and
new_data['new_invoice'][0]
gives the following:
Timestamp('2010-12-01 00:00:00')
What I want to see is:
numpy.datetime64('2010-12-01')
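For reference, the behaviour can be reproduced on a tiny frame. This is only a sketch: the two sample timestamps are made up and stand in for the real data. It suggests the NumPy conversion itself works, and that the day resolution is lost when the array is assigned to a DataFrame column, because pandas re-casts it to one of its own supported resolutions (datetime64[ns] in the output above).

```python
import numpy as np
import pandas as pd

# Made-up stand-in for new_data; only the date column matters here.
new_data = pd.DataFrame(
    {"InvoiceDate": pd.to_datetime(["2010-12-01 08:26:00", "2010-12-02 10:00:00"])}
)

# Converting outside the DataFrame keeps day resolution.
days = new_data["InvoiceDate"].to_numpy().astype("datetime64[D]")
print(days.dtype)     # datetime64[D]
print(repr(days[0]))  # a numpy.datetime64 scalar at day resolution

# Assigning the array to a column loses it: pandas stores the values
# at a resolution it supports, not at datetime64[D].
new_data["new_invoice"] = days
print(new_data["new_invoice"].dtype)
```

So keeping the values in a separate NumPy array (days above) gives what I want, but I would like to have them as a column of the DataFrame.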
Any ideas on how to resolve this, please?