Assume this is my dataframe. It contains sales data.
date date_block_num shop_id item_id item_price item_cnt_day
0 2013-01-02 0 59 22154 999.00 1.00
1 2013-01-03 0 25 2552 899.00 1.00
2 2013-01-05 1 25 2552 899.00 -1.00
3 2013-01-06 2 25 2554 1709.05 1.00
4 2013-01-15 2 28 2555 1099.00 1.00
5 2013-01-10 3 25 2564 349.00 1.00
6 2013-01-02 3 26 2565 549.00 1.00
7 2013-01-04 3 25 2572 239.00 1.00
8 2013-01-11 4 25 2572 299.00 1.00
9 2013-01-03 4 27 2573 299.00 3.00
I'm trying to get all combinations (pairs) of shop_id and item_id for each value of the date_block_num column, as in my code below.
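import itertools
import numpy as np
import pandas as pd

matrix = []
for i in range(5):
    # Rows of the current block; pair only the shops and items seen in it.
    sale = sales[sales.date_block_num == i]
    matrix.append(np.array(list(itertools.product([i], sale.shop_id.unique(), sale.item_id.unique())), dtype='int16'))
df = pd.DataFrame(np.vstack(matrix))  # This works, but it's slow.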
How can I write this same code without loops?
I tried something like the following, but it is too slow and raises a memory error when I turn the result into a dataframe on my original dataset.
from itertools import product
df = pd.DataFrame(list(product(sales.date_block_num.unique(), sales.shop_id.unique(), sales.item_id.unique())))
Note: The original dataset has more than a million rows.
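For reference, here is a minimal loop-free sketch of the kind of result I am after. It assumes the goal is, for each date_block_num, every pairing of the shops and items that appear in that block. The name `grid` and the small rebuilt `sales` frame (the sample rows above, ID columns only) are just for illustration, and the int16 cast assumes all IDs fit in that range, as in my loop.

```python
import pandas as pd

# Sample rows from the table above (only the three ID columns are needed).
sales = pd.DataFrame({
    "date_block_num": [0, 0, 1, 2, 2, 3, 3, 3, 4, 4],
    "shop_id": [59, 25, 25, 25, 28, 25, 26, 25, 25, 27],
    "item_id": [22154, 2552, 2552, 2554, 2555, 2564, 2565, 2572, 2572, 2573],
})

# Unique (block, shop) and (block, item) pairs.
shops = sales[["date_block_num", "shop_id"]].drop_duplicates()
items = sales[["date_block_num", "item_id"]].drop_duplicates()

# Joining on the block number pairs every shop with every item
# inside the same block, which is a per-block Cartesian product.
grid = shops.merge(items, on="date_block_num").astype("int16")

print(grid.shape)  # (19, 3)
```

On the sample this gives 4 + 1 + 4 + 6 + 4 = 19 rows, one per (date_block_num, shop_id, item_id) combination. I have not measured how this behaves on the full dataset.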