I have a sales dataset with more than 400K rows, and I need to pivot it so that each order becomes a single row with all of its SKUs spread across columns. I need to do this for all orders, because afterwards I will build another table from this data.
However, I get the error below:
ValueError: Unstacked DataFrame is too big, causing int32 overflow
This is the first time I have applied this method to a large dataset, and I will need to scale it to even larger datasets.
This is my code:
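import pandas as pd

# Load the sales data and drop exact duplicate rows
df1 = pd.read_csv('sales.csv')
df1 = df1.drop_duplicates()
# Index by ORDER_ID so the rows can be grouped by index level
df1.index = df1['ORDER_ID']
# Number the SKUs within each order (0, 1, 2, ...), then pivot those numbers into columns
df3 = df1.assign(col=df1.groupby(level=0).SKU_ID.cumcount()).pivot(columns='col', values='SKU_ID').reset_index()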
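To check whether the size of the pivoted result is what triggers the error, a rough estimate of its cell count can be computed before pivoting. This is only a sketch: the toy data and the variable names (`n_orders`, `max_skus`, `n_cells`) are made up for illustration, and the idea that the error appears when rows times columns exceeds the int32 maximum is my understanding of the message, not something confirmed for every pandas version.

```python
import numpy as np
import pandas as pd

# Toy data standing in for sales.csv (hypothetical values)
df = pd.DataFrame({
    "ORDER_ID": [1, 1, 1, 2, 2, 3],
    "SKU_ID": ["A", "B", "C", "A", "D", "B"],
})

# One row per order in the pivoted result
n_orders = df["ORDER_ID"].nunique()
# One column per SKU position, so the widest order sets the column count
max_skus = df.groupby("ORDER_ID").size().max()

# Estimated number of cells in the pivoted frame (assumed to be what is checked)
n_cells = int(n_orders) * int(max_skus)
int32_max = np.iinfo(np.int32).max

print(n_orders, max_skus, n_cells, n_cells > int32_max)
```

On the real data, a very large `max_skus` would suggest that a few orders (for example, rows sharing a placeholder ORDER_ID) are inflating the column count.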
Is there a way to run this in ranges (chunks) and concatenate the results? I have not found a way to do that yet.
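To make the idea concrete, here is a minimal sketch of what "pivot in ranges, then concat" could look like. The function name `pivot_in_chunks`, the `chunk_size` parameter, and the toy data are hypothetical, and it assumes the `ORDER_ID` and `SKU_ID` columns from the code above. It has not been tried on the full dataset.

```python
import pandas as pd

def pivot_in_chunks(df, chunk_size=50_000):
    """Pivot SKUs into columns one batch of orders at a time (illustrative sketch)."""
    order_ids = df["ORDER_ID"].unique()
    pieces = []
    for start in range(0, len(order_ids), chunk_size):
        batch = order_ids[start:start + chunk_size]
        part = df[df["ORDER_ID"].isin(batch)]
        # Position of each SKU within its order: 0, 1, 2, ...
        part = part.assign(col=part.groupby("ORDER_ID").cumcount())
        pieces.append(part.pivot(index="ORDER_ID", columns="col", values="SKU_ID"))
    # concat aligns the pieces on column labels; shorter orders are padded with NaN
    return pd.concat(pieces).reset_index()

# Toy data standing in for sales.csv (hypothetical values)
df = pd.DataFrame({
    "ORDER_ID": [1, 1, 1, 2, 2, 3],
    "SKU_ID": ["A", "B", "C", "A", "D", "B"],
})

result = pivot_in_chunks(df, chunk_size=2)
print(result)
```

Each batch is pivoted separately, so no single pivot has to cover every order at once. The concatenated frame still has one column per SKU position of the widest order, so memory use for the final result may remain a concern.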