I'm sure this has been answered elsewhere, but I can't think of how to exactly phrase it.
I am constructing a dataframe from orders. The way the csv is presented to me is, if the customer ordered more than 1 item, the following line in the csv is almost filled with NAs except for the Order Number and the item sku.
Order Number, Created at, Quantity, Sku, Tags, Bunch of other columns
11111, 1/29/2020, 1, XL-1234, Nan, ...
11112, 1/29/2020, 1, XL-1234, Wholesale, ...
11112, Nan, 2, SM-1254, Nan, ...
11113, 1/29/2020, 1, XL-1234, Nan, ...
df = pd.DataFrame({'Order Number':[11111, 11112, 11112, 11113],
'Created at':['1/29/2020', '1/29/2020', np.nan, '1/29/2020'],
'Quantity':[1,1,2,1],
'Sku':['XL-1234', 'XL-1234', 'SM-1254', 'XL-1234'],
'Tags':[np.nan, 'Wholesale', np.nan, np.nan]
})
print(df)
Order Number Created at Quantity Sku Tags
0 11111 1/29/2020 1 XL-1234 NaN
1 11112 1/29/2020 1 XL-1234 Wholesale
2 11112 NaN 2 SM-1254 NaN
3 11113 1/29/2020 1 XL-1234 NaN
What I need to do is fill the Nans of the same Order Number with the value that has been supplied, such as with a df.fillna(method='ffill', inplace=True).
The issue I keep running into is that it will fill all the Nans of a column with the most recent Non-Nan value.
Expected Result:
Order Number, Created at, Quantity, Sku, Tags, Bunch of other columns
11111, 1/29/2020, 1, XL-1234, Nan, ...
11112, 1/29/2020, 1, XL-1234, Wholesale, ...
11112, 1/29/2020, 2, SM-1254, Wholesale, ...
11113, 1/29/2020, 1, XL-1234, Nan, ...
df.fillna(method='ffill', inplace=True)
print(df)
Order Number Created at Quantity Sku Tags
0 11111 1/29/2020 1 XL-1234 NaN
1 11112 1/29/2020 1 XL-1234 Wholesale
2 11112 1/29/2020 2 SM-1254 Wholesale
3 11113 1/29/2020 1 XL-1234 Wholesale
I've tried to do things like:
for unique in df['Order Number'].unique():
df.loc[df['Order Number'] == unique].fillna(method='ffill', inplace=True)
I can't currently use a df.groupby (that I can think of) because some of the customers don't have a 'Wholesale' tag, but are special customers that I need to consider on their own.
My hope is to split the main dataframe into 4 dataframes. These would track all the sales (this isn't exactly its own dataframe), sales for Wholesale, sales for "special" customers, and sales for the rest of the customers.
If someone could point me in the right direction, or at least help me figure out how to search for this else where, I would be very grateful!