I have a grouped dataframe like so:
│ product │ category
│ spot1 spot2 spot3 │ spot1 spot2 spot3
──────────┼───────────────────────────┼─────────────────────────────
basket 1 │ NaN apple banana │ NaN fruits fruits
basket 2 │ almond carrot NaN │ nuts veggies NaN
One row represents a "basket" containing different food products (vegtables, fruits, nuts).
Each basket has 3 spots that can either contain a food product or not (=NaN).
I would like the first column of group product
to be as populated as possible. That means if there is a NaN value in the first column of the product group and some value in the 2nd or n-th column if should shift to the left for each group.
Categories are related: in the example above a baskets'
spot1
of group product
and spot1
of group category
belong together. Every data combination must have a value for product. If product is NaN then all the related items will be NaN as well.
The output should look something like:
│ product │ category
│ spot1 spot2 spot3 │ spot1 spot2 spot3
──────────┼───────────────────────────┼─────────────────────────────
basket 1 │ apple banana NaN │ fruits fruits NaN <-- this row shifted to left to "fill" first spot of product group
basket 2 │ almond carrot NaN │ nuts veggies NaN
jezrael's answer here was a good starting point for me:
#for each row remove NaNs and create new Series - rows in final df
df1 = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)
#if possible different number of columns like original df is necessary reindex
df1 = df1.reindex(columns=range(len(df.columns)))
#assign original columns names
df1.columns = df.columns
print (df1)
However, this solution ignores grouping. I only want values to shift left based on the specific group product
.
edit / minimal reproducible example
please use this code to get to the "starting point" of problem. The way I get to this point in my production code is more complex but this should do fine.
# Import pandas library
import pandas as pd
# initialize list of lists
data = [[1, 'NaN','NaN'], [1, 'apple','fruits'], [1,'banana', 'fruits'], [2, 'carrot','veggies'], [2, 'almond','nuts']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['basket','product', 'category'])
# print dataframe.
df
dfg = df.groupby(['basket', df.groupby(['basket']).cumcount() + 1]).first().unstack().reset_index()
print(dfg)