suppose I have 2 dataframes of the same shape. In the first column of each dataframe I have a product id (parent of variant items), in the following columns I have some data (pre-processed product features / numbers), which is different in the two dataframes, and in the last column I have the total amount of variant items pro product (children to parent product id)
first dataframe:
dfaa = pd.DataFrame([['id1', 1, 2, 3, 3], ['id2',4, 5, 6,6 ], ['id3', 7, 8, 9,9]], columns=['prod_id','a', 'b','c','number of prod variants'])
second dataframe:
dfbb = pd.DataFrame([['id1', 1.1, 2.2, 3.3, 3], ['id2',4.4, 5.4, 6.6,6 ], ['id3', 7.7, 8.8, 9.9,9]], columns=['prod_id','a', 'b','c','number of prod variants'])
What I need to do is join these dataframes to form one dataframe with a multiindex like this:
The first option would be an extra index level for each feature consisting of two columns on the lower level for the two values from 2 original dataframes. The second option I could think of is just concatenating the features along columns and then adding an extra index level which describes the numbers (non-NaN-values and unique values).
For the first option it could be neccessary to modify the names of the columns of the lower index level (e.g. instead of a and a I could work with a_vals and a_unique) - that would be no problem.
Trying real hard to get hold of working with data in python, I really appreciate your help.