2

I searched it and indeed I found a lot of similar questions but none of those seemed to answer my case.

I have a pd Dataframe which is a joined table consist of products and the countries in which they are sold. It's 3000 rows and 50 columns in size.

I'm uploading a photo (only part of the df) of the current situation I'm in now and the expected result I want to achieve.

enter image description here

I want to transpose the 'Country name' column into rows grouped by the 'Product code name. Please note that the new country columns are not limited to a certain amount of countries (some products has 3, some 40).

Thank you!

NsN
  • 109
  • 5
  • 1
    Hi @NsN I've now created a sample dataframe for you in my answer, but it is good practice to add a sample dataframe to your question, because this really helps others answering your question. Don't add a photo, add an example of a dataframe in text/code. Here you find explanation how this can be done: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Sander van den Oord Nov 12 '20 at 21:35
  • Thanks for the tip @SandervandenOord. I was struggling with uploading it as a code but I will take a look at what you've attached so I'll know better for next time. – NsN Nov 13 '20 at 10:51

2 Answers2

2

Use .cumcount() to count the number of countries that a product has.
Then use .pivot() to get your dataframe in the right shape:

df = pd.DataFrame({
    'Country': ['NL', 'Poland', 'Spain', 'Sweden', 'China', 'Egypt'],
    'Product Code': ['123', '123', '115', '115', '117', '118'],
    'Product Name': ['X', 'X', 'Y', 'Y', 'Z', 'W'],
})

df['cumcount'] = df.groupby(['Product Code', 'Product Name'])['Country'].cumcount() + 1

df_pivot = df.pivot(
    index=['Product Code', 'Product Name'], 
    columns='cumcount', 
    values='Country',
).add_prefix('country_')

Resulting dataframe:

    cumcount             country_1  country_2
ProductCode Product Name        
115         Y            Spain      Sweden
117         Z            China      NaN
118         W            Egypt      NaN
123         X            NL         Poland
Sander van den Oord
  • 10,986
  • 5
  • 51
  • 96
  • Looks good, thank you! I tried to run it but got a "ValueError: Length of passed values is 6, index implies 2" – NsN Nov 13 '20 at 10:55
1

Try this:

df_out = df.set_index(['Product code', 
                       'Product name', 
                       df.groupby('Product code').cumcount() + 1]).unstack()

df_out.columns = [f'Country_{j}' for _, j in df_out.columns]
df_out.reset_index()

Output:

  Product code Product name    Country_1 Country_2 Country_3
0       AAA115            Y       Sweden     China       NaN
1       AAA117            Z        Egypt    Greece       NaN
2       AAA118            W       France     Italy       NaN
3       AAA123            X  Netherlands    Poland     Spain

Details:

Reshape dataframe with set_index and unstack, using cumcount to create country columns. Then flatten multiindex header using list comprehension.

Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • 1
    Thanks, Scott! a neat solution. When trying to solve this problem before posting this post I tried using .unstack() and list comprehension but couldn't find a way to combine it together, as you did. – NsN Nov 13 '20 at 10:58