
I have this dataset:

product Marketplace product_type
1 200 X
2 300 A
2 400 A
2 200 A
3 500 A
3 400 A
3 300 B

The expected output should be:

product Marketplace product_type
1 200 X
2 300 A
2 400 A
2 200 A
3 500 B
3 400 B
3 300 B

Basically, I want to change the product_type values when they differ within the same product. I tried the following code, but it is extremely slow on large amounts of data. Is there anything I can do about this, or do you have any suggestions? What I have tried:

mp_correspondence = {200:1, 
                     300:2,
                     400:3,
                     500:4,
                    }
df['ranking'] = df['Marketplace'].map(mp_correspondence)
df
product_list = set(df['product'])
for i in product_list:
    df_product_frame = df[df['product'] == i].copy()
    nr_rows = df_product_frame['product'].count()
    if nr_rows > 1:
        df['product_type'] = (df.assign(ranking=df['Marketplace'].map(mp_correspondence)) \
                         .sort_values('ranking').groupby('product')
  • [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – wwii Mar 15 '22 at 13:34

1 Answer

I don't fully understand your code, but you can try the code below, which gives the expected output.

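Create a mapping from product to product_type by dropping duplicate product_type values, so that each type is kept only at its first occurrence. Note that this relies on every product being the first place where its final type appears, which holds for the sample data.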

mappings = df.drop_duplicates('product_type').set_index('product')['product_type']

df['product_type'] = df['product'].map(mappings)

Output:

>>> df
   product  Marketplace product_type
0        1          200            X
1        2          300            A
2        2          400            A
3        2          200            A
4        3          500            B
5        3          400            B
6        3          300            B

>>> mappings
product
1    X
2    A
3    B
Name: product_type, dtype: object
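
If the intended rule is instead "each product takes the product_type of its best-ranked Marketplace row", a vectorized sketch could look like the one below. It reuses the `mp_correspondence` dictionary from the question and assumes that a lower ranking wins, which is a guess that happens to reproduce the expected output for the sample data. It avoids the Python loop over products by sorting once and broadcasting the first type per group.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "product": [1, 2, 2, 2, 3, 3, 3],
    "Marketplace": [200, 300, 400, 200, 500, 400, 300],
    "product_type": ["X", "A", "A", "A", "A", "A", "B"],
})

# Marketplace -> ranking, as defined in the question.
# Assumption: the lowest ranking decides the product_type.
mp_correspondence = {200: 1, 300: 2, 400: 3, 500: 4}

df["product_type"] = (
    df.assign(ranking=df["Marketplace"].map(mp_correspondence))
      .sort_values("ranking", kind="stable")   # best-ranked rows first
      .groupby("product")["product_type"]
      .transform("first")                      # broadcast that type to every row of the product
)

print(df)
```

The result of `transform` keeps the original index labels, so the assignment aligns back to the unsorted rows and the row order of `df` is unchanged.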
Corralien
  • Somehow, yes. But I need all the values from the product. I created that mapping because I have more Marketplaces present there. Basically, what I did in the code was to map those Marketplace IDs, then iterate through the set of products that have more than one row and replace the value if needed. – Ariana Negrea Mar 15 '22 at 13:44