I have multiple variable products in my CSV. Suppose I have a product titled "Car model145", and this product has three different prices and colors. I want to expand the price and color values into separate rows, repeating the title on each row. Here is my data frame:

     title             price                       color                  image

  0  Car model145      2,54.00,852.00,2532.00      black,white,blue        car iamge url 
                       # three different prices

I also have a problem with the price column: how can I remove the first comma, after the 2, so that I can split the price values properly? I also don't want to expand the image value. The result should look like this:

  title             price                       color                  image
0  Car model145      254.00                     black               car iamge url 
1  Car model145      852.00                     white  
2  Car model145      2532.00                    blue        
               
boyenec
  • You can explore this pandas functionality [df.explode()](https://stackoverflow.com/questions/12680754/split-explode-pandas-dataframe-string-entry-to-separate-rows) – Agnij Sep 29 '21 at 12:58
  • Agnij I applied `df.explode()`, but the title row is not expanding properly, and I also have a problem with the price column because I can't remove the comma after the two: `2,45.00` – boyenec Sep 29 '21 at 13:00
  • Is the comma issue a recurring one across the whole column (that too in the same pattern)? if not then manual removal can be considered. – Agnij Sep 29 '21 at 13:05
  • is `2,45.00` a typo? – Umar.H Sep 29 '21 at 13:13
  • Every price row looks like this: `2,54.00,852.00,2532.00`. There is a comma after the first digit in every row, and I want to remove that comma – boyenec Sep 29 '21 at 13:14

1 Answer

One confusing thing is the extra comma at the start of the price column (`2,`). Do you have this for all prices? You first need to get rid of it.

Then you can simply apply `str.split` and `explode` to each column:

(df.assign(price=df['price'].str.replace(',', '', n=1))  # remove only the first comma
   .apply(lambda s: s.str.split(',').explode())          # split each column on commas, one value per row
   .assign(image=lambda d: d['image'].mask(d['image'].duplicated(), ''))  # blank out repeated image values
   .reset_index(drop=True)
 #  .to_csv('filename.csv')  # uncomment to save output as csv
)

output:

          title    price  color          image
0  Car model145   254.00  black  car iamge url
1  Car model145   852.00  white               
2  Car model145  2532.00   blue               
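
For reference, the same steps can be written out one at a time in a self-contained form. This is only a sketch: the DataFrame is rebuilt by hand from the question's sample row, and it assumes every row has a stray comma after the first digit of `price` and the same number of prices and colors. Passing a list of columns to `explode` needs pandas 1.3 or later.

```python
import pandas as pd

# Sample row rebuilt from the question (assumed to match the real CSV)
df = pd.DataFrame({
    "title": ["Car model145"],
    "price": ["2,54.00,852.00,2532.00"],
    "color": ["black,white,blue"],
    "image": ["car iamge url"],
})

out = df.copy()
# Drop only the first comma, so "2,54.00,..." becomes "254.00,..."
out["price"] = out["price"].str.replace(",", "", n=1, regex=False)
# Turn the comma-separated strings into lists
out["price"] = out["price"].str.split(",")
out["color"] = out["color"].str.split(",")
# One row per (price, color) pair; title and image are repeated on each row
out = out.explode(["price", "color"]).reset_index(drop=True)
# Keep the image only on the first row of each title/image pair
out["image"] = out["image"].mask(out.duplicated(["title", "image"]), "")

print(out)
```

Exploding `price` and `color` together keeps the pairs aligned row by row, which matters once the frame has more than one product.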
mozway
  • mozway Thanks. I see the result in my Jupyter notebook, but the exported CSV file still contains the original data. Should I put `inplace=True` anywhere? – boyenec Sep 29 '21 at 13:38
  • mozway I am trying to export the CSV with `data.to_csv('my_path/test1.csv')`, but the file does not contain the result shown in the notebook. – boyenec Sep 29 '21 at 13:40
  • add `.to_csv('filename.csv')` before the last `)` (see update) – mozway Sep 29 '21 at 13:43
  • mozway Can you please explain a little about `.mask`? What is `.mask` doing here? – boyenec Sep 29 '21 at 13:46
  • You can check the doc for [`mask`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.mask.html), in summary, it replaces the matched rows with another value (here the empty string `''`) – mozway Sep 29 '21 at 13:55
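
To make the `mask` step concrete, here is a small illustration on a hand-made Series (the values are copied from the question, the variable names are made up for this example). It also shows why the exported CSV looked unchanged: `mask`, like the other methods in the chain, returns a new object and leaves the original as it was, so the result has to be assigned to a variable or chained into `.to_csv(...)`.

```python
import pandas as pd

# The image column as it looks after the explode step: one value repeated
image = pd.Series(["car iamge url"] * 3)

# True for every repeat of a value already seen, False for the first occurrence
dup = image.duplicated()

# Replace the values where dup is True with an empty string
masked = image.mask(dup, "")

print(dup.tolist())
print(masked.tolist())
# The original Series is untouched; only `masked` holds the blanked values
print(image.tolist())
```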