1

I am not overly familiar with pandas so this may be a dumb question. I was trying to pivot the following data:

df = pd.DataFrame({
      'Country' : ['country1', 'country2', 'country3', 'country4'],
      'Industry' : ['industry1:\$20 \n industry4:\$30', 
                    'industry10:\$100', 
                    'industry3:\$2 \n industry4:\$30 \n industry12:\$10 \n industry1:\$3',
                    'industry1:\$20 \n industry4:\$30'
                   ],})

(the \n come from an excel extraction)

I need to pivot to have the industries as indices and countries as columns. My intuition is that I need to do some kind of "data unpacking" first on the cells that contain multiple information but I am at a loss at how to do it on pandas.


Thanks all. There are some answers below that work well. I continued to search and found some other posts related to this issue (some people call this issue "exploding pandas rows"). In the thread below, someone wrote a general function explode() that is generic and performs well:

Split (explode) pandas dataframe string entry to separate rows

nomore
  • 41
  • 5

1 Answers1

0

You can use:

  • set_index by all columns without Industry
  • split by regex \s+\n\s+ - \s+ is for 1 or more whitespaces
  • reshape by stack for Series
  • again split by different separator
  • double reset_index, first with remove first level
  • rename columns
df = (df.set_index(['Country'])['Industry']
        .str.split('\s+\n\s+', expand=True)
        .stack()
        .str.split(r':\\\$', expand=True)
        .reset_index(level=1, drop=True)
        .reset_index()
        .rename(columns={0:'Industry', 1:'Val'})
     )   
print (df)
    Country    Industry  Val
0  country1   industry1   20
1  country1   industry4   30
2  country2  industry10  100
3  country3   industry3    2
4  country3   industry4   30
5  country3  industry12   10
6  country3   industry1    3
7  country4   industry1   20
8  country4   industry4   30
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252