3

I have a pandas data frame like this:

df = pandas.DataFrame({
        'Grouping': ["A", "B", "C"], 
        'Elements': ['[\"A1\"]', '[\"B1\", \"B2\", \"B3\"]', '[\"C1\", \"C2\"]']
    }).set_index('Grouping')

so

            Elements
Grouping
===============================
A           ["A1"]
B           ["B1", "B2", "B3"]
C           ["C1", "C2"]

i.e. some lists are encoded as strings-as-lists. What is a clean way to reshape this into a tidy data set like this:

            Elements
Grouping
====================
A           A1
B           B1
B           B2
B           B3
C           C1
C           C2

without resorting to a for-loop? The best I can come up with is:

df1 = pandas.DataFrame()
for index, row in df.iterrows():
    df_temp = pandas.DataFrame({'Elements': row['Elements'].replace("[\"", "").replace("\"]", "").split('\", \"')})
    df_temp['Grouping'] = index
    df1 = pandas.concat([df1, df_temp])
df1.set_index('Grouping', inplace=True)

but that's pretty ugly.

lebelinoz
  • 4,890
  • 10
  • 33
  • 56
  • How does this come to be in the first-place? Anyway, use [this answer](https://stackoverflow.com/questions/32468402/how-to-explode-a-list-inside-a-dataframe-cell-into-separate-rows?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa) to "explode" the lists into the data-frame. But first, use `import ast` and `df['Elements'] = df.Elements.map(ast.literal_eval)`. Note that this will be slow, and you'll feel it on huge data-frames, but this is unavoidable unless you fix the situation that got you into this in the first place (I suspect this is an X-Y problem to begin with) – juanpa.arrivillaga May 09 '18 at 01:18

2 Answers2

2

You can use .str.extractall():

df.Elements.str.extractall(r'"(.+?)"').reset_index(level="match", drop=True).rename({0:"Elements"}, axis=1)

the result:

         Elements
Grouping         
A              A1
B              B1
B              B2
B              B3
C              C1
C              C2
HYRY
  • 94,853
  • 25
  • 187
  • 187
  • Nice one dude , never thought we can using `extactall` for these type of questions : -) – BENY May 09 '18 at 01:26
1

You can to convert your 'list' to list , then we doing apply with pd.Series and stack

import ast
df.Elements=df.Elements.apply(ast.literal_eval)
df.Elements.apply(pd.Series).stack().reset_index(level=1,drop=True).to_frame('Elements')
         Elements
Grouping         
A              A1
B              B1
B              B2
B              B3
C              C1
C              C2
BENY
  • 317,841
  • 20
  • 164
  • 234