1

I need to explode some columns in the below dataset:

Date    Borr    Year    Acc Grade
2020-06-30  borrower1   2001.0  ['acc1', 'acc2', 'acc3',...]    [3.6, 5.2, 3.8,...]
2020-06-30  borrower1   2001.0  ['acc1', 'acc2', 'acc3',...]    [3.6, 5.2, 3.8,...]
2020-06-30  borrower2   2005.0  ['acc23', 'acc21']  [6.6, 5.2]

I did as follows

new_df=df.apply(pd.Series.explode).reset_index()

but it still contains lists. For each account I have a grade. I have always used explode as in the example above (and to be honest it worked until a few days ago). Has anyone idea on the reason why it does not work anymore? I have already looked at other questions (e.g., Pandas explode multiple columns), so please do not treat this question as a duplicate, as it is not.

LdM
  • 674
  • 7
  • 23
  • What is `print (df['Grade'].head().tolist())` ? – jezrael Apr 09 '21 at 08:24
  • Thanks for your comments and suggestion, jezrael. It returns this list: ['[6.6, 5.2, 4.8, 3.1, 2.9]', '[6.6, 5.2, 4.8, 3.1, 2.9]', '[6.6, 2.2, 4.8, 3.1, 2.9]', '[33.7, 35.0, 31.9, 25.2, 25.9]', '[33.7, 35.0, 31.9, 25.2, 25.9]'] – LdM Apr 09 '21 at 08:27
  • I believe pandas does not have multi column explode – Deepak Apr 09 '21 at 08:28

2 Answers2

2

I think first is necessary convert columns to lists, because if not working explode there are strings:

 import ast
 df[['Acc', 'Grade']] = df[['Acc', 'Grade']].applymap(ast.literal_eval)

Or if possible some Nones is possible use:

import json
df[['Acc', 'Grade']] = df[['Acc', 'Grade']].applymap(json.loads)

If some data are broken is possible use:

def custom_conv(x):
    try:
        return ast.literal_eval(x)
    except:
        return []

import ast
df[['Acc', 'Grade']] = df[['Acc', 'Grade']].applymap(custom_conv)

EDIT:

For me working nice in pandas 1.2.3:

a =  ['[6.6, 5.2, 4.8, 3.1, 2.9]', '[6.6, 5.2, 4.8, 3.1, 2.9]', '[6.6, 2.2, 4.8, 3.1, 2.9]', '[33.7, 35.0, 31.9, 25.2, 25.9]', '[33.7, 35.0, 31.9, 25.2, 25.9]'] 
b = ["['acc1', 'acc2', 'acc3', 'acc4', 'acc5']", "['acc1', 'acc2', 'acc3', 'acc4', 'acc5']", "['acc1', 'acc2', 'acc3', 'acc4', 'acc5']", "['acc10', 'acc11', 'acc14', 'acc5', 'acc6']", "['acc10', 'acc11', 'acc14', 'acc5', 'acc6']"]
df = pd.DataFrame({'Year':[2001,2001,2005, 2006, 2007], 
                   'Acc':a,
                   'Grade':b})

print (df)
   Year                             Acc  \
0  2001       [6.6, 5.2, 4.8, 3.1, 2.9]   
1  2001       [6.6, 5.2, 4.8, 3.1, 2.9]   
2  2005       [6.6, 2.2, 4.8, 3.1, 2.9]   
3  2006  [33.7, 35.0, 31.9, 25.2, 25.9]   
4  2007  [33.7, 35.0, 31.9, 25.2, 25.9]   

                                         Grade  
0     ['acc1', 'acc2', 'acc3', 'acc4', 'acc5']  
1     ['acc1', 'acc2', 'acc3', 'acc4', 'acc5']  
2     ['acc1', 'acc2', 'acc3', 'acc4', 'acc5']  
3  ['acc10', 'acc11', 'acc14', 'acc5', 'acc6']  
4  ['acc10', 'acc11', 'acc14', 'acc5', 'acc6'] 

import ast
df[['Acc', 'Grade']] = df[['Acc', 'Grade']].applymap(ast.literal_eval)
 
df = df.apply(pd.Series.explode)
print (df.head(10))
   Year  Acc Grade
0  2001  6.6  acc1
0  2001  5.2  acc2
0  2001  4.8  acc3
0  2001  3.1  acc4
0  2001  2.9  acc5
1  2001  6.6  acc1
1  2001  5.2  acc2
1  2001  4.8  acc3
1  2001  3.1  acc4
1  2001  2.9  acc5
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • @LdM - hmmm, what is `print (df['Acc'].head().tolist())` ? – jezrael Apr 09 '21 at 08:30
  • @LdM - hmmm, try last solution for broken json – jezrael Apr 09 '21 at 09:09
  • @LdM - What is `print (df['Acc'].head().tolist())` ? – jezrael Apr 09 '21 at 09:15
  • After applying the last solution:`[[], [], [], [], []]` . Before applying it: `["['acc1', 'acc2', 'acc3', 'acc4', 'acc5']", "['acc1', 'acc2', 'acc3', 'acc4', 'acc5']", "['acc1', 'acc2', 'acc3', 'acc4', 'acc5']", "['acc10', 'acc11', 'acc14', 'acc5', 'acc6']", "['acc10', 'acc11', 'acc14', 'acc5', 'acc6']"]` . I recently updated SO on my laptop. Might it be happen something on the versions of pandas/Jupyter Notebook that I am using? Previously the code in my question was working fine... Version is 1.1.3 – LdM Apr 09 '21 at 09:19
  • @LdM - Can you check data in edited asnwer? – jezrael Apr 09 '21 at 09:23
  • @LdM - hmmm, but it is my original answer, it working well in real data? – jezrael Apr 09 '21 at 09:28
  • Yes, it is working with real data. I do not know what happened. It was working fine with my code a few days ago, but yesterday I got lists instead of exploding obs. May I ask you how I could replace this code: `nodes=df['Acc'].explode().unique()` as you showed me? – LdM Apr 09 '21 at 09:30
  • 1
    @LdM - If working well then super ;) And `nodes=df['Acc'].explode().unique()` should working well after `df[['Acc', 'Grade']] = df[['Acc', 'Grade']].applymap(ast.literal_eval)` – jezrael Apr 09 '21 at 09:31
1

You can try this solution . As pointed on the link that you shared.,it says pandas does not have multi column explode . Try this out.It may help you

df.set_index('Acc').apply(pd.Series.explode).reset_index() df.set_index('Grade').apply(pd.Series.explode).reset_index()

Edited answer is below I believe there must be better solution for this.But you can check this code

import ast
df = pd.read_csv('/home/deepak/Desktop/data.txt',sep='\t')
df['Acc']=df['Acc'].apply(lambda s: list(ast.literal_eval(s)))
df['Grade'] = df['Grade'].apply(lambda s: list(ast.literal_eval(s)))
temp_df_1 = df[['Date', 'Borr', 'Year', 'Acc']].explode('Acc')
temp_df_1.reset_index(inplace=True,drop=True)
temp_df_2 = df[['Date', 'Borr', 'Year', 'Grade']].explode('Grade')
temp_df_2.reset_index(inplace=True,drop=True)
result_df = pd.concat([temp_df_1,temp_df_2],axis=1)
result_df = result_df.iloc[:,[0,1,2,3,7]]

input df

result_df

Deepak
  • 470
  • 1
  • 3
  • 15
  • Do you try solution from question? This is not problem here. – jezrael Apr 09 '21 at 08:33
  • Tested in apndas `1.2.3` and working nice OP solution (after convert strings columns to lists) – jezrael Apr 09 '21 at 08:35
  • Cool.... pandas.explode functionality is not available in the earlier version of pandas , last i checked – Deepak Apr 09 '21 at 08:38
  • I do not know why but it is still keeping data as lists in the columns :( Just put the columns at the beginning of the dataframe – LdM Apr 09 '21 at 09:06
  • 1
    I have made some changes to the answer.Hope that's your expected outcome – Deepak Apr 09 '21 at 09:24
  • Needless to mention please modify your separator and source path. I got the result by following few more question in stack overflow. – Deepak Apr 09 '21 at 09:28
  • Pandas does have [multicolumn explode](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html) it's just kinda strange syntax (and doesn't really work for me when I try to use it – lys Aug 19 '21 at 21:14