0

I'm getting confused by the data types in my pandas DataFrame and don't know how to split my entries into several columns.

The data looks like:

       Name1                           Name2 
0  [0.1,0.2,0.3]     [{'label': 'Neutral',  'score': 0.60}]
1  [0.4,0.5,0.6]     [{'label': 'Negative', 'score': 0.60}]
2  [0.7,0.8,0.9]     [{'label': 'Positive', 'score': 0.60}]

The result should look like:

       Name1       N1    N2    N3                  Name2                    Label     Score
0  [0.1,0.2,0.3]  0.1   0.2   0.3   [{'label': 'Neutral','score': 0.60}]   Neutral    0.60
1  [0.4,0.5,0.6]  0.4   0.5   0.6   [{'label': 'Negative','score': 0.60}]  Negative   0.60
2  [0.7,0.8,0.9]  0.7   0.8   0.9   [{'label': 'Positive','score': 0.60}]  Positive   0.60


I'm not very confident with Python, but I need to work with a large dataset of a few 100k entries.

Help much appreciated!

Best

  • The list only contains one dictionary? – Ynjxsjmh Oct 25 '22 at 17:45
  • [`df[['N1','N2','N3']] = pd.DataFrame(df.Name1.tolist(), index= df.index)`](https://stackoverflow.com/a/35491399/2221001) should take care of that list. [`df.join(pd.DataFrame(df.pop('Name2').values.tolist()))`](https://stackoverflow.com/a/63311361/2221001) I believe would do the job for that dictionary. – JNevill Oct 25 '22 at 17:45
  • @Ynjxsjmh : Yes only the one dic. – Laurence Bach Oct 25 '22 at 19:09

1 Answer

-1

You can call `to_list()` on a specific column to turn a column of lists into separate columns.

You can find more on that at this link:
https://datascienceparichay.com/article/split-pandas-column-of-lists-into-multiple-columns/
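
As a rough sketch of the list case, using the sample values from the question: this assumes each cell of `Name1` holds a real Python list of the same length (not a string that merely looks like a list). The column names `N1`, `N2`, `N3` come from the desired output in the question.

```python
import pandas as pd

# Sample data copied from the question (only the list column)
df = pd.DataFrame({
    "Name1": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]],
})

# Turn the column of lists into its own DataFrame.
# Reusing df.index keeps the rows aligned when joining back.
split_df = pd.DataFrame(
    df["Name1"].to_list(),
    index=df.index,
    columns=["N1", "N2", "N3"],
)

# Attach the new columns next to the original one
df = df.join(split_df)
print(df)
```

If the cells turn out to be strings (for example after reading from a CSV), they would need to be parsed into lists first.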

To do something similar with a column of dicts, refer to this page:
https://stackoverflow.com/questions/38231591/split-explode-a-column-of-dictionaries-into-separate-columns-with-pandas
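
A minimal sketch for the dict case, again with the sample values from the question. It assumes every cell of `Name2` is a list containing exactly one dict, as confirmed in the comments, and that the cells are real Python objects rather than strings. The capitalised names `Label` and `Score` are taken from the desired output.

```python
import pandas as pd

# Sample data copied from the question (only the dict column)
df = pd.DataFrame({
    "Name2": [
        [{"label": "Neutral", "score": 0.60}],
        [{"label": "Negative", "score": 0.60}],
        [{"label": "Positive", "score": 0.60}],
    ],
})

# Each cell is a one-element list, so pull out the single dict first
dicts = df["Name2"].map(lambda cell: cell[0])

# Expand the dicts into columns; dict keys become column names
expanded = pd.DataFrame(dicts.to_list(), index=df.index)
expanded = expanded.rename(columns={"label": "Label", "score": "Score"})

# Attach the new columns next to the original one
df = df.join(expanded)
print(df)
```

Building one DataFrame from the whole column at once tends to be much faster than calling `apply(pd.Series)` row by row, which matters with a few 100k entries.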

mlokos
  • Thanks @mlokos: for the first table it worked perfectly. I built a new df from the column of lists with `split_df = pd.DataFrame(df['Values'].tolist(), columns=['v1', 'v2', 'v3'])` and then concatenated df and split_df with `df = pd.concat([df, split_df], axis=1)`. – Laurence Bach Oct 25 '22 at 19:00