I have a pandas DataFrame that looks like this:

item cnn_features
a   [0.54168355 0.45831642]
b   [9.999999e-01   8.373661e-08 ]
c   [9.9934644e-01 6.5354136e-04]
d   [9.999999e-01 6.541346e-08]
e   [1.0000000e+00      2.0684617e-14  ]
f   [0.41258487 0.58741516]
g   [  7.337486e-15   1.000000e+00  ]

Please note: most lists contain two numbers separated by a single space, but some have several spaces between the numbers, and some have spaces at the beginning or end. This is probably why the solutions suggested in "Pandas split column of lists into multiple columns" did not work for me and raised various errors.

I want to split the second column into two columns of floats:

item f1            f2
a    0.54168355    0.45831642
b    9.999999e-01  8.373661e-08
c    9.9934644e-01 6.5354136e-04
d    9.999999e-01  6.541346e-08
e    1.0000000e+00 2.0684617e-14
f    0.41258487    0.58741516
g    7.337486e-15  1.000000e+00

I have tried several approaches with no luck. I would appreciate any tips.

Braiam
lessreg

1 Answer

Use DataFrame.pop to extract the column, strip the surrounding [] with Series.str.strip, split on whitespace with Series.str.split, and convert the result to floats:

df[['f1', 'f2']] = df.pop('cnn_features').str.strip('[]').str.split(expand=True).astype(float)
print(df)

  item            f1            f2
0    a  5.416836e-01  4.583164e-01
1    b  9.999999e-01  8.373661e-08
2    c  9.993464e-01  6.535414e-04
3    d  9.999999e-01  6.541346e-08
4    e  1.000000e+00  2.068462e-14
5    f  4.125849e-01  5.874152e-01
6    g  7.337486e-15  1.000000e+00
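
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. It assumes the cnn_features column holds strings such as "[0.54168355 0.45831642]"; the question does not state the actual dtype, so the sample frame below is reconstructed from the printed table and is only an assumption. Calling str.split with no separator splits on any run of whitespace and ignores leading and trailing whitespace, which is what should make the irregular spacing harmless.

```python
import pandas as pd

# Sample data rebuilt from the question; string-valued cells are an assumption.
df = pd.DataFrame({
    "item": list("abcdefg"),
    "cnn_features": [
        "[0.54168355 0.45831642]",
        "[9.999999e-01   8.373661e-08 ]",
        "[9.9934644e-01 6.5354136e-04]",
        "[9.999999e-01 6.541346e-08]",
        "[1.0000000e+00      2.0684617e-14  ]",
        "[0.41258487 0.58741516]",
        "[  7.337486e-15   1.000000e+00  ]",
    ],
})

features = (
    df.pop("cnn_features")      # remove the column from df and return it
      .str.strip("[]")          # drop the leading "[" and trailing "]"
      .str.split(expand=True)   # split on whitespace runs into separate columns
      .astype(float)            # convert the string pieces to floats
)

# Assign the two resulting columns under new names.
df[["f1", "f2"]] = features

print(df.dtypes)
```

If the cells are real NumPy arrays or lists rather than strings, the .str methods are not the right tool, and this sketch would need to be adapted.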
jezrael