2

I have a pandas DataFrame that looks like the one shown below.

The props column contains lists of varying length. I know that a list has at most 5 entries. I also know that the lists are ordered, i.e. the second item always belongs to the column with a specific header, say "Tense" or "number". How can I convert the entries of each list into separate columns?

id  source   type   target          props                        subtype
2   wyrzucić    V   wyrzucisz       [FUT, 2, SG]                 NaN
6   śniadać     V   śniadać         [NFIN]                       NaN
7   bankrutować V   bankrutujący    [PST, ACT, PL, MASC, HUM]    PTCP
8   chwiać      V   będą chwiały    [FUT, 3, PL]                 NaN
23  dobyć       V   dobyłaś         [PST, 2, SG, FEM]            NaN

I have tried solutions using the unstack() and tolist() methods, but they do not work for this specific case.

Amrith Krishna
    check the link ? https://stackoverflow.com/questions/35491274/pandas-split-column-of-lists-into-multiple-columns – BENY Nov 22 '17 at 05:27

3 Answers

6

apply is usually slow. You can instead build a new DataFrame from the lists and join it to the original:

In [34]: df.join(pd.DataFrame(df.props.values.tolist()))
Out[34]:
   id                      props     0     1     2     3     4
0   2               [FUT, 2, SG]   FUT     2    SG  None  None
1   6                     [NFIN]  NFIN  None  None  None  None
2   7  [PST, ACT, PL, MASC, HUM]   PST   ACT    PL  MASC   HUM
3   8               [FUT, 3, PL]   FUT     3    PL  None  None
4  23          [PST, 2, SG, FEM]   PST     2    SG   FEM  None

Details

In [33]: df
Out[33]:
   id                      props
0   2               [FUT, 2, SG]
1   6                     [NFIN]
2   7  [PST, ACT, PL, MASC, HUM]
3   8               [FUT, 3, PL]
4  23          [PST, 2, SG, FEM]
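The join above aligns on the index, so it relies on df having the default 0..n-1 index. As a minimal sketch, assuming the frame may carry some other index (the index labels 10 to 50 below are made up for illustration), passing index=df.index keeps the rows aligned, and reindex pads the result to the known maximum of 5 columns even if no list in the sample is that long:

```python
import pandas as pd

# Sample data from the question; the non-default index is a made-up example.
df = pd.DataFrame(
    {
        "id": [2, 6, 7, 8, 23],
        "props": [
            ["FUT", 2, "SG"],
            ["NFIN"],
            ["PST", "ACT", "PL", "MASC", "HUM"],
            ["FUT", 3, "PL"],
            ["PST", 2, "SG", "FEM"],
        ],
    },
    index=[10, 20, 30, 40, 50],
)

MAX_PROPS = 5  # maximum list length stated in the question

# One row per list; shorter lists are padded with missing values.
expanded = pd.DataFrame(df["props"].tolist(), index=df.index)

# Force exactly MAX_PROPS columns, labelled 0..4.
expanded = expanded.reindex(columns=range(MAX_PROPS))

result = df.join(expanded)
print(result)
```

Without index=df.index, the new frame would get labels 0..4 and the join would match none of the rows in this example.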
Zero
1

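You can try this user-defined function (UDF) and see if it works:

def col_gen(x):
    # x is a single row of the DataFrame, passed in as a Series
    props = x['props']
    # Store each list element in its own column: Item1, Item2, ...
    for i in range(len(props)):
        x['Item'+str(i+1)] = props[i]
    return x

# axis=1 applies col_gen to every row
df = df.apply(col_gen, axis=1)

This takes each row, reads its props list, and writes each element to an additional column (Item1, Item2, ...). Rows with shorter lists end up with NaN in the columns they do not fill.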

Vivek Kalyanarangan
  • The solution is intuitive and great, and I was wondering why this didn't occur to me. Seems my pandas has really gotten rusty. Thanks a lot for your time. – Amrith Krishna Nov 22 '17 at 05:25
1

Consider this simplified dataframe

df = pd.DataFrame({'id': [2,6,7,8,23], 'props': [['FUT', 2, 'SG'], ['NFIN'], ['PST', 'ACT', 'PL', 'MASC', 'HUM'], ['FUT', 3, 'PL'],['PST', 2, 'SG', 'FEM']]})

You can split the list column using

df[[1,2,3,4,5]] = df.props.apply(pd.Series)

You get

    id  props                       1       2   3   4       5
0   2   [FUT, 2, SG]                FUT     2   SG  NaN     NaN
1   6   [NFIN]                      NFIN    NaN NaN NaN     NaN
2   7   [PST, ACT, PL, MASC, HUM]   PST     ACT PL  MASC    HUM
3   8   [FUT, 3, PL]                FUT     3   PL  NaN     NaN
4   23  [PST, 2, SG, FEM]           PST     2   SG  FEM     NaN

Note: You can specify more relevant column names; I just used 1, 2, 3, 4, 5.
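As a rough sketch of that note, the expanded columns can be renamed before they are attached. The prop_1 ... prop_5 names below are placeholders, not headers from the question; swap in the real ones (for example "Tense" or "number") where the position-to-header mapping is known:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [2, 6, 7, 8, 23],
    "props": [
        ["FUT", 2, "SG"],
        ["NFIN"],
        ["PST", "ACT", "PL", "MASC", "HUM"],
        ["FUT", 3, "PL"],
        ["PST", 2, "SG", "FEM"],
    ],
})

# apply(pd.Series) expands each list into columns labelled 0..4.
expanded = df["props"].apply(pd.Series)

# Placeholder names (an assumption): position i becomes "prop_<i+1>".
expanded = expanded.rename(columns=lambda i: f"prop_{i + 1}")

# Attach the named columns next to the original ones.
named = pd.concat([df, expanded], axis=1)
print(named)
```

Passing a dict to rename instead of the lambda, e.g. {0: "Tense"}, would rename only the listed positions and leave the rest as integers.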

Vaishali