How to split several column's data in pandas?

Question

I have a dataframe which looks like this:

df = pd.DataFrame({'hard': [['525', '21']], 'soft': [['1525', '221']], 'set': [['5245', '271']],  'purch': [['925', '201']], \
                  'mont': [['555', '621']], 'gest': [['536', '251']], 'memo': [['825', '241']], 'raw': [['532', '210']]})
df

Out:

    gest        hard        memo        mont        purch       raw         set         soft
0   [536, 251]  [525, 21]   [825, 241]  [555, 621]  [925, 201]  [532, 210]  [5245, 271] [1525, 221]

I should split all of the columns like this:

df1 = pd.DataFrame()
df1['gest_pos'] = df.gest.str[0].astype(int)
df1['gest_size'] = df.gest.str[1].astype(int)
df1['hard_pos'] = df.hard.str[0].astype(int)
df1['hard_size'] = df.hard.str[1].astype(int)
df1 

    gest_pos    gest_size   hard_pos    hard_size
0   536         251         525         21

I have more than 70 columns and my method takes lot of place and time. Is there an easier way to do this job?

Thanks!

jezrael · Answer 1 · 2018-08-02T08:04:25.757

2

You can use nested list comprehension with flattening and then create new DataFrame by constructor:

L = [[y for x in z for y in x] for z in df.values.tolist()]
#if want filter first 2 values per each list
#L = [[y for x in z for y in x[:2]] for z in df.values.tolist()]

#https://stackoverflow.com/a/45122198/2901002
def mygen(lst):
    for item in lst:
        yield item + '_pos'
        yield item + '_size'

df = pd.DataFrame(L, columns = list(mygen(df.columns))).astype(int)
print (df)
  hard_pos hard_size soft_pos soft_size set_pos set_size purch_pos purch_size  \
0      525        21     1525       221    5245      271       925        201   

  mont_pos mont_size gest_pos gest_size memo_pos memo_size raw_pos raw_size  
0      555       621      536       251      825       241     532      210

edited Aug 02 '18 at 08:04

answered Aug 02 '18 at 07:45

jezrael

822,522
95
1,334
1,252

Thanks @jezrael, how can I keep their order `..._pos, ..._size`? – M-M Aug 02 '18 at 07:49
@jezrael, you're good on pandas :) I've seen you at many pandas posts – M. Ali Öztürk Aug 02 '18 at 07:56
@M.AliÖztürk - Thank you. – jezrael Aug 02 '18 at 07:57
@jezrael How can I fillna `nan` with the value of first row for all columns in a dataframe? – M-M Aug 02 '18 at 09:33
Use `df.iloc[0] = np.nan` – jezrael Aug 02 '18 at 09:34
I have value for first row but not for other rows, so I want to do something like this `df_head.fillna(df_head.iloc[0])` but it does not work. – M-M Aug 02 '18 at 09:37
I understand now, need `df = df.ffill()` – jezrael Aug 02 '18 at 09:56
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/177268/discussion-between-m-m-and-jezrael). – M-M Aug 02 '18 at 10:07

score 2 · Accepted Answer · answered Aug 02 '18 at 07:55

2

Different approach:

df2 = pd.DataFrame()
for column in df:
    df2['{}_pos'.format(column)] = df[column].str[0].astype(int)
    df2['{}_size'.format(column)] = df[column].str[1].astype(int)
print(df2)

answered Aug 02 '18 at 07:55

RobJan

1,351
1
14
18

Tahnaks so much @RobJan ! – M-M Aug 02 '18 at 07:58

score 1 · Answer 3 · answered Aug 02 '18 at 08:22

You can use NumPy operations to construct your list of columns and flatten out your series of lists:

import numpy as np
from itertools import chain

# create column label array
cols = np.repeat(df.columns, 2).values
cols[::2] += '_pos'
cols[1::2] += '_size'

# create data array
arr = np.array([list(chain.from_iterable(i)) for i in df.values]).astype(int)

# combine with pd.DataFrame constructor
res = pd.DataFrame(arr, columns=cols)

Result:

print(res)

   gest_pos  gest_size  hard_pos  hard_size  memo_pos  memo_size  mont_pos  \
0       536        251       525         21       825        241       555   

   mont_size  purch_pos  purch_size  raw_pos  raw_size  set_pos  set_size  \
0        621        925         201      532       210     5245       271   

   soft_pos  soft_size  
0      1525        221

How to split several column's data in pandas?

3 Answers3