Delimiting and Stacking a Column in Pandas (Python 3.4)

Question

I am trying to split the phrases in the cells of a column on whitespace so that each word gets its own row. I am using the pandas module in Python 3.4. For example, I want to turn this:

Keyword         Number    Row
Bob Jim Jon      300      2

Into this:

Keyword        Number     Row
Bob            300        2
Jim            300        2
Jon            300        2

I've been researching how to do this and found a very similar question (which I can't comment on directly to ask about this): pandas: How do I split text in a column into multiple rows?

Adapting the answer from that post I have created this code:

import pandas as pd
xl = pd.ExcelFile("C:/Users/j/Desktop/helloworld.xlsx")
df = xl.parse("HelloWorld")
df.head()
df1 = df[['Keyword','Number','Row']]
s = df1['Keyword'].str.split(' ').apply(Series, 1).stack()
s.index = s.index.droplevel(-1)
s.name = 'Keyword'
del df1['Keyword']
y = df1.join(s)
print(y)

However, when I run this I get the following error:

s = df['Keyword'].str.split(' ').apply(Series, 1).stack()
NameError: name 'Series' is not defined

Any suggestions as to what I am doing wrong? Thank you!

You should do `pd.Series` instead of `Series`, as you imported pandas as `pd` — joris, Aug 21 '14 at 20:51
fantastic -- thank you! Please write it as a short answer so I can mark your answer as correct :) — user3682157, Aug 21 '14 at 20:53
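
For reference, here is a minimal sketch of the question's code with the fix from the comment applied. It assumes a small in-memory DataFrame as a stand-in for the Excel sheet (the real file is not available here), and it drops the extra positional `1` passed to `apply`, which is not needed for this step:

```python
import pandas as pd

# Hypothetical stand-in for the "HelloWorld" sheet read from the Excel file
df = pd.DataFrame([['Bob Jim Jon', 300, 2]],
                  columns=['Keyword', 'Number', 'Row'])

# Expand each list of words into columns, then stack the columns into rows
s = df['Keyword'].str.split(' ').apply(pd.Series).stack()

# Drop the inner per-word index level so the index matches df again
s.index = s.index.droplevel(-1)
s.name = 'Keyword'

# Replace the original Keyword column with the stacked one
y = df.drop(columns='Keyword').join(s)
print(y)
```

The only change that matters for the NameError is `pd.Series` in place of `Series`; alternatively, `from pandas import Series` would make the original line work as written.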

score 0 · Answer 1 · answered Oct 08 '18 at 15:24

You can use apply(pd.Series), but this is inefficient because it constructs a Series for every row. Building the columns with np.repeat and itertools.chain and passing them to the pd.DataFrame constructor will give better performance:

import numpy as np
import pandas as pd
from itertools import chain

df = pd.DataFrame([['Bob Jim Jon', 300, 2]],
                  columns=['Keyword', 'Number', 'Row'])

# split each cell on whitespace and count the words per row
split = df['Keyword'].str.split()
n = split.map(len)

# flatten the word lists; repeat the other columns once per word
res = pd.DataFrame({'Keyword': list(chain.from_iterable(split)),
                    'Number': np.repeat(df['Number'], n),
                    'Row': np.repeat(df['Row'], n)})

print(res)

  Keyword  Number  Row
0     Bob     300    2
0     Jim     300    2
0     Jon     300    2
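
As a side note, and not part of the approach above: newer pandas versions (0.25 and later) provide `DataFrame.explode`, which turns each element of a list-valued cell into its own row. A minimal sketch on the same sample data, assuming such a version is installed:

```python
import pandas as pd

df = pd.DataFrame([['Bob Jim Jon', 300, 2]],
                  columns=['Keyword', 'Number', 'Row'])

# Turn each cell into a list of words, then give every word its own row;
# the other columns are repeated and the original index label is kept
res = (df.assign(Keyword=df['Keyword'].str.split())
         .explode('Keyword'))

print(res)
```

Like the np.repeat version, this keeps the original index label on every resulting row; chaining `.reset_index(drop=True)` would renumber the rows if that is preferred.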
