I am trying to split the phrases in the cells of a column on whitespace, so that each word ends up in its own row. I am using the pandas module in Python 3.4. As an example, I want to turn this:

Keyword         Number    Row
Bob Jim Jon      300      2

Into this:

Keyword        Number     Row
Bob            300        2
Jim            300        2
Jon            300        2

I've been searching the forums for how to do this and found this very similar question (I can't comment on it directly to ask there): pandas: How do I split text in a column into multiple rows?

Adapting the answer from that post I have created this code:

import pandas as pd
xl = pd.ExcelFile("C:/Users/j/Desktop/helloworld.xlsx")
df = xl.parse("HelloWorld")
df.head()
df1 = df[['Keyword','Number','Row']]
s = df1['Keyword'].str.split(' ').apply(Series, 1).stack()
s.index = s.index.droplevel(-1)
s.name = 'Keyword'
del df1['Keyword']
y = df1.join(s)
print(y)

However, when I run this I get the following error:

s = df['Keyword'].str.split(' ').apply(Series, 1).stack()
NameError: name 'Series' is not defined 

Any suggestions as to what I am doing wrong? Thank you!

jpp
user3682157

1 Answer

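The NameError is raised because Series is never imported into your namespace; with import pandas as pd you have to refer to it as pd.Series. So apply(pd.Series) fixes the error, but it will be inefficient. Feeding np.repeat + itertools.chain to the pd.DataFrame constructor will give better performance: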

import numpy as np
import pandas as pd
from itertools import chain

df = pd.DataFrame([['Bob Jim Jon', 300, 2]],
                  columns=['Keyword', 'Number', 'Row'])

split = df['Keyword'].str.split()
n = split.map(len)

res = pd.DataFrame({'Keyword': list(chain.from_iterable(split)),
                    'Number': np.repeat(df['Number'], n),
                    'Row': np.repeat(df['Row'], n)})

print(res)

  Keyword  Number  Row
0     Bob     300    2
0     Jim     300    2
0     Jon     300    2
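
For comparison, here is a rough sketch of two other ways to get the same rows. The first is the code from the question with only the name fixed (pd.Series instead of Series). The second uses DataFrame.explode, which is an assumption about your environment: it only exists in pandas 0.25 or newer, so it will not be available on older installs. The sample frame is the same one-row example as above, not your Excel file.

```python
import pandas as pd

df = pd.DataFrame([['Bob Jim Jon', 300, 2]],
                  columns=['Keyword', 'Number', 'Row'])

# Variant 1: the question's approach, with Series referenced as pd.Series.
s = df['Keyword'].str.split().apply(pd.Series).stack()
s.index = s.index.droplevel(-1)   # drop the inner level so s aligns with df's index
s.name = 'Keyword'
y = df.drop(columns='Keyword').join(s)
y = y[['Keyword', 'Number', 'Row']]   # restore the original column order

# Variant 2 (assumes pandas >= 0.25): split into lists, then explode the lists.
z = df.assign(Keyword=df['Keyword'].str.split()).explode('Keyword')
z = z.reset_index(drop=True)          # optional: replace the repeated index 0, 0, 0

print(y)
print(z)
```

Both variants leave Number and Row repeated once per word. Variant 1 keeps the repeated index from the original row, while variant 2 renumbers it because of the reset_index call.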
jpp