Trying to parse string and create new columns in data frame in Python pandas

Question

I have the following data frame.

Team      Opponent  Detail
Redskins  Rams      Kirk Cousins .... Penaltyon Bill Smith, Holding:10 yards

What I want to do is create three columns using pandas that give me the name (in this case Bill Smith), the type of infraction (Holding), and how much it cost the team (10 yards). So it would look like this:

Team      Opponent  Detail  Name        Infraction  Yards
Redskins  Rams              Bill Smith  Holding     10 yards

I used some string manipulation to extract the fields, but I don't know how to create the new columns. I have looked through some older questions, but cannot seem to get it to work. Thanks!

score 1 · Accepted Answer · edited May 23 '17 at 12:06

Your function should return three values, such as...

def extract(r):
    # Fixed offsets that fit the sample row only; order is name, infraction, yards
    return r[28:38], r[-16:-9], r[-8:]

First create empty columns:

df["Name"] = df["Infraction"] = df["Yards"] = ""

... and then cast the result of "apply" to a list.

df[["Name", "Infraction", "Yards"]] = list(df.Detail.apply(extract))

You may also be interested in this more specific, but longer, answer.
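Since fixed slice offsets only fit rows shaped exactly like the sample, here is a minimal sketch of an alternative using a regular expression with `Series.str.extract`. The pattern is an assumption based on the single row shown in the question ("Penaltyon <name>, <infraction>:<n> yards"); real data may need a different pattern, and rows that do not match end up as NaN.

```python
import pandas as pd

# Toy frame built from the single row shown in the question.
df = pd.DataFrame({
    "Team": ["Redskins"],
    "Opponent": ["Rams"],
    "Detail": ["Kirk Cousins .... Penaltyon Bill Smith, Holding:10 yards"],
})

# Assumed pattern: "Penalty on <name>, <infraction>:<n> yards",
# tolerating the missing space in "Penaltyon".
pattern = (
    r"Penalty\s*on\s+(?P<Name>[^,]+),\s*"
    r"(?P<Infraction>[^:]+):\s*(?P<Yards>\d+\s+yards)"
)

# Named groups become the column names of the returned frame.
parts = df["Detail"].str.extract(pattern)

# Attach the three new columns to the original frame (aligned on the index).
df = df.join(parts)
print(df[["Team", "Opponent", "Name", "Infraction", "Yards"]])
```

Because the named groups supply the column names, there is no need to create empty columns first.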

answered Oct 04 '15 at 16:19

Pietro Battiston

This creates the new columns. Now I have to clean up my extract method, which doesn't seem to be working. Thanks!!!!! – Kulwant Oct 04 '15 at 16:40

score 0 · Answer 2 · answered Oct 04 '15 at 16:30

In order to create a new column, you can simply do:

your_df['new column'] = something

For example, imagine you want a new column that contains the first word of the Detail column:

import pandas as pd

# toy dataframe
my_df = pd.DataFrame.from_dict({'Team': ['Redskins'], 'Opponent': ['Rams'], 'Detail': ['Penaltyon Bill Smith, Holding:10 yards ']})

#apply a function that retrieves the first word
my_df['new_word'] = my_df.apply(lambda x: x.Detail.split(' ')[0], axis=1)

This creates a column that contains "Penaltyon".

Now, imagine I want two new columns, one for the first word and another for the second word. I can create a new dataframe with those two columns:

new_df =  my_df.apply(lambda x: pd.Series({'first':x.Detail.split(' ')[0],  'second': x.Detail.split(' ')[1]} ), axis=1)

and now I simply have to concatenate the two dataframes (pd.concat returns a new dataframe, so keep the result):

my_df = pd.concat([my_df, new_df], axis=1)
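As a possible shortcut for the same first-word / second-word example, here is a sketch using the vectorized `Series.str.split` with `expand=True`, which avoids the row-wise `apply` and the separate concatenation. The names `words`, `first` and `second` are just illustrative.

```python
import pandas as pd

# Same toy dataframe as above.
my_df = pd.DataFrame({
    "Team": ["Redskins"],
    "Opponent": ["Rams"],
    "Detail": ["Penaltyon Bill Smith, Holding:10 yards "],
})

# Split on single spaces at most twice; expand=True returns one column per piece.
words = my_df["Detail"].str.split(" ", n=2, expand=True)

# Pieces 0 and 1 are the first and second word of each Detail string.
my_df["first"] = words[0]
my_df["second"] = words[1]
print(my_df[["first", "second"]])
```

Assigning the pieces one column at a time keeps the result in `my_df` directly.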
