Python Pandas: Split column and add new column next to current

Question

I have a excel sheet similar to this, although with a lot more columns:

Team	Members
Team1 (553)	95435
Team2 (443)	872

I want to split the team column into Team and a new column, named Team ID. I currently do this with the following code:

df[['Team', 'Team ID']] = df['Team'].str.split(r"\s\(+(?=\S*$)", expand=True)
df['Team ID'] = df['Team ID'].str[:-1]

This works fine (note that Team name can include numbers, spaces and paranthesis). So while this might not be perfect, I gets the job done.

My issue is that the new column, "Team ID" is placed at the end of the dataset. So it would be "Team - Members - Team ID". While not an issue with 3 columns, sometimes there is 10 columns where 7 needs to be split.

So the question: Is there any way to split a column in 2, and place the newly created column next to the old one?

@jezrael not sure this is a dupe, one can insert directly on the correct spot — mozway, Feb 04 '22 at 09:27
@mozway - I think using list of columns names for last ordering in correct way. — jezrael, Feb 04 '22 at 09:28
Or using `df.insert(df.columns.get_loc('Team')+1, 'Team ID', df.pop('Team ID'))` should working — jezrael, Feb 04 '22 at 09:30

mozway · Accepted Answer · 2022-02-04T09:24:19.917

2

You can use str.extract with a regex.

To insert on the correct position you could use insert:

out = df['Team'].str.extract('(\w+) \((\d+)\)')

df['Team'] = out[0]
df.insert(df.columns.get_loc('Team')+1, 'Team ID', out[1])

output:

    Team  Team ID  Members
0  Team1      553    95435
1  Team2      443      872

regex:

(\w+)      # match word
\((\d+)\)  # match digits surrounded by parentheses

edited Feb 04 '22 at 09:24

answered Feb 04 '22 at 09:18

mozway

194,879
13
39
75

How would that regex work if the Team name is "Team1 (Boston)"? Or even "Team1 (99)" followed by ID? – suprimos Feb 04 '22 at 09:27
can you provide exhaustive example of full names? If you really want to match anything initially and know that the ID is the last thing, use an anchor `'(.*) $(\d+)$$'` – mozway Feb 04 '22 at 09:31
Team might have been over simplified. An example of a string could be "BE-AMZ-V34489-Ford Motors (58837)-Web-Standard-New product range (Demo name) (12345679)". But ID is always last, in parenthesis, and after a space. Your last regex seems to work fine. Thanks. Your solution to add the new column next to the old one also works. Thanks again! – suprimos Feb 04 '22 at 09:47
Sorry to bring this back again, but after using this for a while, an issue have arised. In some rare instances, the ID is not present in the string, for instance no name has been given to the "team", so the name is simply the string "NoName" (with no ID) in the dataframe. With this setup both Team and Team ID columns return blank. Any way to keep Team column as "NoName"? (Or set both Team and Team ID to "NoName") – suprimos Mar 01 '22 at 21:16
Maybe open a new question with a reproducible example? – mozway Mar 01 '22 at 21:20

Python Pandas: Split column and add new column next to current

1 Answers1