1

I have a excel sheet similar to this, although with a lot more columns:

Team Members
Team1 (553) 95435
Team2 (443) 872

I want to split the team column into Team and a new column, named Team ID. I currently do this with the following code:

df[['Team', 'Team ID']] = df['Team'].str.split(r"\s\(+(?=\S*$)", expand=True)
df['Team ID'] = df['Team ID'].str[:-1]

This works fine (note that Team name can include numbers, spaces and paranthesis). So while this might not be perfect, I gets the job done.

My issue is that the new column, "Team ID" is placed at the end of the dataset. So it would be "Team - Members - Team ID". While not an issue with 3 columns, sometimes there is 10 columns where 7 needs to be split.

So the question: Is there any way to split a column in 2, and place the newly created column next to the old one?

mozway
  • 194,879
  • 13
  • 39
  • 75
suprimos
  • 33
  • 6

1 Answers1

2

You can use str.extract with a regex.

To insert on the correct position you could use insert:

out = df['Team'].str.extract('(\w+) \((\d+)\)')

df['Team'] = out[0]
df.insert(df.columns.get_loc('Team')+1, 'Team ID', out[1])

output:

    Team  Team ID  Members
0  Team1      553    95435
1  Team2      443      872

regex:

(\w+)      # match word
\((\d+)\)  # match digits surrounded by parentheses
mozway
  • 194,879
  • 13
  • 39
  • 75
  • How would that regex work if the Team name is "Team1 (Boston)"? Or even "Team1 (99)" followed by ID? – suprimos Feb 04 '22 at 09:27
  • can you provide exhaustive example of full names? If you really want to match anything initially and know that the ID is the last thing, use an anchor `'(.*) \((\d+)\)$'` – mozway Feb 04 '22 at 09:31
  • Team might have been over simplified. An example of a string could be "BE-AMZ-V34489-Ford Motors (58837)-Web-Standard-New product range (Demo name) (12345679)". But ID is always last, in parenthesis, and after a space. Your last regex seems to work fine. Thanks. Your solution to add the new column next to the old one also works. Thanks again! – suprimos Feb 04 '22 at 09:47
  • Sorry to bring this back again, but after using this for a while, an issue have arised. In some rare instances, the ID is not present in the string, for instance no name has been given to the "team", so the name is simply the string "NoName" (with no ID) in the dataframe. With this setup both Team and Team ID columns return blank. Any way to keep Team column as "NoName"? (Or set both Team and Team ID to "NoName") – suprimos Mar 01 '22 at 21:16
  • Maybe open a new question with a reproducible example? – mozway Mar 01 '22 at 21:20