I have the following pandas DataFrame (only one column, team, is shown):
0 Atlantic Division
1 Tampa Bay Lightning*
2 Boston Bruins*
3 Toronto Maple Leafs*
4 Florida Panthers
5 Detroit Red Wings
6 Montreal Canadiens
7 Ottawa Senators
8 Buffalo Sabres
9 Metropolitan Division
10 Washington Capitals*
11 Pittsburgh Penguins*
12 Philadelphia Flyers*
13 Columbus Blue Jackets*
14 New Jersey Devils*
15 Carolina Hurricanes
16 New York Islanders
17 New York Rangers
18 Central Division
19 Nashville Predators*
20 Winnipeg Jets*
21 Minnesota Wild*
22 Colorado Avalanche*
23 St. Louis Blues
24 Dallas Stars
25 Chicago Blackhawks
26 Pacific Division
27 Vegas Golden Knights*
28 Anaheim Ducks*
29 San Jose Sharks*
30 Los Angeles Kings*
31 Calgary Flames
32 Edmonton Oilers
33 Vancouver Canucks
34 Arizona Coyotes
35 Atlantic Division
36 Montreal Canadiens*
37 Ottawa Senators*
38 Boston Bruins*
39 Toronto Maple Leafs*
40 Tampa Bay Lightning
41 Florida Panthers
42 Detroit Red Wings
43 Buffalo Sabres
44 Metropolitan Division
45 Washington Capitals*
46 Pittsburgh Penguins*
47 Columbus Blue Jackets*
48 New York Rangers*
49 New York Islanders
50 Philadelphia Flyers
51 Carolina Hurricanes
52 New Jersey Devils
53 Central Division
54 Chicago Blackhawks*
55 Minnesota Wild*
56 St. Louis Blues*
57 Nashville Predators*
58 Winnipeg Jets
59 Dallas Stars
60 Colorado Avalanche
61 Pacific Division
62 Anaheim Ducks*
63 Edmonton Oilers*
64 San Jose Sharks*
65 Calgary Flames*
66 Los Angeles Kings
67 Arizona Coyotes
68 Vancouver Canucks
69 Atlantic Division
70 Florida Panthers*
71 Tampa Bay Lightning*
72 Detroit Red Wings*
73 Boston Bruins
74 Ottawa Senators
75 Montreal Canadiens
76 Buffalo Sabres
77 Toronto Maple Leafs
78 Metropolitan Division
79 Washington Capitals*
80 Pittsburgh Penguins*
81 New York Rangers*
82 New York Islanders*
83 Philadelphia Flyers*
84 Carolina Hurricanes
85 New Jersey Devils
86 Columbus Blue Jackets
87 Central Division
88 Dallas Stars*
89 St. Louis Blues*
90 Chicago Blackhawks*
91 Nashville Predators*
92 Minnesota Wild*
93 Colorado Avalanche
94 Winnipeg Jets
95 Pacific Division
96 Anaheim Ducks*
97 Los Angeles Kings*
98 San Jose Sharks*
99 Arizona Coyotes
100 Calgary Flames
101 Vancouver Canucks
102 Edmonton Oilers
103 Atlantic Division
104 Montreal Canadiens*
105 Tampa Bay Lightning*
106 Detroit Red Wings*
107 Ottawa Senators*
108 Boston Bruins
109 Florida Panthers
110 Toronto Maple Leafs
111 Buffalo Sabres
112 Metropolitan Division
113 New York Rangers*
114 Washington Capitals*
115 New York Islanders*
116 Pittsburgh Penguins*
117 Columbus Blue Jackets
118 Philadelphia Flyers
119 New Jersey Devils
120 Carolina Hurricanes
121 Central Division
122 St. Louis Blues*
123 Nashville Predators*
124 Chicago Blackhawks*
125 Minnesota Wild*
126 Winnipeg Jets*
127 Dallas Stars
128 Colorado Avalanche
129 Pacific Division
130 Anaheim Ducks*
131 Vancouver Canucks*
132 Calgary Flames*
133 Los Angeles Kings
134 San Jose Sharks
135 Edmonton Oilers
136 Arizona Coyotes
137 Atlantic Division
138 Boston Bruins*
139 Tampa Bay Lightning*
140 Montreal Canadiens*
141 Detroit Red Wings*
142 Ottawa Senators
143 Toronto Maple Leafs
144 Florida Panthers
145 Buffalo Sabres
146 Metropolitan Division
147 Pittsburgh Penguins*
148 New York Rangers*
149 Philadelphia Flyers*
150 Columbus Blue Jackets*
151 Washington Capitals
152 New Jersey Devils
153 Carolina Hurricanes
154 New York Islanders
155 Central Division
156 Colorado Avalanche*
157 St. Louis Blues*
158 Chicago Blackhawks*
159 Minnesota Wild*
160 Dallas Stars*
161 Nashville Predators
162 Winnipeg Jets
163 Pacific Division
164 Anaheim Ducks*
165 San Jose Sharks*
166 Los Angeles Kings*
167 Phoenix Coyotes
168 Vancouver Canucks
169 Calgary Flames
170 Edmonton Oilers
Name: team, dtype: object
I need to create one additional column with the city name.
At first glance the regex looks simple: the first word should be the city name, and the rest is the team name.
However, some cities have two words (Los Angeles, St. Louis, etc.).
Is it possible to do this with a regex, or does it have to be done manually?
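To make the problem concrete, here is a minimal sketch of the naive "first word" rule applied to a few rows. The small `sample` Series is a stand-in made up for illustration; the real data lives in nhl_df['team'].

```python
import pandas as pd

# Hypothetical stand-in for a few rows of nhl_df['team'].
sample = pd.Series(
    ["Tampa Bay Lightning*", "Boston Bruins*", "Los Angeles Kings*", "St. Louis Blues"],
    name="team",
)

# Naive rule: the city is everything before the first whitespace.
# One capture group + expand=False returns a Series.
first_word = sample.str.extract(r"^(\S+)", expand=False)
print(first_word.tolist())
# ['Tampa', 'Boston', 'Los', 'St.']
```

Only "Boston" comes out right; the two-word cities are cut off after the first word.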
Update: I tried the following:
nhl_df['city']=nhl_df['team'].str.extract(r'^(?:([\w.]{1,5}\s\w+)|(\w+)|)(?:\s\w+)+\*?$')
But I get this error:
ValueError: Wrong number of items passed 2, placement implies 1
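A likely reason for the error: the pattern contains two capturing groups, so str.extract returns a DataFrame with two columns, and that cannot be assigned to the single column nhl_df['city']. Below is a minimal sketch of one possible workaround, not a definitive solution. It uses exactly one capturing group and a hand-maintained list of two-word nicknames (the name `two_word_nicknames` and the tiny `nhl_df` stand-in are assumptions for illustration). A regex alone cannot tell "Toronto Maple Leafs" from "Tampa Bay Lightning", so some manual list seems unavoidable.

```python
import pandas as pd

# Hypothetical stand-in for the real nhl_df; only a few rows for illustration.
nhl_df = pd.DataFrame({"team": [
    "Atlantic Division",
    "Tampa Bay Lightning*",
    "Toronto Maple Leafs*",
    "St. Louis Blues",
    "New York Rangers*",
    "Vegas Golden Knights*",
    "Boston Bruins",
]})

# Two-word nicknames seen in the data; this list is maintained by hand (assumption).
two_word_nicknames = ["Maple Leafs", "Red Wings", "Blue Jackets", "Golden Knights"]

pattern = (
    r"^(?!.*\bDivision$)"  # do not match division header rows
    r"(.+?)\s+"            # the city: the only capturing group
    r"(?:" + "|".join(two_word_nicknames) + r"|\S+?)"  # nickname, not captured
    r"\*?$"                # optional trailing asterisk
)

# With a single capturing group and expand=False the result is a Series,
# so it can be assigned to one column. Non-matching rows become NaN.
nhl_df["city"] = nhl_df["team"].str.extract(pattern, expand=False)
print(nhl_df)
```

In this sketch the division header rows get NaN in the city column, and every other row gets whatever precedes the nickname.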