Pandas Dataframe, How can I split a column into two by "," when some rows may have more than 1 ","

Question

I have a dataframe. One of the columns is a combination of CITY and STATE. I want to split this column to two columns, CITY and STATE using:

df['CITY'],df['STATE'] = df['WORKSITE'].str.split(",")

And I got this error:

ValueError Traceback (most recent call last) in () ----> 1 df['CITY'],df['STATE'] = df['WORKSITE'].str.split(",")

ValueError: too many values to unpack (expected 2)

So, I'm wondering is there a method that I can ignore the exceptions or detect which row is not working?

I'm not strongly familiar with Python, but I expect (based on my Perl sensibilities) `split` isn't really going to be the solution you need, because it will generate an array of varying length on the RHS, while your LHS is always an array of length 2. — Brian A. Henning, Apr 21 '17 at 17:38
You may be better-served to find the position of the first comma, then use substring methods to get the prefix and postfix. — Brian A. Henning, Apr 21 '17 at 17:39
[`.str.split()`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.split.html) takes a parameter `n` which limits the number of splits, so `df['WORKSITE'].str.split(",", n=1)` should do what you want. — Craig, Apr 21 '17 at 17:45

score 3 · Answer 1 · edited May 23 '17 at 12:02

3

Set n=2 in the split call and use the str method effectively:

import pandas as pd

x = ['New York, NY', 'Los Angeles, CA', 'Kansas City, KS, US']

df = pd.DataFrame(x, columns=['WORKSITE'])

df['CITY'], df['STATE'] = df['WORKSITE'].str.split(',', 2).str[0:2].str

print df

Output

              WORKSITE         CITY STATE
0         New York, NY     New York    NY
1      Los Angeles, CA  Los Angeles    CA
2  Kansas City, KS, US  Kansas City    KS

I got some help from looking at this answer to this question.

edited May 23 '17 at 12:02

Community

1
1

answered Apr 21 '17 at 20:12

This is great. I really appreciate your help! – Kyle Chen Apr 22 '17 at 02:55
Happy that my answer inspired this one. I'd just add that, in this case, the `n=2` to the split is not strictly necessary (although it's good for performance in case there are pathological cases with lots of excessive commas). On the other hand, neither `n=2` nor the slice notation protects against a `TypeError` if there are **less** commas than necessary. In that case the developer would have to first split, then use the `.str.len()` method to create a column which would be used to filter rows with too few splits. – LeoRochael Jan 11 '18 at 20:41

Pandas Dataframe, How can I split a column into two by "," when some rows may have more than 1 ","

1 Answers1