1

I am trying to split a column containing City, State, and Zip into three columns. The data in the column is in this format: 'City, State Zip' - comma separating the city from state, and a space separating state from zip code. I can split out the city using:

df['Owner City State Zip'].str.split(',').apply(lambda x: x[0]

But for some reason when I try the following to split out the state and zip:

df['Owner City State Zip'].str.split(',').apply(lambda x: x[1]

I get the error - Index is out of range

Any help would be appreciated! This seems trivial but has been more difficult than I was expecting.

Wasi Ahmad
  • 35,739
  • 32
  • 114
  • 161
Jeff Sword
  • 37
  • 5
  • looks like the problem is in `x[1]`. can you share more of your code, pretty difficult to identify the problem from just one incomplete statement. – Wasi Ahmad Dec 23 '16 at 00:33
  • 1
    Does every row have a city and state separated by a comma? – 3novak Dec 23 '16 at 00:33
  • Yes every row has the city and state separated by a comma. A few examples are: "Los Angeles, CA 90015" "Torrance, CA 90504", "Manhattan Beach, CA 90266". – Jeff Sword Dec 23 '16 at 00:40
  • Try to isolate the issue to a few rows and give them to us so we can replicate the issue! If that fails, at least give us a few example rows. Your problem is likely that some rows have only the city. Check by doing this: `df[len_str] = df['owner city state zip'].str.split(',').apply(len)` and then `df[df.len_str < 2]` – cd98 Dec 23 '16 at 00:41
  • I mean, did you search every single row for a comma? I realize that you think there is a comma in every row, but that doesn't mean that there actually is. `df['num_commas'] = df['Owner City State Zip'].str.count(',')`. Then ensure that the column values are strictly greater than 0 as a first check. – 3novak Dec 23 '16 at 00:43
  • I am sorry, I am not sure what other code to share. My data frame is just one column containing city, state and zip code with a comma separating city from state, and a space separating state from zip code. – Jeff Sword Dec 23 '16 at 00:45
  • [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – cd98 Dec 23 '16 at 00:46
  • Thank you for that. I'll use that format in the future. I am very new to pandas, python, and stack social, still getting the hang of how to best reach out for help. – Jeff Sword Dec 23 '16 at 00:55

1 Answers1

6

Consider the df

df = pd.DataFrame({'Owner City State Zip': ["Los Angeles, CA 90015"]})

print(df)

    Owner City State Zip
0  Los Angeles, CA 90015

I'd use this handy bit of regex and pandas str string accessor

regex = r'(?P<City>[^,]+)\s*,\s*(?P<State>[^\s]+)\s+(?P<Zip>\S+)'
df['Owner City State Zip'].str.extract(regex)

          City State    Zip
0  Los Angeles    CA  90015
piRSquared
  • 285,575
  • 57
  • 475
  • 624