I have a pandas DataFrame that holds the beginnings of postal codes, grouped by region, in the following form:

region A 385
region B 656 - 659

I need to expand the entries that contain a dash into the full list of codes, so that the result is:

region B 656, 657, 658, 659

My code:

postcodes.iloc[:,1] = postcodes.iloc[:,1].apply(lambda x: x.split('—'))
def unwrap_codes(row):
row = row['Postcode begins with']
if len(row) > 1:
    for x, y in row:
        while x != y:
            row.append(x=+1)
postcodes['Unwraped'] = postcodes.apply(unwrap_codes, axis=1)

raises ValueError: too many values to unpack (expected 2). Could you please help me fix the error?

cs95
  • That indentation can't be correct .. and the error tells you that there are more than two elements in each entry in row. – MatsLindh Sep 19 '17 at 19:33
  • Lists shouldn't be modified when you iterate through them. You should use another object like a stack. When you started to iterate the list maybe it had 2 elements but you are adding more elements as time goes by and that's the problem. Possible duplicate https://stackoverflow.com/questions/6294983/modifying-list-inside-foreach-loop – Daniel Botero Correa Sep 19 '17 at 20:05
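
To illustrate what the comments above describe, here is a minimal sketch in plain Python. The name `cell` is a hypothetical stand-in for a single entry of the column after the split, not the asker's actual data. Iterating over the list yields strings, and unpacking a string longer than two characters into `x, y` is what raises the error; unpacking the list itself avoids both the error and the need to append while iterating.

```python
# Hypothetical stand-in for one cell after the split: a list of strings.
cell = "656 - 659".split("-")   # ['656 ', ' 659']

message = None
try:
    for x, y in cell:           # each item is a string such as '656 ', not a pair
        pass
except ValueError as exc:
    message = str(exc)          # the "too many values to unpack" error

# Unpack the list itself instead of iterating over it,
# and build a new list rather than appending to `cell`.
start, end = (int(part) for part in cell)
codes = list(range(start, end + 1))
```

Here `int` tolerates the surrounding spaces, so no explicit `strip` is needed.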

2 Answers

A str.split followed by an apply seems to do it:

print(df)
     region   postcode
0  region A        385
1  region B  656 - 659

df['Unwrapped'] = df.postcode.str.split(r'\s*-\s*')\
             .apply(lambda x: range(int(x[0]), int(x[-1]) + 1))
print(df['Unwrapped'])
0                   (385)
1    (656, 657, 658, 659)
Name: Unwrapped, dtype: object
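
If the comma-separated text from the question is the desired result, the same idea could be sketched as a plain function. The name `expand_postcode_range` is hypothetical, and accepting en and em dashes alongside the hyphen is an assumption based on the dash used in the question's own split call. Like the code above, it converts to `int`, so any leading zeros would be lost.

```python
import re

def expand_postcode_range(text):
    """Expand '656 - 659' to [656, 657, 658, 659]; a single code gives a one-item list."""
    # Assumption: the separator may be a hyphen, an en dash, or an em dash.
    parts = [p.strip() for p in re.split(r"[-\u2013\u2014]", str(text)) if p.strip()]
    start, end = int(parts[0]), int(parts[-1])
    return list(range(start, end + 1))

expanded = expand_postcode_range("656 - 659")
as_text = ", ".join(map(str, expanded))   # join the codes into one string
```

It could then be applied per cell, for example with `df.postcode.apply(expand_postcode_range)`.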
cs95

@cᴏʟᴅsᴘᴇᴇᴅ's answer is great. I was just bored and wanted to write something.

idx = pd.MultiIndex.from_product([df.index, [0, 1]], names=[None, 'match'])
d = df.postcode.str.extractall(r'(\d+)').reindex(idx).ffill().astype(int)[0]

d.unstack().add([0, 1]).apply(lambda x: list(range(*x)), 1)

0                   [385]
1    [656, 657, 658, 659]
dtype: object
piRSquared