Unwrap hyphen separated numbers in a string into a range in pandas

Question

I have a pandas DataFrame with the beginnings of postal codes, grouped by region, in the following form:

region A 385
region B 656 - 659

I need to expand the entries that contain a dash into the full range, so the result is:

region B 656, 657, 658, 659

My code

postcodes.iloc[:,1] = postcodes.iloc[:,1].apply(lambda x: x.split('—'))
def unwrap_codes(row):
row = row['Postcode begins with']
if len(row) > 1:
    for x, y in row:
        while x != y:
            row.append(x=+1)
postcodes['Unwraped'] = postcodes.apply(unwrap_codes, axis=1)

This raises ValueError: too many values to unpack (expected 2). Could you please help me fix the error?

That indentation can't be correct .. and the error tells you that there are more than two elements in each entry in row. — MatsLindh, Sep 19 '17 at 19:33
Lists shouldn't be modified when you iterate through them. You should use another object like a stack. When you started to iterate the list maybe it had 2 elements but you are adding more elements as time goes by and that's the problem. Possible duplicate https://stackoverflow.com/questions/6294983/modifying-list-inside-foreach-loop — Daniel Botero Correa, Sep 19 '17 at 20:05
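
The unpacking error described in the comments can be reproduced without pandas. This is a minimal sketch, assuming a cell has already been split into a list of strings; the values in `parts` are illustrative, not taken from the asker's data. Iterating with `for x, y in parts` tries to unpack each element (here a four-character string) into two names, which is what raises the ValueError. Unpacking the list itself gives the two endpoints instead.

```python
# Illustrative values: what one cell might look like after splitting on the dash.
parts = ['656 ', ' 659']

try:
    # Each element is a string, so unpacking '656 ' into (x, y)
    # would only succeed if it had exactly two characters.
    for x, y in parts:
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# Unpack the list as a whole rather than iterating over it.
# int() ignores the surrounding whitespace.
start, end = (int(p) for p in parts)
codes = list(range(start, end + 1))
print(codes)
```

Building a new list with `range` also avoids appending to the list while looping over it.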

cs95 · Accepted Answer · 2017-09-19T19:44:17.977

1

A str.split followed by an apply seems to do it:

print(df)
     region   postcode
0  region A        385
1  region B  656 - 659

df['Unwrapped'] = df.postcode.str.split(r'\s*-\s*')\
             .apply(lambda x: range(int(x[0]), int(x[-1]) + 1))
print(df['Unwrapped'])
0                   (385)
1    (656, 657, 658, 659)
Name: Unwrapped, dtype: object
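
If the comma-separated text from the question is wanted rather than range objects, the same split-then-expand idea can be wrapped in a small helper. This is a sketch under assumptions, not part of the answer above: the helper name `expand_codes` is hypothetical, and the sample frame simply mirrors the one printed above.

```python
import re

import pandas as pd

# Sample data mirroring the frame shown in the answer.
df = pd.DataFrame({
    'region': ['region A', 'region B'],
    'postcode': ['385', '656 - 659'],
})

def expand_codes(text):
    """Hypothetical helper: turn 'a - b' into 'a, a+1, ..., b'."""
    # Split on a hyphen with optional surrounding whitespace.
    parts = re.split(r'\s*-\s*', text.strip())
    # A single code has one part, so first and last coincide.
    start, end = int(parts[0]), int(parts[-1])
    return ', '.join(str(n) for n in range(start, end + 1))

df['Unwrapped'] = df['postcode'].apply(expand_codes)
print(df['Unwrapped'].tolist())
```

Returning `list(range(start, end + 1))` instead of the joined string would keep the codes as integers, which may be more convenient for later filtering.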

edited Sep 19 '17 at 19:44

answered Sep 19 '17 at 19:35

cs95


score 1 · Answer 2 · answered Sep 19 '17 at 20:40

@cᴏʟᴅsᴘᴇᴇᴅ's answer is great. I was just bored and wanted to write something.

idx = pd.MultiIndex.from_product([df.index, [0, 1]], names=[None, 'match'])
d = df.postcode.str.extractall(r'(\d+)').reindex(idx).ffill().astype(int)[0]

d.unstack().add([0, 1]).apply(lambda x: list(range(*x)), 1)

0                   [385]
1    [656, 657, 658, 659]
dtype: object

answered Sep 19 '17 at 20:40

piRSquared


Excellence born from boredom! – cs95 Sep 19 '17 at 20:44
