How do you unwrap hyphen-separated number ranges in a Pandas DataFrame column?

Question

I have a Pandas DataFrame that has a column containing comma separated numbers, ampersand separated numbers, and hyphen separated number ranges...

Title   LLFCs     Red     Amber   Green
a       15, 18    11.65   2.86    1.89
b       16 & 19   9.08    2.93    1.53
c       112-114   6.45    2.54    1.64

I would like each 'LLFC' value to have its own row, meaning the numbers implied by the hyphen (113 in this case) must also be unwrapped. My ideal outcome is the following...

Title   LLFCs     Red     Amber   Green
a       15        11.65   2.86    1.89
a       18        11.65   2.86    1.89
b       16        9.08    2.93    1.53
b       19        9.08    2.93    1.53
c       112       6.45    2.54    1.64
c       113       6.45    2.54    1.64
c       114       6.45    2.54    1.64

I currently have the following few lines that do everything I require apart from unwrapping the hyphen values...

data1 = data1.assign(LLFCs=data1['LLFCs'].str.replace('-',', '))
data1 = data1.assign(LLFCs=data1['LLFCs'].str.replace(' & ',', '))
data1 = data1.assign(LLFCs=data1['LLFCs'].str.split(', ')).explode('LLFCs')

This code achieves the following...

Title   LLFCs     Red     Amber   Green
a       15        11.65   2.86    1.89
a       18        11.65   2.86    1.89
b       16        9.08    2.93    1.53
b       19        9.08    2.93    1.53
c       112       6.45    2.54    1.64
c       114       6.45    2.54    1.64

Which obviously doesn't include the hyphen wrapped value, would somebody be able to help me with this?

so in the case of the hyphen you need to expand the range ? – Umar.H Jul 29 '20 at 09:38 — Umar.H, Jul 29 '20 at 09:38

score 1 · Accepted Answer · answered Jul 29 '20 at 11:15

Inspiration from here numeric string to range

import re
data = '''Title   LLFCs     Red     Amber   Green
a       15, 18    11.65   2.86    1.89
b       16 & 19    9.08    2.93    1.53
c       112-114   6.45    2.54    1.64'''
arr = [[t for t in re.split(r"[ ][ ]+", l)] for l in data.split("\n")]
df = pd.DataFrame(arr[1:], columns=arr[0])

def f(x):
    x = re.sub(" ","", x)
    result = []
    for part in x.split(','):
        if "-" in part:
            a, b = part.split("-")
            a, b = int(a), int(b)
            result.extend(range(a, b + 1))
        elif "&" in part:
            a, b = part.split("&")
            result += [int(a), int(b)]
        else:
            a = int(part)
            result.append(a)
    return result

df = df.assign(LLFCs=lambda x: [f(curr) for curr in x["LLFCs"]]).explode("LLFCs")
print(df.to_string(index=False))

output

Title LLFCs    Red Amber Green
    a    15  11.65  2.86  1.89
    a    18  11.65  2.86  1.89
    b    16   9.08  2.93  1.53
    b    19   9.08  2.93  1.53
    c   112   6.45  2.54  1.64
    c   113   6.45  2.54  1.64
    c   114   6.45  2.54  1.64

score 0 · Answer 2 · answered Jul 29 '20 at 11:32

Split through column LLFCs, and iterate through the result - if the delimiter is -, create a range of the numbers. After that, you can explode:

df['LLFCs'] = [tuple(range(int(first),int(last)+1))
               if delimiter == "-" else (int(first), int(last)) 
               for a, delimiter, b 
               #note that the delimiter is wrapped in parentheses,
               #this keeps the delimiter as part of the extract
               in df.LLFCs.str.split("([,&-])")
              ]
df


Title   LLFCs          Red     Amber    Green
0   a   (15, 18)       11.65    2.86    1.89
1   b   (16, 19)        9.08    2.93    1.53
2   c   (112, 113, 114) 6.45    2.54    1.64

Now, you can explode :

df.explode("LLFCs")

Title   LLFCs   Red Amber   Green
0   a   15  11.65   2.86    1.89
0   a   18  11.65   2.86    1.89
1   b   16  9.08    2.93    1.53
1   b   19  9.08    2.93    1.53
2   c   112 6.45    2.54    1.64
2   c   113 6.45    2.54    1.64
2   c   114 6.45    2.54    1.64

How do you unwrap hyphen-separated number ranges in a Pandas DataFrame column?

2 Answers2

Linked