2

I have a Pandas DataFrame that has a column containing comma separated numbers, ampersand separated numbers, and hyphen separated number ranges...

Title   LLFCs     Red     Amber   Green
a       15, 18    11.65   2.86    1.89
b       16 & 19   9.08    2.93    1.53
c       112-114   6.45    2.54    1.64

I would like each 'LLFC' value to have its own row, meaning the numbers implied by the hyphen (113 in this case) must also be unwrapped. My ideal outcome is the following...

Title   LLFCs     Red     Amber   Green
a       15        11.65   2.86    1.89
a       18        11.65   2.86    1.89
b       16        9.08    2.93    1.53
b       19        9.08    2.93    1.53
c       112       6.45    2.54    1.64
c       113       6.45    2.54    1.64
c       114       6.45    2.54    1.64

I currently have the following few lines that do everything I require apart from unwrapping the hyphen values...

data1 = data1.assign(LLFCs=data1['LLFCs'].str.replace('-',', '))
data1 = data1.assign(LLFCs=data1['LLFCs'].str.replace(' & ',', '))
data1 = data1.assign(LLFCs=data1['LLFCs'].str.split(', ')).explode('LLFCs')

This code achieves the following...

Title   LLFCs     Red     Amber   Green
a       15        11.65   2.86    1.89
a       18        11.65   2.86    1.89
b       16        9.08    2.93    1.53
b       19        9.08    2.93    1.53
c       112       6.45    2.54    1.64
c       114       6.45    2.54    1.64

Which obviously doesn't include the hyphen wrapped value, would somebody be able to help me with this?

Gino Mempin
  • 25,369
  • 29
  • 96
  • 135

2 Answers2

1

Inspiration from here numeric string to range

import re
data = '''Title   LLFCs     Red     Amber   Green
a       15, 18    11.65   2.86    1.89
b       16 & 19    9.08    2.93    1.53
c       112-114   6.45    2.54    1.64'''
arr = [[t for t in re.split(r"[ ][ ]+", l)] for l in data.split("\n")]
df = pd.DataFrame(arr[1:], columns=arr[0])

def f(x):
    x = re.sub(" ","", x)
    result = []
    for part in x.split(','):
        if "-" in part:
            a, b = part.split("-")
            a, b = int(a), int(b)
            result.extend(range(a, b + 1))
        elif "&" in part:
            a, b = part.split("&")
            result += [int(a), int(b)]
        else:
            a = int(part)
            result.append(a)
    return result

df = df.assign(LLFCs=lambda x: [f(curr) for curr in x["LLFCs"]]).explode("LLFCs")
print(df.to_string(index=False))

output

Title LLFCs    Red Amber Green
    a    15  11.65  2.86  1.89
    a    18  11.65  2.86  1.89
    b    16   9.08  2.93  1.53
    b    19   9.08  2.93  1.53
    c   112   6.45  2.54  1.64
    c   113   6.45  2.54  1.64
    c   114   6.45  2.54  1.64
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30
0

Split through column LLFCs, and iterate through the result - if the delimiter is -, create a range of the numbers. After that, you can explode:

df['LLFCs'] = [tuple(range(int(first),int(last)+1))
               if delimiter == "-" else (int(first), int(last)) 
               for a, delimiter, b 
               #note that the delimiter is wrapped in parentheses,
               #this keeps the delimiter as part of the extract
               in df.LLFCs.str.split("([,&-])")
              ]
df


Title   LLFCs          Red     Amber    Green
0   a   (15, 18)       11.65    2.86    1.89
1   b   (16, 19)        9.08    2.93    1.53
2   c   (112, 113, 114) 6.45    2.54    1.64

Now, you can explode :

df.explode("LLFCs")

Title   LLFCs   Red Amber   Green
0   a   15  11.65   2.86    1.89
0   a   18  11.65   2.86    1.89
1   b   16  9.08    2.93    1.53
1   b   19  9.08    2.93    1.53
2   c   112 6.45    2.54    1.64
2   c   113 6.45    2.54    1.64
2   c   114 6.45    2.54    1.64
sammywemmy
  • 27,093
  • 4
  • 17
  • 31