4

My problem is kind of simple, but I'm not sure there's a way to do what I'm looking for:

I had to store in a SQL database some data, that includes some intervals that will later be used. Because of this, I had to store it as a string, like this:

 variable     interval
    A          (-0.001, 2.0]
    A          (2.0, 6.0]

So, then, I want to use said intervals to cut another variable, like this:

df1 =  pd.DataFrame({'interval': {4: '(-0.001, 2.0]',
  5: '(2.0, 6.0]'},
 'variable': {4: 'A',
  5: 'A',
}})
df2 =  pd.DataFrame({'A': [1,1,3]})
bins = df1[df1.variable.eq('A')].interval
new_series = pd.cut(df2['A'], bins=bins)

But this brings:

 ValueError: could not convert string to float: '(-0.001, 2.0]'
 

Tried:

bins = bins.astype('interval')

But this brings:

TypeError: type <class 'str'> with value (-0.001, 2.0] is not an interval

Is there something I can do? Thanks

Juan C
  • 5,846
  • 2
  • 17
  • 51

1 Answers1

8

IIUC, you could parse the string by hand, then convert bins to IntervalIndex:

import ast

import pandas as pd


def interval_type(s):
    """Parse interval string to Interval"""
    
    table = str.maketrans({'[': '(', ']': ')'})
    left_closed = s.startswith('[')
    right_closed = s.endswith(']')

    left, right = ast.literal_eval(s.translate(table))

    t = 'neither'
    if left_closed and right_closed:
        t = 'both'
    elif left_closed:
        t = 'left'
    elif right_closed:
        t = 'right'

    return pd.Interval(left, right, closed=t)


df1 = pd.DataFrame({'interval': {4: '(-0.001, 2.0]', 5: '(2.0, 6.0]'},
                    'variable': {4: 'A', 5: 'A'}})
df1['interval'] = df1['interval'].apply(interval_type)

df2 = pd.DataFrame({'A': [1, 1, 3]})
bins = df1[df1.variable.eq('A')].interval
new_series = pd.cut(df2['A'], bins=pd.IntervalIndex(bins))

print(new_series)

Output

0    (-0.001, 2.0]
1    (-0.001, 2.0]
2       (2.0, 6.0]
Name: A, dtype: category
Categories (2, interval[float64]): [(-0.001, 2.0] < (2.0, 6.0]]
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
  • 1
    Thanks a lot! thought there was a more straight forward way to parse and now I feel bad for making your write this for me. – Juan C Dec 14 '20 at 20:48
  • 2
    No problem, for every answer I write I learn a bit. – Dani Mesejo Dec 14 '20 at 20:50
  • 1
    That's two of us ! – Juan C Dec 14 '20 at 20:56
  • 1
    That worked almost 100% for me, but I have some "inf" intervals, which raise a ValueError from ast. Does anyone knows how to fix it? `ValueError: malformed node or string: <_ast.Name object at 0x7f3cbd48ae20>` Where the string was: `(1.0, inf]` – Rômulo Férrer Filho Sep 22 '22 at 21:30
  • @RômuloFérrerFilho and others, a way around is to retrieve the interval boundaries by hand, e.g. using: `left, right = map(float, s.translate(table).replace('(', '').replace(')', '').split(', '))` instead of the `ast.literal_eval` line. – Stefan Oct 25 '22 at 10:22