Pandas apply function throws NotImplementedError

Question

I have a fairly basic DataFrame, and I want to create two new columns by applying a regex to one existing column. I wrote a function that does this and returns two values.

import re

def get_value(s):
    result = re.findall(r'(?<=Value":")(\d+)\.(\d+)?(?=")', s)
    if len(result) != 2:
        return -1, -1
    else:
        matches = []
        for match in result:
            matches.append(match[0] + '.' + match[1])
        return float(matches[0]), float(matches[1])

When I try this: data['Test1'], data['Test2'] = zip(*data['mod_data'].apply(get_value))

It throws an error saying "NotImplementedError: isna is not defined for MultiIndex", but if I split it into two different functions, it works.

def get_value1(s):
    result = re.findall(r'(?<=Value":")(\d+)\.(\d+)?(?=")', s)
    if len(result) != 2:
        return -1
    else:
        matches = []
        for match in result:
            matches.append(match[0] + '.' + match[1])
        return float(matches[0])

def get_value2(s):
    result = re.findall(r'(?<=Value":")(\d+)\.(\d+)?(?=")', s)
    if len(result) != 2:
        return -1
    else:
        matches = []
        for match in result:
            matches.append(match[0] + '.' + match[1])
        return float(matches[1])


data['From'] = data['mod_data'].apply(get_value1)
data['To'] = data['mod_data'].apply(get_value2)

Another thing to note is that the NotImplementedError is raised at the very end. I added a print statement to my get_value function, and the error is raised only after the last row has been processed.

Edit: Added example df of what I am dealing with

test = pd.DataFrame([['A', 'A1', 'Top', '[{"Value":"37.29","ID":"S1234.1","Time":"","EXPTIME_Name":"","Value":"37.01"}]'], 
                     ['B', 'B1', 'Bottom', '[{"EXPO=T10;PID=.ABCDE149;"Value":"45.29";RETICLEID=S14G1490Y2;SEQ=5A423002",Value":"56.98"}]']],
                     columns=['Module', 'line', 'area', 'mod_data'])

Desired result:

  Module line  ...   From     To
0      A   A1  ...  37.29  37.01
1      B   B1  ...  45.29  56.98

You can probably use the pandas method `str.extractall` with `np.where` to replace each function with one line of code. Could you include sample input and expected output data? Please see: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — David Erickson, Jan 01 '21 at 21:17
Your first function is returning a tuple so pandas might be trying to create a Series of tuples instead of two Series as you want. You can also try the `result_type='expand'` arg to `DataFrame.apply`. — Kyle, Jan 01 '21 at 21:19
@DavidErickson Okay, I added an example df of what I'm dealing with. — mike_gundy123, Jan 01 '21 at 21:56
The desired output of the function (that's failing) would be 37.29, 37.01 AND 45.29, 56.98. — mike_gundy123, Jan 01 '21 at 22:01
@mike_gundy123 Please see my answer. You can adjust your regex and use `str.findall`. — David Erickson, Jan 01 '21 at 22:18
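
Kyle's suggestion of `result_type='expand'` could look roughly like the sketch below. It is an illustration, not a confirmed fix for the NotImplementedError. It assumes `mod_data` holds plain strings, and it uses a shortened two-row frame invented for the example. The column is selected with double brackets so that `DataFrame.apply` is called, since that is the method Kyle names for this argument.

```python
import re
import pandas as pd

# Regex from the question, compiled once as a raw string.
VALUE_RE = re.compile(r'(?<=Value":")(\d+)\.(\d+)?(?=")')

def get_value(s):
    """Return the two floats found in s, or (-1, -1) if there are not exactly two."""
    result = VALUE_RE.findall(s)
    if len(result) != 2:
        return -1, -1
    return tuple(float(whole + '.' + frac) for whole, frac in result)

# Shortened, made-up sample data (second row has no matches on purpose).
test = pd.DataFrame(
    [['A', '[{"Value":"37.29","ID":"S1234.1","Value":"37.01"}]'],
     ['B', 'no values here']],
    columns=['Module', 'mod_data'])

# axis=1 passes each row to the lambda; 'expand' turns each tuple into columns.
expanded = test[['mod_data']].apply(
    lambda row: get_value(row['mod_data']), axis=1, result_type='expand')
expanded.columns = ['From', 'To']
test = test.join(expanded)
```

Row-wise `apply` is usually slower than the vectorized string methods, so this is mainly useful when the per-row logic (such as the -1 fallback) has to stay in one Python function.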

David Erickson · Accepted Answer · 2021-01-01T22:53:40.990

First, your regex was a little off. Change '(?<=Value":")(\d+)\.(\d+)?(?=")' to '(?<=Value":")(\d+\.\d+)?(?=")', so that the full float is in one capture group. Your original pattern put the part before the decimal point in one group and the part after it in another.
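
To make the difference concrete, here is a small sketch using plain `re.findall` on a shortened version of the first sample string (the variable names are just for illustration). With two capture groups, each match comes back as a tuple of pieces; with one group, each match is the whole number as a single string.

```python
import re

# Shortened version of the first sample row.
s = '[{"Value":"37.29","ID":"S1234.1","Time":"","Value":"37.01"}]'

# Original pattern: two groups, so findall yields (before, after) tuples.
two_groups = re.findall(r'(?<=Value":")(\d+)\.(\d+)?(?=")', s)
# two_groups == [('37', '29'), ('37', '01')]

# Adjusted pattern: one group, so findall yields the full number as one string.
one_group = re.findall(r'(?<=Value":")(\d+\.\d+)?(?=")', s)
# one_group == ['37.29', '37.01']

floats = [float(x) for x in one_group]
```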

Then, you can use str.findall:

test = pd.DataFrame([['A', 'A1', 'Top', '[{"Value":"37.29","ID":"S1234.1","Time":"","EXPTIME_Name":"","Value":"37.01"}]'], 
                     ['B', 'B1', 'Bottom', '[{"EXPO=T10;PID=.ABCDE149;"Value":"45.29";RETICLEID=S14G1490Y2;SEQ=5A423002",Value":"56.98"}]']],
                     columns=['Module', 'line', 'area', 'mod_data'])
test[['From', 'To']] = test['mod_data'].str.findall(r'(?<=Value":")(\d+\.\d+)?(?=")')
test
Out[1]: 
  Module line    area                                           mod_data  \
0      A   A1     Top  [{"Value":"37.29","ID":"S1234.1","Time":"","EX...   
1      B   B1  Bottom  [{"EXPO=T10;PID=.ABCDE149;"Value":"45.29";RETI...   

    From     To  
0  37.29  37.01  
1  45.29  56.98
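
If the direct assignment does not split the lists into two columns on a given pandas version, a more explicit variant is sketched below. It assumes every row yields exactly two matches (as in the sample data), drops the optional `?` on the capture group so empty matches cannot occur, and converts the strings to floats. The names `found` and `values` are only for illustration.

```python
import pandas as pd

test = pd.DataFrame(
    [['A', 'A1', 'Top', '[{"Value":"37.29","ID":"S1234.1","Time":"","EXPTIME_Name":"","Value":"37.01"}]'],
     ['B', 'B1', 'Bottom', '[{"EXPO=T10;PID=.ABCDE149;"Value":"45.29";RETICLEID=S14G1490Y2;SEQ=5A423002",Value":"56.98"}]']],
    columns=['Module', 'line', 'area', 'mod_data'])

# Series of lists, one list of matched strings per row.
found = test['mod_data'].str.findall(r'(?<=Value":")(\d+\.\d+)(?=")')

# Turn each two-element list into two columns, keep the original index,
# and convert the matched strings to floats.
values = pd.DataFrame(found.tolist(), index=test.index,
                      columns=['From', 'To']).astype(float)

test = test.join(values)
```

Rows with more or fewer than two matches would need separate handling, for example by checking `found.str.len()` before building `values`.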

Hey, thank you for your work! I'm very new to regex, so I appreciate the help on that front! Sadly, the result is not exactly what I was looking for. I added a better explanation of the result I want. Currently I CAN get the desired results, but not in one call; I need two different functions to do it. I hope that makes sense. — mike_gundy123, Jan 01 '21 at 22:34
