I have a pretty basic df, and I want to create 2 new columns based off of some regex of one column. I created a function to do this which returned 2 values.
def get_value(s):
result = re.findall('(?<=Value":")(\d+)\.(\d+)?(?=")', s)
if len(result) != 2:
return -1, -1
else:
matches = []
for match in result:
matches.append(match[0] + '.' + match[1])
return float(matches[0]), float(matches[1])
When I try this: data['Test1'], data['Test2'] = zip(*data['mod_data'].apply(get_value))
It throws an error saying "NotImplementedError: isna is not defined for MultiIndex", but if I split it into 2 diff functions it works.
def get_value1(s):
result = re.findall('(?<=Value":")(\d+)\.(\d+)?(?=")', s)
if len(result) != 2:
return -1
else:
matches = []
for match in result:
matches.append(match[0] + '.' + match[1])
return float(matches[0])
def get_value2(s):
result = re.findall('(?<=Value":")(\d+)\.(\d+)?(?=")', s)
if len(result) != 2:
return -1
else:
matches = []
for match in result:
matches.append(match[0] + '.' + match[1])
return float(matches[1])
data['From'] = data['mod_data'].apply(get_value1)
data['To'] = data['mod_data'].apply(get_value2)
Another thing to note is that the error NotImplementedError gets thrown at the very end. I added print statement in my get_value function, and it gets thrown after it calculated the last row.
Edit: Added example df of what I am dealing with
test = pd.DataFrame([['A', 'A1', 'Top', '[{"Value":"37.29","ID":"S1234.1","Time":"","EXPTIME_Name":"","Value":"37.01"}]'],
['B', 'B1', 'Bottom', '[{"EXPO=T10;PID=.ABCDE149;"Value":"45.29";RETICLEID=S14G1490Y2;SEQ=5A423002",Value":"56.98"}]']],
columns=['Module', 'line', 'area', 'mod_data'])
desired result:
Module line ... From To
0 A A1 ... 37.29 37.01
1 B B1 ... 45.29 56.98