I have a dataframe where parameters
column is JSON and contains multiple actual rows and columns:
input_data = pandas.DataFrame({'id':['0001','0002','0003'],
'parameters':["{'product':['book','cat','fish'],'person':['me','you']}",
"'{'product':['book','cat'],'person':['me','you','us']}'",
"'{'product':['apple','snake','rabbit','octopus'],'person':['them','you','us','we','they']}'"]})
... from which I'd like to extract the following data frames:
product_data = pandas.DataFrame({'id':['0001','0001','0001','0002','0002','0003','0003','0003','0003'],
'product':['book','cat','fish','book','cat','apple','snake','rabbit','octopus']})
person_data = pandas.DataFrame({'id':['0001','0001','0002','0002','0002','0003','0003','0003','0003','0003'],
'person':['me','you','me','you','us','them','you','us','we','they']})
Below is how I've utilized Regular Expressions to get me there. I doubt this is the best way to do it but here it goes:
for i in input_data.id.tolist():
s = ''.join(input_data[input_data.id == i]['parameters'])
product_string = re.search(r"product':(.*?),'person", str(s)).group(1)
product_data = pandas.DataFrame(product_string[1:-1].split(','))
person_string = re.search(r"person':(.*?)}", str(s)).group(1)
person_data = pandas.DataFrame(person_string[1:-1].split(','))
print("........")
print(product_data)
print("........")
print(person_data)
I'd like to learn a faster, more elegant, or wholesome solution that may capture unexpected nuances.