My sample data is as follows:
sample_json = """{
"P1":[
{"Question":"Fruit",
"Choices":["Yes","No"]}
],
"P2":[
{"Question":"Fruit Name",
"Choices":["Mango","Apple","Banana"]}
],
"P3":[
{"Question":"Fruit color",
"Choices":["Yellow","Red"]}
],
"P4":[
{"Question":"Vegetable",
"Choices":["Yes","No"]}
],
"P5":[
{"Question":"Veggie Name",
"Choices":["Tomato","Potato","Carrots"]}
],
"P6":[
{"Question":"Veggie Color",
"Choices":["Red","Yellow","Brown"]}
],
"P7":[
{"Question":"Enjoy Eating?",
"Choices":["Yes","No"]}
]
}"""
I am trying to generate a data frame using pandas as follows:
import json, random
import pandas as pd

sample_data = json.loads(sample_json)

colHeaders = []
for k, v in sample_data.items():
    colHeaders.append(v[0]['Question'])

df = pd.DataFrame(columns=colHeaders)
for i in range(10):
    Answers = []
    for k, v in sample_data.items():
        Answers.append(random.choice(v[0]['Choices']))
    df.loc[len(df)] = Answers
This creates a data frame in which every column is filled with a random choice.
The answers should stay random, but I want to populate each row based on the answers to P1 and P4, under the following conditions
(P1, P2, P3, ..., P7 refer to the keys in sample_json):
- If `P1.AnswerChoice = No`, fill `Null` to P2 and P3
- If `P4.AnswerChoice = No`, fill `Null` to P5 and P6
- If `P1.AnswerChoice = No` and `P4.AnswerChoice = No`, fill `Null` to P7
- `P1.AnswerChoice` and `P4.AnswerChoice` cannot both be `Yes`
So that it can produce the following data frame:
Fruit | Fruit Name | Fruit Color | Vegetable | Veggie Name | Veggie Color | Enjoy eating? |
---|---|---|---|---|---|---|
No | Null | Null | Yes | Carrots | Yellow | No |
No | Null | Null | No | Null | Null | Null |
Yes | Apple | Yellow | No | Null | Null | Yes |
Yes | Banana | Yellow | No | Null | Null | No |
No | Null | Null | Yes | Potato | Yellow | No |
No | Null | Null | Yes | Tomato | Yellow | Yes |
Yes | Mango | Red | No | Null | Null | Null |
No | Null | Null | Yes | Carrots | Yellow | No |
Yes | Apple | Yellow | No | Null | Null | No |
Edit:
I would like to handle this inside the for loop that iterates over the JSON to build each row, rather than by editing the data frame afterwards.
For example, in the following part of the code, if that is possible:
for i in range(10):
    Answers = []
    for k, v in sample_data.items():
        Answers.append(random.choice(v[0]['Choices']))
    df.loc[len(df)] = Answers
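For reference, here is a minimal sketch of what I imagine the row-building step could look like, under a few assumptions: the answers are the literal strings "Yes" and "No", a Null is represented as Python's `None`, and the helper names `pick`, `build_row` and `ALLOWED_P1_P4` are hypothetical names made up for this illustration. It draws the P1/P4 pair first from the combinations the conditions allow, then fills in only the dependent answers that apply.

```python
import json
import random

# Same data as in the question, written compactly so the sketch runs on its own.
sample_json = """{
"P1":[{"Question":"Fruit","Choices":["Yes","No"]}],
"P2":[{"Question":"Fruit Name","Choices":["Mango","Apple","Banana"]}],
"P3":[{"Question":"Fruit color","Choices":["Yellow","Red"]}],
"P4":[{"Question":"Vegetable","Choices":["Yes","No"]}],
"P5":[{"Question":"Veggie Name","Choices":["Tomato","Potato","Carrots"]}],
"P6":[{"Question":"Veggie Color","Choices":["Red","Yellow","Brown"]}],
"P7":[{"Question":"Enjoy Eating?","Choices":["Yes","No"]}]
}"""
sample_data = json.loads(sample_json)

# Assumption: P1 and P4 may never both be "Yes", so only these pairs are allowed.
ALLOWED_P1_P4 = [("Yes", "No"), ("No", "Yes"), ("No", "No")]


def pick(key):
    """Return a random choice for the question stored under `key`."""
    return random.choice(sample_data[key][0]["Choices"])


def build_row():
    """Build one row of answers (in P1..P7 order) that respects the conditions."""
    # Every answer starts as None; pandas treats None as a missing value.
    row = dict.fromkeys(sample_data)
    row["P1"], row["P4"] = random.choice(ALLOWED_P1_P4)
    if row["P1"] == "Yes":
        row["P2"], row["P3"] = pick("P2"), pick("P3")
    if row["P4"] == "Yes":
        row["P5"], row["P6"] = pick("P5"), pick("P6")
    # P7 is left as None only when both P1 and P4 are "No".
    if (row["P1"], row["P4"]) != ("No", "No"):
        row["P7"] = pick("P7")
    return list(row.values())


rows = []
for i in range(10):
    rows.append(build_row())
```

The collected rows could then be turned into the data frame in one step with `pd.DataFrame(rows, columns=colHeaders)`. Picking the P1/P4 pair from a fixed list makes the three allowed combinations equally likely, which is a design choice of this sketch and not something the conditions require.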