My sample data is as follows:

sample_json = """{
    "P1":[
        {"Question":"Fruit",
        "Choices":["Yes","No"]}
    ],
    "P2":[
        {"Question":"Fruit Name",
        "Choices":["Mango","Apple","Banana"]}
    ],
    "P3":[
        {"Question":"Fruit color",
        "Choices":["Yellow","Red"]}
    ],
    "P4":[
        {"Question":"Vegetable",
        "Choices":["Yes","No"]}
    ],
    "P5":[
        {"Question":"Veggie Name",
        "Choices":["Tomato","Potato","Carrots"]}
    ],
    "P6":[
        {"Question":"Veggie Color",
        "Choices":["Red","Yellow","Brown"]}
    ],
    "P7":[
        {"Question":"Enjoy Eating?",
        "Choices":["Yes","No"]}
    ]
}"""

I am trying to generate a data frame using pandas as follows:

import json
import random

import pandas as pd

sample_data = json.loads(sample_json)

# Use each question's text as a column header.
colHeaders = []
for k, v in sample_data.items():
    colHeaders.append(v[0]['Question'])

df = pd.DataFrame(columns=colHeaders)

# Append 10 rows, picking one random choice per question.
for i in range(10):
    Answers = []
    for k, v in sample_data.items():
        Answers.append(random.choice(v[0]['Choices']))
    df.loc[len(df)] = Answers

It creates a data frame of 10 rows in which every answer is picked at random, independently of the other columns.

The answers should stay random, but I want each row to depend on the answers to P1 and P4 (P1, P2, P3, ..., P7 are the keys in sample_json), subject to the following conditions:

  • If P1.AnswerChoice = No, fill Null to P2 and P3
  • If P4.AnswerChoice = No, fill Null to P5 and P6
  • If P1.AnswerChoice = No and P4.AnswerChoice = No, fill Null to P7
  • P1.AnswerChoice and P4.AnswerChoice cannot both be Yes
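
Read as checks on a single row, the four conditions above could be sketched roughly as follows. This is only an illustration under assumptions: the function name `satisfies_conditions` is hypothetical, a row is assumed to be a dict keyed by P1 to P7, and Python's `None` stands in for Null.

```python
def satisfies_conditions(row):
    """Return True if a row (dict keyed by "P1".."P7") obeys the four conditions.

    Assumption: None represents a Null answer.
    """
    p1, p4 = row["P1"], row["P4"]

    # P1 and P4 cannot both be "Yes".
    if p1 == "Yes" and p4 == "Yes":
        return False
    # P1 == "No" requires Null in P2 and P3.
    if p1 == "No" and (row["P2"] is not None or row["P3"] is not None):
        return False
    # P4 == "No" requires Null in P5 and P6.
    if p4 == "No" and (row["P5"] is not None or row["P6"] is not None):
        return False
    # P1 == "No" and P4 == "No" requires Null in P7.
    if p1 == "No" and p4 == "No" and row["P7"] is not None:
        return False
    return True


# Example row: no fruit, a vegetable with name and color, and an answer for P7.
example = {"P1": "No", "P2": None, "P3": None,
           "P4": "Yes", "P5": "Carrots", "P6": "Yellow", "P7": "No"}
print(satisfies_conditions(example))  # True
```

The checks are one-directional, exactly as the conditions are worded: a Null in P7 is allowed even when P1 or P4 is Yes.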

So that it can produce the following data frame:

Fruit  Fruit Name  Fruit Color  Vegetable  Veggie Name  Veggie Color  Enjoy eating?
No     Null        Null         Yes        Carrots      Yellow        No
No     Null        Null         No         Null         Null          Null
Yes    Apple       Yellow       No         Null         Null          Yes
Yes    Banana      Yellow       No         Null         Null          No
No     Null        Null         Yes        Potato       Yellow        No
No     Null        Null         Yes        Tomato       Yellow        Yes
Yes    Mango       Red          No         Null         Null          Null
No     Null        Null         Yes        Carrots      Yellow        No
Yes    Apple       Yellow       No         Null         Null          No

Edit:

I would prefer to handle this inside the for loop that iterates over the JSON to build each row, rather than editing the data frame afterwards.

For example, in the following part of the code, if that is possible:

for i in range(10):
    Answers = []
    for k, v in sample_data.items():
        Answers.append(random.choice(v[0]['Choices']))
    df.loc[len(df)] = Answers
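
One possible way to apply the conditions while each row is being built is to remember the answers already chosen for that row and consult them before drawing the next one. The sketch below is a hedged illustration, not a definitive solution: `make_answers` and `chosen` are hypothetical names, `None` stands in for Null, forcing P4 to "No" whenever P1 is "Yes" is just one way to rule out a double Yes, and `sample_data` is a hand-written copy of the parsed JSON so the block runs on its own.

```python
import random

# Stand-in for json.loads(sample_json), written out so the sketch is self-contained.
sample_data = {
    "P1": [{"Question": "Fruit", "Choices": ["Yes", "No"]}],
    "P2": [{"Question": "Fruit Name", "Choices": ["Mango", "Apple", "Banana"]}],
    "P3": [{"Question": "Fruit color", "Choices": ["Yellow", "Red"]}],
    "P4": [{"Question": "Vegetable", "Choices": ["Yes", "No"]}],
    "P5": [{"Question": "Veggie Name", "Choices": ["Tomato", "Potato", "Carrots"]}],
    "P6": [{"Question": "Veggie Color", "Choices": ["Red", "Yellow", "Brown"]}],
    "P7": [{"Question": "Enjoy Eating?", "Choices": ["Yes", "No"]}],
}


def make_answers(data):
    """Build one row of answers, applying the P1/P4 conditions while iterating.

    Relies on the dict iterating in insertion order (P1 first, P7 last),
    so P1 and P4 are already chosen when the dependent keys are reached.
    """
    chosen = {}  # answers picked so far for this row, keyed by "P1".."P7"
    for k, v in data.items():
        p1, p4 = chosen.get("P1"), chosen.get("P4")
        if k in ("P2", "P3") and p1 == "No":
            chosen[k] = None   # no fruit: blank the fruit details
        elif k in ("P5", "P6") and p4 == "No":
            chosen[k] = None   # no vegetable: blank the veggie details
        elif k == "P7" and p1 == "No" and p4 == "No":
            chosen[k] = None   # neither fruit nor vegetable
        elif k == "P4" and p1 == "Yes":
            chosen[k] = "No"   # P1 and P4 cannot both be "Yes"
        else:
            chosen[k] = random.choice(v[0]["Choices"])
    return list(chosen.values())


rows = [make_answers(sample_data) for _ in range(10)]
```

Inside the loop from the question, the inner loop would then reduce to `df.loc[len(df)] = make_answers(sample_data)`. Alternatively, the collected `rows` could be passed to `pd.DataFrame(rows, columns=colHeaders)` in one step.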
kingmakerking

1 Answer

You should not have two columns with the same name in one pd.DataFrame. I will refer to the columns as P1, P2, ... (the keys in sample_json) rather than by the question text. What you need is boolean (logical) indexing in pandas:

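# Assumption: rename the columns to the JSON keys P1..P7 first
df.columns = list(sample_data)

# If P1.AnswerChoice = No, fill Null to P2 and P3
df.loc[df['P1'] == 'No', ['P2', 'P3']] = 'Null'

# If P4.AnswerChoice = No, fill Null to P5 and P6
df.loc[df['P4'] == 'No', ['P5', 'P6']] = 'Null'

# If P1.AnswerChoice = No and P4.AnswerChoice = No, fill Null to P7
df.loc[(df['P1'] == 'No') & (df['P4'] == 'No'), 'P7'] = 'Null'

# P1.AnswerChoice and P4.AnswerChoice cannot both be Yes: drop only those rows
df = df[~((df['P1'] == 'Yes') & (df['P4'] == 'Yes'))]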

Z Li
  • Thanks. But can't the logic be applied while the rows are being added to the data frame? I would prefer not to edit the data frame afterwards. – kingmakerking Jan 11 '21 at 22:05
  • I don't think you can do that in the for loop the way your code is written. You do not have efficient access to previous elements. – Z Li Jan 11 '21 at 23:30