Generate pandas dataframe based on conditions

Question

My sample data is as follows:

sample_json = """{
    "P1": [
        {"Question": "Fruit",
         "Choices": ["Yes", "No"]}
    ],
    "P2": [
        {"Question": "Fruit Name",
         "Choices": ["Mango", "Apple", "Banana"]}
    ],
    "P3": [
        {"Question": "Fruit color",
         "Choices": ["Yellow", "Red"]}
    ],
    "P4": [
        {"Question": "Vegetable",
         "Choices": ["Yes", "No"]}
    ],
    "P5": [
        {"Question": "Veggie Name",
         "Choices": ["Tomato", "Potato", "Carrots"]}
    ],
    "P6": [
        {"Question": "Veggie Color",
         "Choices": ["Red", "Yellow", "Brown"]}
    ],
    "P7": [
        {"Question": "Enjoy Eating?",
         "Choices": ["Yes", "No"]}
    ]
}"""

I am trying to generate a data frame using pandas as follows:

import json
import random

import pandas as pd

sample_data = json.loads(sample_json)

# One column per question, in JSON key order (P1 ... P7)
colHeaders = []
for k, v in sample_data.items():
    colHeaders.append(v[0]['Question'])

df = pd.DataFrame(columns=colHeaders)

# Ten rows; every answer is drawn independently at random
for i in range(10):
    Answers = []
    for k, v in sample_data.items():
        Answers.append(random.choice(v[0]['Choices']))
    df.loc[len(df)] = Answers

This creates a DataFrame with one column per question and ten rows of independent random answers.

The answers should still be random, but I want the values to depend on P1 and P4 according to the following conditions (P1, P2, ..., P7 are the keys in sample_json):

If P1.AnswerChoice = No, fill P2 and P3 with Null
If P4.AnswerChoice = No, fill P5 and P6 with Null
If P1.AnswerChoice = No and P4.AnswerChoice = No, fill P7 with Null
P1.AnswerChoice and P4.AnswerChoice cannot both be Yes
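To make the four conditions concrete, here is a minimal checker for a single row. It is a sketch that assumes a row is a dict keyed by the JSON keys P1 ... P7 and that missing values are the string "Null", as in the expected output; `row_satisfies_rules` and `NULL` are hypothetical names. It checks only the conditions exactly as stated (it does not require P2/P3 to be non-Null when P1 is Yes).

```python
NULL = "Null"  # assumption: the placeholder string used in the expected output


def row_satisfies_rules(row):
    """Return True if a dict keyed by P1..P7 obeys the four stated conditions."""
    # P1 and P4 cannot both be Yes
    if row["P1"] == "Yes" and row["P4"] == "Yes":
        return False
    # If P1 = No, P2 and P3 must be Null
    if row["P1"] == "No" and (row["P2"] != NULL or row["P3"] != NULL):
        return False
    # If P4 = No, P5 and P6 must be Null
    if row["P4"] == "No" and (row["P5"] != NULL or row["P6"] != NULL):
        return False
    # If P1 = No and P4 = No, P7 must be Null
    if row["P1"] == "No" and row["P4"] == "No" and row["P7"] != NULL:
        return False
    return True


# First row of the expected output
row = {"P1": "No", "P2": "Null", "P3": "Null", "P4": "Yes",
       "P5": "Carrots", "P6": "Yellow", "P7": "No"}
print(row_satisfies_rules(row))  # True
```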

so that it produces a DataFrame like the following:

Fruit	Fruit Name	Fruit Color	Vegetable	Veggie Name	Veggie Color	Enjoy eating?
No	Null	Null	Yes	Carrots	Yellow	No
No	Null	Null	No	Null	Null	Null
Yes	Apple	Yellow	No	Null	Null	Yes
Yes	Banana	Yellow	No	Null	Null	No
No	Null	Null	Yes	Potato	Yellow	No
No	Null	Null	Yes	Tomato	Yellow	Yes
Yes	Mango	Red	No	Null	Null	Null
No	Null	Null	Yes	Carrots	Yellow	No
Yes	Apple	Yellow	No	Null	Null	No

Edit:

I would like to handle this inside the for loop that iterates over the JSON to build each row, rather than editing the DataFrame afterwards.

For example, in the following part of the code, if possible:

for i in range(10):
    Answers = []
    for k, v in sample_data.items():
        Answers.append(random.choice(v[0]['Choices']))
    df.loc[len(df)] = Answers
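One way to keep the logic inside the loop is to decide P1 and P4 first and then fill each remaining answer as either "Null" or a random choice. This is a sketch, not the thread's accepted solution: it assumes the keys are exactly P1 ... P7, hard-codes the roles of P1 and P4, keeps "Null" as a string, and `make_answers` is a hypothetical helper. The dict literal below has the same content as `json.loads(sample_json)`.

```python
import random

import pandas as pd

# Same content as json.loads(sample_json) from the question
sample_data = {
    "P1": [{"Question": "Fruit", "Choices": ["Yes", "No"]}],
    "P2": [{"Question": "Fruit Name", "Choices": ["Mango", "Apple", "Banana"]}],
    "P3": [{"Question": "Fruit color", "Choices": ["Yellow", "Red"]}],
    "P4": [{"Question": "Vegetable", "Choices": ["Yes", "No"]}],
    "P5": [{"Question": "Veggie Name", "Choices": ["Tomato", "Potato", "Carrots"]}],
    "P6": [{"Question": "Veggie Color", "Choices": ["Red", "Yellow", "Brown"]}],
    "P7": [{"Question": "Enjoy Eating?", "Choices": ["Yes", "No"]}],
}

NULL = "Null"  # the question uses the string "Null" rather than None/NaN


def make_answers(sample_data):
    """Return one row of answers, in JSON key order, that obeys the rules."""
    # Draw P1 and P4 together so they are never both "Yes"
    p1, p4 = random.choice([("Yes", "No"), ("No", "Yes"), ("No", "No")])

    # Work out which questions must be Null for this row
    null_keys = set()
    if p1 == "No":
        null_keys.update({"P2", "P3"})
    if p4 == "No":
        null_keys.update({"P5", "P6"})
    if p1 == "No" and p4 == "No":
        null_keys.add("P7")

    answers = []
    for k, v in sample_data.items():
        if k == "P1":
            answers.append(p1)
        elif k == "P4":
            answers.append(p4)
        elif k in null_keys:
            answers.append(NULL)
        else:
            answers.append(random.choice(v[0]["Choices"]))
    return answers


colHeaders = [v[0]["Question"] for v in sample_data.values()]
df = pd.DataFrame(columns=colHeaders)

for i in range(10):
    df.loc[len(df)] = make_answers(sample_data)

print(df)
```

Drawing the (P1, P4) pair uniformly from the three allowed combinations gives the same distribution as drawing both independently and re-drawing whenever both come out "Yes", without needing a retry loop.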

@It_is_Chris I have clarified this with the edit. I don't want to edit the DataFrame or the JSON; ideally the logic goes in the loop that iterates through the JSON to build each row added to df. – kingmakerking, Jan 11 '21 at 22:13
The way you create the DataFrame is inefficient; see https://stackoverflow.com/a/62734983/8893827 – adir abargil, Jan 11 '21 at 22:14
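A common alternative to growing a DataFrame with `df.loc[len(df)] = ...` (which typically copies data on each enlargement) is to collect the rows in a plain Python list and build the DataFrame once at the end. This sketch may or may not match what the linked answer shows, and it uses a trimmed-down `sample_data` (only P1 to P3) to stay short.

```python
import random

import pandas as pd

# Trimmed to P1-P3 of the question's sample_data to keep the sketch short
sample_data = {
    "P1": [{"Question": "Fruit", "Choices": ["Yes", "No"]}],
    "P2": [{"Question": "Fruit Name", "Choices": ["Mango", "Apple", "Banana"]}],
    "P3": [{"Question": "Fruit color", "Choices": ["Yellow", "Red"]}],
}

# Collect plain Python lists first; appending to a list is cheap
rows = []
for i in range(10):
    answers = [random.choice(v[0]["Choices"]) for v in sample_data.values()]
    rows.append(answers)

# Build the DataFrame once instead of enlarging it row by row
colHeaders = [v[0]["Question"] for v in sample_data.values()]
df = pd.DataFrame(rows, columns=colHeaders)
print(df)
```

Per-row conditions can still be applied to `answers` inside the loop before it is appended, so this change does not conflict with keeping the logic in the loop.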

score -1 · Answer 1 · answered Jan 11 '21 at 21:54

A pd.DataFrame should not have two columns with the same name, so I will just use P1, P2, ... to refer to the columns. What you need is boolean (logical) indexing in pandas:

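# Assumes the DataFrame's columns are named P1 ... P7 (the JSON keys)

# If P1.AnswerChoice = No, fill P2 and P3 with Null
df.loc[df['P1']=='No', ['P2', 'P3']] = 'Null'

# If P4.AnswerChoice = No, fill P5 and P6 with Null
df.loc[df['P4']=='No', ['P5', 'P6']] = 'Null'

# If P1.AnswerChoice = No and P4.AnswerChoice = No, fill P7 with Null
df.loc[(df['P1']=='No')&(df['P4']=='No'), 'P7'] = 'Null'

# P1.AnswerChoice and P4.AnswerChoice cannot both be Yes:
# drop only the rows where both are Yes
df = df[~((df['P1']=='Yes')&(df['P4']=='Yes'))]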
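The masks above index columns by P1 ... P7, whereas the question's code names the columns after the question text. A minimal sketch of one way to bridge the two, assuming you are free to choose the column labels: build the DataFrame with the JSON keys as labels, apply the masks, then rename the columns to the question text for display. The `questions` mapping is a hypothetical name.

```python
import random

import pandas as pd

# Same content as json.loads(sample_json) from the question
sample_data = {
    "P1": [{"Question": "Fruit", "Choices": ["Yes", "No"]}],
    "P2": [{"Question": "Fruit Name", "Choices": ["Mango", "Apple", "Banana"]}],
    "P3": [{"Question": "Fruit color", "Choices": ["Yellow", "Red"]}],
    "P4": [{"Question": "Vegetable", "Choices": ["Yes", "No"]}],
    "P5": [{"Question": "Veggie Name", "Choices": ["Tomato", "Potato", "Carrots"]}],
    "P6": [{"Question": "Veggie Color", "Choices": ["Red", "Yellow", "Brown"]}],
    "P7": [{"Question": "Enjoy Eating?", "Choices": ["Yes", "No"]}],
}

# Use the JSON keys (P1..P7) as column labels so df['P1'] etc. exist
rows = [[random.choice(v[0]["Choices"]) for v in sample_data.values()]
        for _ in range(10)]
df = pd.DataFrame(rows, columns=list(sample_data))

# ... apply the .loc masks from the answer here ...

# Afterwards, map P1..P7 back to the question text for display
questions = {k: v[0]["Question"] for k, v in sample_data.items()}
df = df.rename(columns=questions)
print(df.columns.tolist())
```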

Z Li

Thanks. But can't the logic be applied while adding the rows to the DataFrame? I would prefer not to edit the DataFrame afterwards. – kingmakerking Jan 11 '21 at 22:05
I think you cannot do that in the for loop the way your code is written, because you do not have efficient access to the previous elements. – Z Li Jan 11 '21 at 23:30
