I am using a dataset from Stanford (see Dev Set 2.0). The file is in JSON format. When I load it, I get a dictionary, which I converted to a DataFrame:
import json
import pandas as pd
# the context manager closes the file automatically
with open("dev-v2.0.json", "r") as json_file:
    json_data = json.load(json_file)
df = pd.DataFrame.from_dict(json_data)
df = df[0:2]  # for this example, only a subset
All the information that I need is in the df['data'] column. Each row holds a lot of nested data, in this format:
{'title': 'Normans', 'paragraphs': [{'qas': [{'question': 'In what country is Normandy located?', 'id': '56ddde6b9a695914005b9628', 'answers': [{'text': 'France', 'answer_start': 159}, {'text': 'France', 'answer_start': 159}, {'text': 'France', 'answer_start': 159}, {'text': 'France', 'answer_start': 159}], 'is_impossible': False}, {'question': 'When were the Normans in Normandy?', 'id': '56ddde6b9a695914005b9629', 'answers': [{'text': '10th and 11th centuries', 'answer_start': 94}, {'text': 'in the 10th and 11th centuries', 'answer_start': 87}
I want to extract all the questions and answers from every row of the DataFrame. Ideally, the output would look like this:
Question Answer
'In what country is Normandy located?' 'France'
'When were the Normans in Normandy?' 'in the 10th and 11th centuries'
Sorry in advance! I have read the 'Good example' post, but I found it hard to produce reproducible data for this example, because the structure is deeply nested: a dictionary with a list inside, within the list a small dictionary, within that another dictionary, and then yet another dictionary. When I use print(df["data"]), it only prints a truncated view, which does not help to reproduce the problem:
print(df['data'])
0 {'title': 'Normans', 'paragraphs': [{'qas': [{...
1 {'title': 'Computational_complexity_theory', '...
Name: data, dtype: object
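To make the nesting easier to reproduce, here is a minimal sketch that rebuilds a small sample by hand and walks it with plain loops. It assumes only the keys visible in the excerpt above ('paragraphs', 'qas', 'question', 'answers', 'text'); the names `sample`, `df_sample`, `rows` and `qa_df` are made up for illustration, and the real file may contain fields that are not shown here.

```python
import pandas as pd

# Hand-built sample mirroring the nesting in the excerpt:
# data -> paragraphs -> qas -> answers. Only keys visible above are used.
sample = {
    "data": [
        {
            "title": "Normans",
            "paragraphs": [
                {
                    "qas": [
                        {
                            "question": "In what country is Normandy located?",
                            "id": "56ddde6b9a695914005b9628",
                            "answers": [
                                {"text": "France", "answer_start": 159},
                                {"text": "France", "answer_start": 159},
                            ],
                            "is_impossible": False,
                        },
                        {
                            "question": "When were the Normans in Normandy?",
                            "id": "56ddde6b9a695914005b9629",
                            "answers": [
                                {"text": "10th and 11th centuries", "answer_start": 94},
                                {"text": "in the 10th and 11th centuries", "answer_start": 87},
                            ],
                        },
                    ]
                }
            ],
        }
    ]
}

df_sample = pd.DataFrame.from_dict(sample)

# Walk every level of the nesting and collect one row per (question, answer) pair.
rows = []
for article in df_sample["data"]:
    for paragraph in article["paragraphs"]:
        for qa in paragraph["qas"]:
            for answer in qa["answers"]:
                rows.append({"Question": qa["question"], "Answer": answer["text"]})

# Repeated identical answers collapse to a single row.
qa_df = pd.DataFrame(rows).drop_duplicates().reset_index(drop=True)
print(qa_df)
```

This sketch keeps every distinct answer text per question, so a question with two differently worded answers appears twice; picking a single answer per question would need an extra rule. A question whose 'answers' list is empty would produce no row at all.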
Many thanks in advance!