
Update: JSON sample:

{
"header":{"time_cost_ms":3.638,"time_cost":0.003638,"core_time_cost_ms":3.6,"ret_code":"succ"},
"norm_str":"Women's March finally replaces three original leaders after anti-Semitism accusations",
"lang":"en",
"word_list":[
    {"str":"Women","hit":[0,5,0,1],"tag":"NNS"},
    {"str":"'s","hit":[5,2,1,2],"tag":"POS"},
    {"str":"March","hit":[8,5,3,1],"tag":"NNP"},
    {"str":"finally","hit":[14,7,4,1],"tag":"RB"},
    {"str":"replaces","hit":[22,8,5,1],"tag":"VBZ"},
    {"str":"three","hit":[31,5,6,1],"tag":"CD"},
    {"str":"original","hit":[37,8,7,1],"tag":"JJ"},
    {"str":"leaders","hit":[46,7,8,1],"tag":"NNS"},
    {"str":"after","hit":[54,5,9,1],"tag":"IN"},
    {"str":"anti","hit":[60,4,10,1],"tag":"NN"},
    {"str":"-","hit":[64,1,11,1],"tag":"HYPH"},
    {"str":"Semitism","hit":[65,8,12,1],"tag":"NNP"},
    {"str":"accusations","hit":[74,11,13,1],"tag":"NNS"}
],
"phrase_list":[
    {"str":"Women's March","hit":[0,13,0,4],"tag":"NNP"},
    {"str":"finally","hit":[14,7,4,1],"tag":"RB"},
    {"str":"replaces","hit":[22,8,5,1],"tag":"VBZ"},
    {"str":"three","hit":[31,5,6,1],"tag":"CD"},
    {"str":"original","hit":[37,8,7,1],"tag":"JJ"},
    {"str":"leaders","hit":[46,7,8,1],"tag":"NNS"},
    {"str":"after","hit":[54,5,9,1],"tag":"IN"},
    {"str":"anti-Semitism","hit":[60,13,10,3],"tag":"NN"},
    {"str":"accusations","hit":[74,11,13,1],"tag":"NNS"}
],
"entity_list":[
    {"str":"Women’s March","hit":[0,13,0,4],"type":{"name":"org.generic","i18n":"organization","path":"\/"},"meaning":{"related":["Black Lives Matter", "Planned Parenthood", "women's rights", "MoveOn", "indivisible", "activism", "Greenpeace", "Stand Up America", "feminism"]},"tag":"org.generic","tag_i18n":"organization"},
    {"str":"three","hit":[31,5,6,1],"type":{"name":"quantity.generic","i18n":"quantity","path":"\/math.n_exp\/"},"meaning":{"value":[3]},"tag":"quantity.generic","tag_i18n":"quantity"}
],
"syntactic_parsing_str":"",
"srl_str":"",
"engine_version":"0.3.0"

}

Is there any way to transform this data into a DataFrame? I would like to merge the results with the original dataset.

Please also help me fix the 'string indices must be integers' issue.

  • Does this answer your question? [JSON to pandas DataFrame](https://stackoverflow.com/questions/21104592/json-to-pandas-dataframe) – Dhana D. Aug 30 '21 at 04:04

1 Answer


I assume you received this JSON from an API, given the header key.

Let's load the JSON file first:

import json

with open("<json file path>", "r") as json_file:
    # json.load() reads from a file object; json.loads() expects a string.
    json_example = json.load(json_file)
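
The question does not show the code that raises 'string indices must be integers', so what follows is only a guess at the cause. That error typically appears when the response is still a JSON string and is indexed with a key such as "phrase_list". A minimal sketch, using a shortened, hypothetical payload named `raw`:

```python
import json

# Hypothetical, shortened payload standing in for the raw API response text.
raw = '{"lang": "en", "phrase_list": [{"str": "finally", "hit": [14, 7, 4, 1], "tag": "RB"}]}'

try:
    raw["phrase_list"]  # indexing a str with a str key raises TypeError
except TypeError as exc:
    error_message = str(exc)  # begins with "string indices must be integers"

# Parsing the string first gives a dict, which can be indexed by key.
parsed = json.loads(raw)
phrases = parsed["phrase_list"]
first_tag = phrases[0]["tag"]
```

If the data is already a dict, the same error can also come from looping over it directly, since iterating a dict yields its string keys rather than the nested records.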

pd.json_normalize() may not work as intended if you pass it the whole response, because the top level mixes scalar fields with several lists. Pass it the contents of the "phrase_list" key instead:

import pandas as pd
df = pd.json_normalize(json_example['phrase_list'])

Result:

|    | str           | hit             | tag   |
|---:|:--------------|:----------------|:------|
|  0 | Women's March | [0, 13, 0, 4]   | NNP   |
|  1 | finally       | [14, 7, 4, 1]   | RB    |
|  2 | replaces      | [22, 8, 5, 1]   | VBZ   |
|  3 | three         | [31, 5, 6, 1]   | CD    |
|  4 | original      | [37, 8, 7, 1]   | JJ    |
|  5 | leaders       | [46, 7, 8, 1]   | NNS   |
|  6 | after         | [54, 5, 9, 1]   | IN    |
|  7 | anti-Semitism | [60, 13, 10, 3] | NN    |
|  8 | accusations   | [74, 11, 13, 1] | NNS   |
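
As an aside, the same call should work on the other list keys. The sketch below applies it to two entries copied from the sample's "entity_list" (with the `related` list trimmed); the variable names are illustrative only. Nested dicts such as `type` and `meaning` are flattened into dotted column names.

```python
import pandas as pd

# Two entries copied from the sample's "entity_list", with "related" trimmed.
entity_list = [
    {"str": "Women’s March", "hit": [0, 13, 0, 4],
     "type": {"name": "org.generic", "i18n": "organization", "path": "/"},
     "meaning": {"related": ["Black Lives Matter", "feminism"]},
     "tag": "org.generic", "tag_i18n": "organization"},
    {"str": "three", "hit": [31, 5, 6, 1],
     "type": {"name": "quantity.generic", "i18n": "quantity", "path": "/math.n_exp/"},
     "meaning": {"value": [3]},
     "tag": "quantity.generic", "tag_i18n": "quantity"},
]

# Nested dicts become dotted columns such as "type.name" and "meaning.value";
# lists (hit, meaning.related, meaning.value) stay as list-valued cells.
entities = pd.json_normalize(entity_list)
```

Keys that are missing from an entry (for example `meaning.value` on the first row) come out as NaN. The remaining steps continue with `df`, the `phrase_list` frame.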

Then you can explode the hit column to get a flat table with one hit value per row:

df = df.explode("hit")

Result:

|    | str           |   hit | tag   |
|---:|:--------------|------:|:------|
|  0 | Women's March |     0 | NNP   |
|  0 | Women's March |    13 | NNP   |
|  0 | Women's March |     0 | NNP   |
|  0 | Women's March |     4 | NNP   |
|  1 | finally       |    14 | RB    |
|  1 | finally       |     7 | RB    |
|  1 | finally       |     4 | RB    |
|  1 | finally       |     1 | RB    |
|  2 | replaces      |    22 | VBZ   |
|  2 | replaces      |     8 | VBZ   |
|  2 | replaces      |     5 | VBZ   |
|  2 | replaces      |     1 | VBZ   |
|  3 | three         |    31 | CD    |
.
.
.
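
Since the question also asks about merging with the original dataset, here is a rough sketch of one way to do it. It swaps explode for a different technique: the four hit values are spread into columns so that each phrase stays on a single row. The column names `hit_0` to `hit_3`, the `doc_id` key, and the `original` frame are all assumptions made for illustration; the source does not describe the original dataset or what each hit value means.

```python
import pandas as pd

# A few rows from the sample's "phrase_list".
phrase_list = [
    {"str": "Women's March", "hit": [0, 13, 0, 4], "tag": "NNP"},
    {"str": "finally", "hit": [14, 7, 4, 1], "tag": "RB"},
    {"str": "replaces", "hit": [22, 8, 5, 1], "tag": "VBZ"},
]
df = pd.json_normalize(phrase_list)

# Spread the four "hit" values into their own columns instead of extra rows.
# hit_0..hit_3 are placeholder names; their meaning is not documented here.
hit_cols = pd.DataFrame(df["hit"].tolist(), index=df.index).add_prefix("hit_")
wide = pd.concat([df.drop(columns="hit"), hit_cols], axis=1)

# Hypothetical original dataset: one row per text, keyed by "doc_id".
original = pd.DataFrame({
    "doc_id": [0],
    "text": ["Women's March finally replaces three original leaders "
             "after anti-Semitism accusations"],
})

# Tag every phrase row with the id of the text it came from, then merge.
wide["doc_id"] = 0
merged = original.merge(wide, on="doc_id", how="left")
```

With several texts, the same idea would be to build one `wide` frame per response, set its `doc_id`, and concatenate the frames with pd.concat() before merging.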
Babak Fi Foo