59

I have some difficulty in importing a JSON file with pandas.

import pandas as pd
map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json')

This is the error that I get:

ValueError: If using all scalar values, you must pass an index

The file structure is simplified like this:

{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}

It is from the machine learning course of University of Washington on Coursera. You can find the file here.

smci
  • 32,567
  • 20
  • 113
  • 146
Marco Fumagalli
  • 2,307
  • 3
  • 23
  • 41
  • 2
    This is much more a pandas question than it it a JSON question -- you wouldn't have this specific error in any context that *didn't* involve pandas, but you **can** get this specific error without JSON being involved. – Charles Duffy Jul 14 '16 at 17:44
  • 1
    See for instance, http://stackoverflow.com/questions/17839973/construct-pandas-dataframe-from-values-in-variables -- a question with the same error, but no JSON involved. – Charles Duffy Jul 14 '16 at 17:46
  • 1
    look like you are taking the ML courses from Emily :) – Bryan Fok Mar 09 '17 at 05:40
  • 1
    It is expecting a list. So if you do like this will work. `pd.DataFrame([{"biennials": 522004, "lb915": 116290}])`. – Aung Jun 30 '17 at 08:06

8 Answers8

82

Try

ser = pd.read_json('people_wiki_map_index_to_word.json', typ='series')

That file only contains key value pairs where values are scalars. You can convert it to a dataframe with ser.to_frame('count').

You can also do something like this:

import json
with open('people_wiki_map_index_to_word.json', 'r') as f:
    data = json.load(f)

Now data is a dictionary. You can pass it to a dataframe constructor like this:

df = pd.DataFrame({'count': data})
ayhan
  • 70,170
  • 20
  • 182
  • 203
19

You can do as @ayhan mention which will give you a column base format

Method 1

Or you can enclose the object in [ ] (source) as shown below to give you a row format that will be convenient if you are loading multiple values and planing on using matrix for your machine learning models.

df = pd.DataFrame([data])

Method 2

Adonis H.
  • 331
  • 2
  • 7
6

I think what is happening is that the data in

map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json')

is being read as a string instead of a json

{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}

is actually

'{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}'

Since a string is a scalar, it wants you to load it as a json, you have to convert it to a dict which is exactly what the other response is doing

The best way is to do a json loads on the string to convert it to a dict and load it into pandas

myfile=f.read()
jsonData=json.loads(myfile)
df=pd.DataFrame(data)
Anant Gupta
  • 1,090
  • 11
  • 11
5
{
"biennials": 522004,
"lb915": 116290
}

df = pd.read_json('values.json')

As pd.read_json expects a list

{
"biennials": [522004],
"lb915": [116290]
}

for a particular key, it returns an error saying

If using all scalar values, you must pass an index.

So you can resolve this by specifying 'typ' arg in pd.read_json

map_index_to_word = pd.read_json('Datasets/people_wiki_map_index_to_word.json', typ='dictionary')
3

For newer pandas, 0.19.0 and later, use the lines parameter, set it to True. The file is read as a json object per line.

import pandas as pd
map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json', lines=True)

If fixed the following errors I encountered especially when some of the json files have only one value:

  1. ValueError: If using all scalar values, you must pass an index
  2. JSONDecodeError: Expecting value: line 1 column 1 (char 0)
  3. ValueError: Trailing data
1

For example cat values.json

{
name: "Snow",
age: "31"
}

df = pd.read_json('values.json')

Chances are you might end up with this Error: if using all scalar values, you must pass an index

Pandas looks up for a list or dictionary in the value. Something like cat values.json

{
name: ["Snow"],
age: ["31"]
}

So try doing this. Later on to convert to html tohtml()

df = pd.DataFrame([pd.read_json(report_file,  typ='series')])
result = df.to_html()
Sonal
  • 579
  • 5
  • 7
0

I solved this by converting it into an array like so

[{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}]
Laksh Matai
  • 176
  • 1
  • 5
0

A complete working example

First I save the data you posted in a json file:

import json
import pandas as pd

json_data = {"biennials": 522004, "lb915": 116290, "shatzky": 127647,
             "woode": 174106, "damfunk": 133206, "nualart": 153444,
             "hatefillot": 164111, "missionborn": 261765,
             "yeardescribed": 161075, "theoryhe": 521685}

save_fpath = '/content/sample_file.json'
with open(save_fpath, 'w') as f:
    json.dump(json_data, f)

Then, using the method proposed by obiradaniel, it's possible to obtain a pandas dataframe through this code:

sample_df = pd.read_json(save_fpath, lines=True).T.reset_index()
sample_df.columns = ['col_1', 'col_2']
sample_df

Basically, the keys of the json_data dictionary are transformed into column names using lines=True. For this reason I then transpose the dataframe (column names become index names) and then reset the index. Finally, new column names are assigned.

It's possible to skip the transposition step using the orient argument. In this way, the keys of the json_data dict are read from the beginning as index names. Here is an example using that:

sample_df = pd.read_json(save_fpath, orient='index').reset_index()
sample_df.columns = ['col_1', 'col_2']
sample_df

In both ways, the resulting dataframe is the following:

col_1 col_2
0 biennials 522004
1 lb915 116290
2 shatzky 127647
3 woode 174106
4 damfunk 133206
5 nualart 153444
6 hatefillot 164111
7 missionborn 261765
8 yeardescribed 161075
9 theoryhe 521685
Aelius
  • 1,029
  • 11
  • 22