pandas read_json: "If using all scalar values, you must pass an index"

Question

I have some difficulty in importing a JSON file with pandas.

import pandas as pd
map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json')

This is the error that I get:

ValueError: If using all scalar values, you must pass an index

The file structure is simplified like this:

{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}

It is from the machine learning course of University of Washington on Coursera. You can find the file here.

This is much more a pandas question than it it a JSON question -- you wouldn't have this specific error in any context that *didn't* involve pandas, but you **can** get this specific error without JSON being involved. — Charles Duffy, Jul 14 '16 at 17:44
See for instance, http://stackoverflow.com/questions/17839973/construct-pandas-dataframe-from-values-in-variables -- a question with the same error, but no JSON involved. — Charles Duffy, Jul 14 '16 at 17:46
It is expecting a list. So if you do like this will work. `pd.DataFrame([{"biennials": 522004, "lb915": 116290}])`. — Aung, Jun 30 '17 at 08:06

score 82 · Accepted Answer · answered Jul 14 '16 at 18:05

Try

ser = pd.read_json('people_wiki_map_index_to_word.json', typ='series')

That file only contains key value pairs where values are scalars. You can convert it to a dataframe with ser.to_frame('count').

You can also do something like this:

import json
with open('people_wiki_map_index_to_word.json', 'r') as f:
    data = json.load(f)

Now data is a dictionary. You can pass it to a dataframe constructor like this:

df = pd.DataFrame({'count': data})

score 19 · Answer 2 · answered Apr 17 '19 at 05:42

You can do as @ayhan mention which will give you a column base format

Or you can enclose the object in [ ] (source) as shown below to give you a row format that will be convenient if you are loading multiple values and planing on using matrix for your machine learning models.

df = pd.DataFrame([data])

score 6 · Answer 3 · answered Apr 14 '17 at 04:56

I think what is happening is that the data in

map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json')

is being read as a string instead of a json

{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}

is actually

'{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}'

Since a string is a scalar, it wants you to load it as a json, you have to convert it to a dict which is exactly what the other response is doing

The best way is to do a json loads on the string to convert it to a dict and load it into pandas

myfile=f.read()
jsonData=json.loads(myfile)
df=pd.DataFrame(data)

thanks for the explaination, I have been looking for the reason for some time already. — Marine Galantin, Jun 02 '20 at 23:39

score 5 · Answer 4 · answered Oct 02 '21 at 13:06

5

{
"biennials": 522004,
"lb915": 116290
}

df = pd.read_json('values.json')

As pd.read_json expects a list

{
"biennials": [522004],
"lb915": [116290]
}

for a particular key, it returns an error saying

If using all scalar values, you must pass an index.

So you can resolve this by specifying 'typ' arg in pd.read_json

map_index_to_word = pd.read_json('Datasets/people_wiki_map_index_to_word.json', typ='dictionary')

answered Oct 02 '21 at 13:06

Shantanu Tripathi

51
1
4

`typ='dictionary'` was what i needed – Kenan Sep 22 '22 at 18:26

obiradaniel · Answer 5 · 2022-07-12T09:10:53.243

For newer pandas, 0.19.0 and later, use the lines parameter, set it to True. The file is read as a json object per line.

import pandas as pd
map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json', lines=True)

If fixed the following errors I encountered especially when some of the json files have only one value:

ValueError: If using all scalar values, you must pass an index
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
ValueError: Trailing data

score 1 · Answer 6 · answered May 10 '21 at 16:28

For example cat values.json

{
name: "Snow",
age: "31"
}

df = pd.read_json('values.json')

Chances are you might end up with this Error: if using all scalar values, you must pass an index

Pandas looks up for a list or dictionary in the value. Something like cat values.json

{
name: ["Snow"],
age: ["31"]
}

So try doing this. Later on to convert to html tohtml()

df = pd.DataFrame([pd.read_json(report_file,  typ='series')])
result = df.to_html()

score 0 · Answer 7 · answered Apr 01 '22 at 06:41

I solved this by converting it into an array like so

[{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}]

score 0 · Answer 8 · answered Apr 07 '23 at 15:28

A complete working example

First I save the data you posted in a json file:

import json
import pandas as pd

json_data = {"biennials": 522004, "lb915": 116290, "shatzky": 127647,
             "woode": 174106, "damfunk": 133206, "nualart": 153444,
             "hatefillot": 164111, "missionborn": 261765,
             "yeardescribed": 161075, "theoryhe": 521685}

save_fpath = '/content/sample_file.json'
with open(save_fpath, 'w') as f:
    json.dump(json_data, f)

Then, using the method proposed by obiradaniel, it's possible to obtain a pandas dataframe through this code:

sample_df = pd.read_json(save_fpath, lines=True).T.reset_index()
sample_df.columns = ['col_1', 'col_2']
sample_df

Basically, the keys of the json_data dictionary are transformed into column names using lines=True. For this reason I then transpose the dataframe (column names become index names) and then reset the index. Finally, new column names are assigned.

It's possible to skip the transposition step using the orient argument. In this way, the keys of the json_data dict are read from the beginning as index names. Here is an example using that:

sample_df = pd.read_json(save_fpath, orient='index').reset_index()
sample_df.columns = ['col_1', 'col_2']
sample_df

In both ways, the resulting dataframe is the following:

	col_1	col_2
0	biennials	522004
1	lb915	116290
2	shatzky	127647
3	woode	174106
4	damfunk	133206
5	nualart	153444
6	hatefillot	164111
7	missionborn	261765
8	yeardescribed	161075
9	theoryhe	521685

pandas read_json: "If using all scalar values, you must pass an index"

8 Answers8

A complete working example

Linked

Related