Constructing pandas DataFrame from values in variables gives "ValueError: If using all scalar values, you must pass an index"

Question

This may be a simple question, but I cannot figure out how to do this. Let's say that I have two variables as follows.

a = 2
b = 3

I want to construct a DataFrame from this:

df2 = pd.DataFrame({'A':a,'B':b})

This generates an error:

ValueError: If using all scalar values, you must pass an index

I tried this also:

df2 = (pd.DataFrame({'a':a,'b':b})).reset_index()

This gives the same error message.

Am I missing something? isn't it trivial that no `.foo()` would solve the error since the exception is produced when evaluating the DataFrame constructor? — Lucas Alonso, Jan 25 '21 at 15:59

score 1142 · Accepted Answer · answered Jul 24 '13 at 16:49


The error message says that if you're passing scalar values, you have to pass an index. So you can either not use scalar values for the columns -- e.g. use a list:

>>> df = pd.DataFrame({'A': [a], 'B': [b]})
>>> df
   A  B
0  2  3

or use scalar values and pass an index:

>>> df = pd.DataFrame({'A': a, 'B': b}, index=[0])
>>> df
   A  B
0  2  3
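
As a rough sketch of the above (assuming `a = 2` and `b = 3` as in the question; `error_message`, `df_lists` and `df_index` are just illustrative names), the failing call and both fixes can be compared side by side:

```python
import pandas as pd

a = 2
b = 3

# All-scalar dict without an index: pandas cannot infer how many rows to build.
error_message = None
try:
    pd.DataFrame({'A': a, 'B': b})
except ValueError as exc:
    error_message = str(exc)

# Fix 1: wrap each scalar in a one-element list.
df_lists = pd.DataFrame({'A': [a], 'B': [b]})

# Fix 2: keep the scalars and supply the index explicitly.
df_index = pd.DataFrame({'A': a, 'B': b}, index=[0])

print(error_message)
print(df_lists)
print(df_index)
```

Both fixes should give the same one-row frame; which one reads better is largely a matter of taste.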

answered Jul 24 '13 at 16:49

DSM


16

Perhaps it is because the order of items in a list in Python is persistent whereas the ordering of items in a dictionary is not. You can instantiate a DataFrame with an empty dictionary. In principle I suppose a single-row DataFrame as shown here would also be ok to build from a dictionary because the order does not matter (but this hasn't been implemented). However, with multiple rows, Pandas would not be able to make a DataFrame because it would not know which items belonged to the same row. – Alexander Apr 27 '18 at 14:14
5

@VitalyIsaev - In that case, the dataframe row (represented by the given dictionary) has no index (not even an implicit one). A simple solution is to wrap the dictionary within a list, which does have "natural indexing". One can claim that if only one dictionary is given (without a wrapping list), then assume `index=0`, but that can lead to accidental misuse (thinking that a single dictionary can somehow create a multi-row dataframe) – Ori Nov 10 '18 at 15:10
several solutions in this link https://eulertech.wordpress.com/2017/11/28/pandas-valueerror-if-using-all-scalar-values-you-must-pass-an-index/ – Jia Gao Nov 12 '18 at 00:19
2

The reason for this is because DataFrames are meant to hold two-dimensional data (i.e. rows of OP's two variables). If you want to simply hold index -> value pairs (like a Dictionary), then you should use a Series, as [Rob](https://stackoverflow.com/a/36670644/235463) suggests. – danuker Mar 16 '19 at 06:18
This is a single sample/row DataFrame, so index = [0] makes logical sense; but you could also set it to index=[100], which works. Q: Isn't the index supposed to be logically ordered incrementally? Why does Python allow index manipulation? – Sumax Aug 02 '19 at 06:08
Why is this answer all the way down at the bottom? I thought SO had a mechanism for moving the better answers to the top?! – Malik A. Rumi Jun 07 '22 at 22:25
Is there anything bad that happens if only one element is in a list? So `{'a': [1], 'b': 2}`? I have found I can get away with placing just one of the dict values in a list. – Hendy Jun 22 '23 at 14:28

score 225 · Answer 2 · edited Nov 08 '21 at 10:52


You may try wrapping your dictionary into a list:

my_dict = {'A':1,'B':2}
pd.DataFrame([my_dict])

   A  B
0  1  2
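
A small sketch of why this works, assuming the hypothetical `rows` list below: each dictionary in the list is treated as one row, so a single dictionary in a list becomes a single-row frame.

```python
import pandas as pd

# Hypothetical records; each dict is one row, its keys are the column names.
rows = [{'A': 1, 'B': 2}, {'A': 3, 'B': 4}]

df_rows = pd.DataFrame(rows)         # two dicts -> two rows
df_single = pd.DataFrame([rows[0]])  # one dict in a list -> one row

print(df_rows)
print(df_single)
```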

edited Nov 08 '21 at 10:52

vvvvv


answered Jan 31 '19 at 11:00

NewBie


2

It also worked for large dictionaries with several data types just by putting the dictionary in brackets `[ ]` as you mentioned @NewBie. The accepted answer wasn't as quick because it required doing this for every scalar value, thanks! – Elias Dec 17 '20 at 09:45
6

hallelujah, this should be the best answer - convenience is key – Brndn Mar 10 '22 at 12:07
2

I prefer this to the top answer. Simple and clean. – ichthyophile Jul 18 '22 at 16:01
This is great. Adding a `.transpose()` is useful for most practical cases to convert wide to long, i.e. `pd.DataFrame([my_dict]).transpose()` – mellifluous Mar 07 '23 at 16:03

score 110 · Answer 3 · answered Mar 13 '16 at 13:26


You can also use pd.DataFrame.from_records which is more convenient when you already have the dictionary in hand:

df = pd.DataFrame.from_records([{ 'A':a,'B':b }])

You can also set index, if you want, by:

df = pd.DataFrame.from_records([{ 'A':a,'B':b }], index='A')
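
For illustration, here is a minimal sketch with `a = 2` and `b = 3` taken from the question (the variable names `df_plain` and `df_indexed` are mine). Note that passing `index='A'` moves column A into the index, so it no longer appears among the columns.

```python
import pandas as pd

a, b = 2, 3

# A list with a single record gives a one-row frame with a default index.
df_plain = pd.DataFrame.from_records([{'A': a, 'B': b}])

# index='A' uses the values of field 'A' as row labels instead of a column.
df_indexed = pd.DataFrame.from_records([{'A': a, 'B': b}], index='A')

print(df_plain)
print(df_indexed)
```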

answered Mar 13 '16 at 13:26

fAX


1

@DaveKielpinski Please, check if you passed a *list* to the "from_records" method; otherwise it won't work, and you'll get the same error message as when you call DataFrame on the dictionary. – mairan Jul 05 '19 at 13:48
Same issue as @DaveKielpinski until I realised I was using `from_records` on individual documents, not on an array of such. Just posting this in case it reminds you to double check whether you're doing it right. – Voy Aug 22 '19 at 11:02
@mingchau: That's standard behavior, so not relevant to the question at hand. – user1071847 Oct 14 '19 at 11:50

MLguy · Answer 4 · 2019-01-16T16:33:37.923

83

You need to create a pandas series first. The second step is to convert the pandas series to pandas dataframe.

import pandas as pd
data = {'a': 1, 'b': 2}
pd.Series(data).to_frame()

You can even provide a column name.

pd.Series(data).to_frame('ColumnName')
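
Note the shape this produces. A minimal sketch (the column name `'value'` and the variable `tall` are only illustrative): the dictionary keys become the row labels, so you get one column and one row per key rather than a single row.

```python
import pandas as pd

data = {'a': 1, 'b': 2}

# Keys become the index; the values fill a single named column.
tall = pd.Series(data).to_frame('value')

print(tall)
print(tall.shape)
```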

edited Jan 16 '19 at 16:33

answered Sep 12 '17 at 10:58

MLguy


1

This worked for me. My dictionary had integer keys and ndarray values. – StatsSorceress Oct 22 '18 at 15:03
3

`pd.Series(data).to_frame('ColumnName')` is shorter, although this equivalent is perhaps more direct: `pd.DataFrame.from_dict(data, orient='index', columns=['ColumnName'])` – Alex F Apr 13 '19 at 13:43
This worked for me, too, in the same case as @StatsSorceress. – muammar Jan 28 '21 at 12:32
This doesn't create the same structure as asked. with this approach I got a dataframe with 1 column and two rows (A and B), but the results should be a datafarme with 1 row and two columns (A and B) – shlomiLan Feb 24 '22 at 10:48
@shlomiLan This is the structure I wanted, and what I figured the OP was looking for based on the question. Though the fact that they accepted the answer which has a single row indicates otherwise.... – nealmcb Jul 29 '22 at 00:45

score 17 · Answer 5 · edited May 23 '17 at 11:47


Maybe Series would provide all the functions you need:

pd.Series({'A':a,'B':b})

A DataFrame can be thought of as a collection of Series, so you can:

Concatenate multiple Series into one data frame (as described here)
Add a Series variable into an existing data frame (example here)
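
Since the links above are not reproduced here, this is only a sketch of the two ideas using `pd.concat` and column assignment; the Series names and values are made up for illustration.

```python
import pandas as pd

# Two Series sharing the same labels (hypothetical values).
s1 = pd.Series({'A': 2, 'B': 3}, name='first')
s2 = pd.Series({'A': 4, 'B': 5}, name='second')

# Concatenate side by side: each Series becomes one column.
combined = pd.concat([s1, s2], axis=1)

# Add another Series to the existing frame; values are aligned on the index.
combined['third'] = pd.Series({'A': 6, 'B': 7})

print(combined)
```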

edited May 23 '17 at 11:47

Community


answered Apr 16 '16 at 22:43

Rob


This is the golden answer - then reassign the series back to a column (e.g. when using `df.apply()`) – jtlz2 Mar 31 '22 at 14:02

score 14 · Answer 6 · answered Oct 14 '18 at 06:26

Pandas magic at work. All logic is out.

The error message "ValueError: If using all scalar values, you must pass an index" says you must pass an index.

This does not necessarily mean that passing an index makes pandas do what you want it to do.

When you pass an index, pandas will treat your dictionary keys as column names and the values as what the column should contain for each of the values in the index.

a = 2
b = 3
df2 = pd.DataFrame({'A':a,'B':b}, index=[1])

    A   B
1   2   3

Passing a larger index:

df2 = pd.DataFrame({'A':a,'B':b}, index=[1, 2, 3, 4])

    A   B
1   2   3
2   2   3
3   2   3
4   2   3

An index is usually generated automatically by a DataFrame when none is given. However, pandas does not know how many rows of 2 and 3 you want. You can, however, be more explicit about it:

df2 = pd.DataFrame({'A':[a]*4,'B':[b]*4})
df2

    A   B
0   2   3
1   2   3
2   2   3
3   2   3

The default index is 0-based, though.

I would recommend always passing a dictionary of lists to the DataFrame constructor when creating DataFrames. It's easier for other developers to read. Pandas has a lot of caveats; don't make other developers have to be experts in all of them in order to read your code.

This explanation was what I was looking for. – Anshuman Jayaprakash Jan 24 '22 at 03:39

score 12 · Answer 7 · answered Oct 12 '20 at 18:30


I usually use the following to quickly create a small table from dicts.

Let's say you have a dict where the keys are filenames and the values their corresponding filesizes, you could use the following code to put it into a DataFrame (notice the .items() call on the dict):

files = {'A.txt':12, 'B.txt':34, 'C.txt':56, 'D.txt':78}
filesFrame = pd.DataFrame(files.items(), columns=['filename','size'])
print(filesFrame)

  filename  size
0    A.txt    12
1    B.txt    34
2    C.txt    56
3    D.txt    78
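
If the call above fails on an older pandas version, one possible workaround (an assumption on my part, not tested against every release) is to materialise the items into a list first:

```python
import pandas as pd

files = {'A.txt': 12, 'B.txt': 34, 'C.txt': 56, 'D.txt': 78}

# list(...) turns the dict_items view into a plain list of (key, value) tuples,
# which the DataFrame constructor reads as one row per tuple.
files_frame = pd.DataFrame(list(files.items()), columns=['filename', 'size'])

print(files_frame)
```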

answered Oct 12 '20 at 18:30

Moritz Molch


1

This is helpful but note it doesn't work on pandas 0.23.4 – for_all_intensive_purposes Dec 14 '20 at 02:40
For me this was perfect! Having simply two rows of data in a dictionary and turning that in to a dataframe shouldn't be that hard. – Michel K Mar 17 '21 at 09:56
thanks, exactly what I am looking for – yondchang Jul 20 '23 at 01:42

score 11 · Answer 8 · answered Mar 30 '18 at 02:02


You could try:

df2 = pd.DataFrame.from_dict({'a':a,'b':b}, orient = 'index')

From the documentation on the 'orient' argument: If the keys of the passed dict should be the columns of the resulting DataFrame, pass ‘columns’ (default). Otherwise if the keys should be rows, pass ‘index’.
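
A quick sketch of the two orientations with the scalars from the question (`a = 2`, `b = 3`). The `columns=['value']` argument and the variable names are illustrative. With `orient='index'` the keys become row labels; with the default `orient='columns'` the all-scalar dictionary still raises the original error.

```python
import pandas as pd

a, b = 2, 3

# Keys become row labels; the scalars fill a single column.
by_rows = pd.DataFrame.from_dict({'a': a, 'b': b}, orient='index',
                                 columns=['value'])

# The default orientation still needs list-like values or an index.
try:
    pd.DataFrame.from_dict({'a': a, 'b': b}, orient='columns')
    raised = False
except ValueError:
    raised = True

print(by_rows)
print(raised)
```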

answered Mar 30 '18 at 02:02

Matthew Connell


1

This does not solve the question asked, it produces a different result than desired. – Ken Williams Mar 11 '20 at 20:42
@KenWilliams I'm confused. It looks like this provides either the result that I thought the OP wanted, or the result that it seems you think OP wanted (and the one that MLguy though OP wanted). So it is the most flexible of all the answers with this many votes. – nealmcb Jul 29 '22 at 00:50
1

@nealmcb That's indeed claimed by the answerer, but when using `orient='columns'`, it just gives the same `If using all scalar values, you must pass an index` error as in the original question. I should have clarified that point in my comment. – Ken Williams Aug 01 '22 at 04:01

score 10 · Answer 9 · answered Jul 24 '13 at 16:49


You need to provide iterables as the values for the Pandas DataFrame columns:

df2 = pd.DataFrame({'A':[a],'B':[b]})

answered Jul 24 '13 at 16:49

ely


score 10 · Answer 10 · answered Jul 04 '18 at 11:16


I had the same problem with numpy arrays and the solution is to flatten them:

data = {
    'b': array1.flatten(),
    'a': array2.flatten(),
}

df = pd.DataFrame(data)
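
The answer does not say what shape the arrays had. Assuming they were column vectors of shape (n, 1), which is a guess on my part, a sketch of the flattening step might look like this (`array1` and `array2` are filled with made-up values):

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: 2-D column vectors of shape (3, 1).
array1 = np.array([[1], [2], [3]])
array2 = np.array([[4], [5], [6]])

# flatten() returns a 1-D copy, which is what a DataFrame column expects.
data = {
    'b': array1.flatten(),
    'a': array2.flatten(),
}

df = pd.DataFrame(data)
print(df)
```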

answered Jul 04 '18 at 11:16

MicheleDIncecco


There are no arrays (`array1`, `array2`) in the original question, the values are scalars. Is this answering some different question? – Ken Williams Aug 01 '22 at 04:03

score 10 · Answer 11 · answered Feb 18 '22 at 01:48

To make sense of the ValueError, you need to understand DataFrames and "scalar values".
To create a DataFrame from a dict without passing an index, at least one array-like value is needed.

IMO, an array is itself indexed.
Therefore, if there is an array-like value, there is no need to specify an index.
E.g. the elements of ['a', 's', 'd', 'f'] have indices 0, 1, 2, 3 respectively.

df_array_like = pd.DataFrame({
    'col' : 10086,
    'col_2' : True,
    'col_3' : "'at least one array'",
    'col_4' : ['one array is arbitrary length', 'multi arrays should be the same length']}) 
print("df_array_like: \n", df_array_like)

Output:

df_array_like: 
      col  col_2                 col_3                                   col_4
0  10086   True  'at least one array'           one array is arbitrary length
1  10086   True  'at least one array'  multi arrays should be the same length

As the output shows, the index of the DataFrame is 0 and 1,
which coincides with the index of the array ['one array is arbitrary length', 'multi arrays should be the same length'].

If you comment out 'col_4', it will raise

ValueError("If using all scalar values, you must pass an index")

This is because scalar values (integers, bools, and strings) do not have an index.
Note that Index(...) must be called with a collection of some kind:
since the index is used to locate all the rows of the DataFrame,
it should be an array, e.g.

df_scalar_value = pd.DataFrame({
'col' : 10086,
'col_2' : True,
'col_3' : "'at least one array'"
}, index = ['fst_row','snd_row','third_row']) 
print("df_scalar_value: \n", df_scalar_value)

Output:

df_scalar_value: 
              col  col_2                 col_3
fst_row    10086   True  'at least one array'
snd_row    10086   True  'at least one array'
third_row  10086   True  'at least one array'
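
To make the broadcasting rule concrete, here is a small sketch (column names and values are made up): a single list-like value fixes the number of rows, and every scalar is repeated to match it.

```python
import pandas as pd

# One list-like column is enough; the scalar 2 is repeated for each row.
mixed = pd.DataFrame({'a': [1], 'b': 2})
longer = pd.DataFrame({'a': [1, 2, 3], 'b': 2})

print(mixed)
print(longer)
```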

I'm a beginner, I'm learning python and English.

M. John · Answer 12 · 2021-11-26T14:24:15.210

9

import pandas as pd
a = 2
b = 3
data = {'A': a, 'B': b}

pd.DataFrame(pd.Series(data)).T
# .T transposes the DataFrame (swaps rows and columns)

Result:
    A   B
0   2   3
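
To spell out what `.T` does here, a short sketch using the values from the question (`tall` and `wide` are illustrative names): the Series first becomes a one-column frame with the keys as row labels, and transposing swaps rows and columns so the keys end up as column names.

```python
import pandas as pd

a, b = 2, 3
values = {'A': a, 'B': b}

# Two rows labelled 'A' and 'B', one column labelled 0.
tall = pd.DataFrame(pd.Series(values))

# Transposed: one row labelled 0, two columns 'A' and 'B'.
wide = tall.T

print(tall)
print(wide)
```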

edited Nov 26 '21 at 14:24

answered Nov 02 '21 at 05:51

M. John


4

Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Nov 02 '21 at 06:16
3

Your answer adds `.T` to what other answers have suggested. Can you add an explanation of how this makes a difference? – joanis Nov 02 '21 at 14:50
1

There are twenty-one existing answers to this question, including an accepted answer with 836 upvotes (!!!). Are you sure your answer hasn't already been provided? If not, why might someone prefer your approach over the existing approaches proposed? Are you taking advantage of new capabilities? Are there scenarios where your approach is better suited? Explanations are _always_ useful, but are _especially_ important here. – Jeremy Caney Nov 27 '21 at 01:31

score 5 · Answer 13 · answered May 03 '21 at 13:40

I tried transpose() and it worked. Downside: You create a new object.

testdict1 = {'key1':'val1','key2':'val2','key3':'val3','key4':'val4'}

df = pd.DataFrame.from_dict(data=testdict1,orient='index')
print(df)
print(f'ID for DataFrame before Transpose: {id(df)}\n')

df = df.transpose()
print(df)
print(f'ID for DataFrame after Transpose: {id(df)}')

Output

         0
key1  val1
key2  val2
key3  val3
key4  val4
ID for DataFrame before Transpose: 1932797100424

   key1  key2  key3  key4
0  val1  val2  val3  val4
ID for DataFrame after Transpose: 1932797125448


score 4 · Answer 14 · edited Dec 24 '19 at 10:35


The input does not have to be a list of records; it can be a single dictionary as well:

pd.DataFrame.from_records({'a':1,'b':2}, index=[0])
   a  b
0  1  2

Which seems to be equivalent to:

pd.DataFrame({'a':1,'b':2}, index=[0])
   a  b
0  1  2

edited Dec 24 '19 at 10:35

cs95


answered Apr 24 '18 at 15:12

S.V


score 3 · Answer 15 · answered Jul 22 '17 at 13:53


This is because a DataFrame has two intuitive dimensions - the columns and the rows.

You are only specifying the columns using the dictionary keys.

If you only want to specify one dimensional data, use a Series!

answered Jul 22 '17 at 13:53

danuker


score 3 · Answer 16 · answered Nov 30 '17 at 19:34

If you intend to convert a dictionary of scalars, you have to include an index:

import pandas as pd

alphabets = {'A': 'a', 'B': 'b'}
index = [0]
alphabets_df = pd.DataFrame(alphabets, index=index)
print(alphabets_df)

Although an index is not required for a dictionary of lists, the same idea extends to one:

planets = {'planet': ['earth', 'mars', 'jupiter'], 'length_of_day': ['1', '1.03', '0.414']}
index = [0, 1, 2]
planets_df = pd.DataFrame(planets, index=index)
print(planets_df)

Of course, for the dictionary of lists, you can build the dataframe without an index:

planets_df = pd.DataFrame(planets)
print(planets_df)

score 2 · Answer 17 · edited Feb 20 '23 at 18:46


You could try this:

df2 = pd.DataFrame.from_dict({'a':a,'b':b}, orient = 'index')

edited Feb 20 '23 at 18:46

RF1991


answered Aug 15 '20 at 02:52

Dirck Heyne Dávila Llave


4

This is the exact same answer posted by @MathewConnell, except without formatting... – Julio Cezar Silva Aug 15 '20 at 03:11

score 2 · Answer 18 · edited Sep 29 '20 at 20:53


Change your 'a' and 'b' values to lists, as follows:

a = [2]
b = [3]

then execute the same code as follows:

df2 = pd.DataFrame({'A':a,'B':b})
df2

and you'll get:

    A   B
0   2   3

edited Sep 29 '20 at 20:53

Paul H


answered Aug 26 '20 at 08:16

Kalpana


score 2 · Answer 19 · answered Dec 13 '20 at 17:09


The simplest option is:

import numpy as np
import pandas as pd
data = {'A': a, 'B': b}
df = pd.DataFrame(data, index=np.arange(1))

answered Dec 13 '20 at 17:09

DataYoda


score 2 · Answer 20 · answered Dec 27 '20 at 09:44

Another option is to convert the scalars into lists on the fly using a dictionary comprehension:

df = pd.DataFrame(data={k: [v] for k, v in mydict.items()})

The expression {...} creates a new dict whose values are one-element lists, for example:

In [20]: mydict
Out[20]: {'a': 1, 'b': 2}

In [21]: mydict2 = { k: [v] for k, v in mydict.items()}

In [22]: mydict2
Out[22]: {'a': [1], 'b': [2]}

score 1 · Answer 21 · answered Apr 24 '19 at 13:45


Convert Dictionary to Data Frame

col_dict_df = pd.Series(col_dict).to_frame('new_col').reset_index()

Give new name to Column

col_dict_df.columns = ['col1', 'col2']
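
Putting the two steps together, a runnable sketch might look like this (the contents of `col_dict` are hypothetical, since the answer does not show them):

```python
import pandas as pd

# Hypothetical input dictionary.
col_dict = {'x': 10, 'y': 20}

# Keys go to the index, values to 'new_col'; reset_index() then turns
# the keys into an ordinary column named 'index'.
col_dict_df = pd.Series(col_dict).to_frame('new_col').reset_index()

# Rename both columns in one assignment.
col_dict_df.columns = ['col1', 'col2']

print(col_dict_df)
```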

answered Apr 24 '19 at 13:45

kamran kausar


score -2 · Answer 22 · answered Apr 08 '16 at 14:53


If you have a dictionary you can turn it into a pandas data frame with the following line of code:

pd.DataFrame({"key": list(d.keys()), "value": list(d.values())})

answered Apr 08 '16 at 14:53

ingrid


It works, but IMHO it doesn't make much sense: `fruits_count = defaultdict(int); fruits_count["apples"] = 10; fruits_count["bananas"] = 21; pd.DataFrame({"key" : fruits_count.keys(), "value" : fruits_count.values()})` gives `Out: key value 0 (bananas, apples) (21, 10) 1 (bananas, apples) (21, 10)` – Emiter Jul 22 '17 at 22:50

score -3 · Answer 23 · answered Sep 26 '18 at 12:36


Just pass the dict on a list:

a = 2
b = 3
df2 = pd.DataFrame([{'A':a,'B':b}])

answered Sep 26 '18 at 12:36

LeandroHumb

