11

I have a pandas DataFrame and want to add the values from a dictionary as new columns, with the same value in every row. Currently I loop over the dictionary and assign each value to a new column. Is there a more efficient way to do this?


# coding: utf-8
import pandas as pd

df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {"blah": 42, "blah-blah": "bar"}
for k, v in d.items():
    df[k] = v
df
Rutger Hofste

3 Answers

13

Use `assign` if none of the keys are numeric (all keys must be strings):

df = df.assign(**d)
print (df)
   age    name  blah blah-blah
0    1     Foo    42       bar
1    2     Bar    42       bar
2    3  Barbie    42       bar
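As a minimal sketch of how this differs from the loop in the question: `assign` returns a new DataFrame and leaves the original untouched, whereas `df[k] = v` modifies `df` in place. The name `out` is only used here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"age": [1, 2, 3], "name": ["Foo", "Bar", "Barbie"]})
d = {"blah": 42, "blah-blah": "bar"}

# assign builds a new DataFrame; each scalar value is broadcast to every row
out = df.assign(**d)

print(list(df.columns))   # the original frame keeps only its own columns
print(list(out.columns))  # the new frame has the dictionary keys appended
```

Reassigning the result, as in `df = df.assign(**d)`, gives the same end state as the loop.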

If some keys may be numeric, `join` works nicely:

d = {8:42,"blah-blah":"bar"}
df = df.join(pd.DataFrame(d, index=df.index))
print (df)

   age    name   8 blah-blah
0    1     Foo  42       bar
1    2     Bar  42       bar
2    3  Barbie  42       bar
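The reason for the two variants can be sketched as follows: Python only accepts string keys when a dictionary is unpacked with `**`, so `assign` raises a `TypeError` for a key like `8`, while building a helper DataFrame on the same index and joining it has no such restriction. The names `assign_failed` and `joined` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"age": [1, 2, 3], "name": ["Foo", "Bar", "Barbie"]})
d = {8: 42, "blah-blah": "bar"}

# Unpacking a dict with a non-string key into keyword arguments fails
try:
    df.assign(**d)
    assign_failed = False
except TypeError:
    assign_failed = True

# The scalars are broadcast over df.index, then joined column-wise
joined = df.join(pd.DataFrame(d, index=df.index))
print(joined)
```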
jezrael
  • Even though you already have multiple upvotes, I don't agree with your solution. OP is asking for more efficient code, and I'm pretty certain this isn't. I think your solution is smart (for being different!) but not smart in the sense of being more readable. But this is just my opinion. – Anton vBR Apr 13 '18 at 13:58
  • can you explain what the ** does to make the answer more understandable? – Rutger Hofste Apr 16 '18 at 14:39
  • `**` unpacks the dictionary into keyword arguments, so `df.assign(**d)` passes each key of `d` to `assign` as `key=value`. You can also check [this](https://stackoverflow.com/a/36981090) – jezrael Apr 16 '18 at 14:44
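To illustrate what `**` does at a call site, here is a small sketch in plain Python, independent of pandas. The function `show_kwargs` is hypothetical and exists only to show which keyword arguments arrive.

```python
def show_kwargs(**kwargs):
    # Inside the function, the keyword arguments are collected into a dict
    return kwargs

d = {"blah": 42, "blah-blah": "bar"}

# At the call site, ** spreads the dict into individual keyword arguments
received = show_kwargs(**d)

# Same call with one key written out by hand; "blah-blah" is not a valid
# identifier, so it can only be passed through ** unpacking
same = show_kwargs(blah=42, **{"blah-blah": "bar"})

print(received)
print(received == same)
```

`assign` accepts arbitrary keyword arguments in the same way, which is why each dictionary key becomes a column name.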
5

In my opinion, the answer is no. Looping over the key-value pairs of a dict is already efficient, and assigning columns with df[k] = v is more readable. In the future you will mostly want to remember why you did something, and you won't care much about saving a few microseconds. The only thing missing is a comment explaining why you do it.

d = {"blah":42,"blah-blah":"bar"}

# Add columns to compensate for missing values in document XXX
for k,v in d.items():
    df[k] = v

Timings (the measurement error is too large to separate them, so I'd say they are equivalent in speed):

Your solution:

809 µs ± 70 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

df.assign():

893 µs ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
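For anyone who wants to repeat the comparison, here is a rough sketch using the standard-library `timeit` module. It is not the exact setup used for the numbers above: the helper names `with_loop` and `with_assign` are made up for this example, and the loop variant copies the frame first so that every run starts from the same state, which adds some overhead to that side.

```python
import timeit

import pandas as pd

base = pd.DataFrame({"age": [1, 2, 3], "name": ["Foo", "Bar", "Barbie"]})
d = {"blah": 42, "blah-blah": "bar"}

def with_loop():
    # Copy first so repeated runs do not keep modifying the same frame
    df = base.copy()
    for k, v in d.items():
        df[k] = v
    return df

def with_assign():
    # assign already returns a new frame, so no explicit copy is needed
    return base.assign(**d)

loop_s = timeit.timeit(with_loop, number=1000)
assign_s = timeit.timeit(with_assign, number=1000)

print(f"loop:   {loop_s / 1000 * 1e6:.0f} µs per call")
print(f"assign: {assign_s / 1000 * 1e6:.0f} µs per call")
```

Absolute numbers will depend on the machine and the pandas version.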
Anton vBR
-2
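import pandas as pd

# Start from an empty DataFrame with the target columns
df = pd.DataFrame(columns=['A', 'B'])
print(df)
print(df.columns)
print('-------------------Done-------')

# First dict: turn it into a Series named after the next row position
ddict = {'A': 34, 'B': 56}
xdf = pd.Series(ddict, name=df.shape[0])
print(xdf)
# Transpose, append the Series as a column, transpose back: this adds a row
df = pd.concat([df.T, xdf], axis=1).T
print(df)

# Second dict: same steps, appended as the next row
edict = {'A': 34, 'B': 56}
xdf = pd.Series(edict, name=df.shape[0])
print(xdf)
df = pd.concat([df.T, xdf], axis=1).T
print(df)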
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jun 30 '22 at 07:39