2

I am trying to add a column with values from a dictionary. It is easiest to show with dummy data.

df = pd.DataFrame({'id':[1,2,3,2,5], 'grade':[5,2,2,1,3]})

dictionary = {'1':[5,8,6,3], '2':[1,2], '5':[8,6,2]}

Notice that not every id is in the dictionary, and that the values are lists. I want to find the rows in df whose id matches a key in the dictionary and add the corresponding list in a new column. So the desired output will look like this:

output = pd.DataFrame({'id':[1,2,3,2,5], 'grade':[5,2,2,1,3], 'new_column':[[5,8,6,3],[1,2],[],[1,2],[8,6,2]]})
halfer
Yun Tae Hwang
  • B.t.w.: Always check if a question has already been asked before posting. If you google your question you will see there are many existing duplicates of it. E.g. [pandas - add new column to dataframe from dictionary](https://stackoverflow.com/q/29794959/1609514) – Bill Aug 14 '20 at 22:29
  • Does this answer your question? [pandas - add new column to dataframe from dictionary](https://stackoverflow.com/questions/29794959/pandas-add-new-column-to-dataframe-from-dictionary) – Bill Aug 14 '20 at 22:31

3 Answers

2

Is this what you want?

df = df.set_index('id')
dictionary = {1:[5,8,6,3], 2:[1,2], 5:[8,6,2]}    
df['new_column'] = pd.Series(dictionary)

Note: The keys of the dictionary need to be the same type (int) as the index of the data frame.

>>> print(df)
    gender    new_column
id                      
1        0  [5, 8, 6, 3]
2        0        [1, 2]
3        1           NaN
4        1           NaN
5        1     [8, 6, 2]
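
To illustrate the note about key types, here is a minimal sketch that reuses the data from the output above. The names `str_keyed`, `int_keyed` and the `mismatch` column are made up for the illustration. With string keys nothing lines up with the integer index, so one option is to convert the keys first:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5], 'gender': [0, 0, 1, 1, 1]}).set_index('id')
str_keyed = {'1': [5, 8, 6, 3], '2': [1, 2], '5': [8, 6, 2]}  # keys are strings

# String keys never match the integer index, so every row becomes NaN.
df['mismatch'] = pd.Series(str_keyed)

# Converting the keys to int makes the Series index align with df's index.
int_keyed = {int(k): v for k, v in str_keyed.items()}
df['new_column'] = pd.Series(int_keyed)
```

Ids 3 and 4 are still NaN in `new_column`, because they have no entry in the dictionary.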

Update:

A better solution if the 'id' column contains duplicates (see comments below). Apply it to the original df, without calling set_index('id'), so that 'id' is still a column:

df['new_column'] = df['id'].map(dictionary)
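
As a rough end-to-end sketch on the question's data (with the duplicate id 2), assuming integer keys: `Series.map` leaves NaN where an id has no entry, so a second step is needed if empty lists are wanted, as in the desired output. The intermediate name `mapped` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 2, 5], 'grade': [5, 2, 2, 1, 3]})
dictionary = {1: [5, 8, 6, 3], 2: [1, 2], 5: [8, 6, 2]}  # int keys, matching df['id']

# Each id is looked up in the dict; ids without an entry become NaN.
mapped = df['id'].map(dictionary)

# Replace the NaN placeholders with empty lists, keeping real lists as they are.
df['new_column'] = [v if isinstance(v, list) else [] for v in mapped]
```

Note that both rows with id 2 end up referencing the same list object, so mutating one in place would affect the other.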
Bill
  • I updated this answer to set the index of `df` to the `'id'` values before assigning the new data. – Bill Aug 14 '20 at 22:03
  • I noticed that in the data frame, the same id appears multiple times. Would that be okay with your code? – Yun Tae Hwang Aug 14 '20 at 22:06
  • I think it will work but an index should not have duplicates so in that case it is probably not advisable. A better option then is to use `Series.map` as explained in [this answer](https://stackoverflow.com/a/41678874/1609514). Like this: `df['new_column'] = df['id'].map(dictionary)`. – Bill Aug 14 '20 at 22:08
0
import pandas as pd

df = pd.DataFrame({'id':[1,2,3,4,5], 'gender':[0,0,1,1,1]})

dictionary = {'1':[5,8,6,3], '2':[1,2], '5':[8,6,2]}

Then just create a list with the values you want and add it to your dataframe:

newValues = [ dictionary.get(str(val),[]) for val in df['id'].values]

df['new_column'] = newValues


>>> print(df)
   id  gender    new_column
0   1       0  [5, 8, 6, 3]
1   2       0        [1, 2]
2   3       1            []
3   4       1            []
4   5       1     [8, 6, 2]
sebrojas
0

You can construct your column using a defaultdict, a special dictionary that returns [] by default for missing keys.

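from collections import defaultdict

default_dictionary = defaultdict(list)  # missing keys yield an empty list
ids = [1, 2, 3, 4, 5]  # named `ids` to avoid shadowing the built-in `id`
dictionary = {'1':[5,8,6,3], '2':[1,2], '5':[8,6,2]}
# Copy the existing entries into the defaultdict
for n in dictionary:
    default_dictionary[n] = dictionary[n]
# Look up each id as a string, because the dictionary keys are strings
new_column = [default_dictionary[str(n)] for n in ids]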

new_column is now [[5, 8, 6, 3], [1, 2], [], [], [8, 6, 2]], and you can pass it as the values of the last column in pd.DataFrame(...)

mathfux
  • 1
    There is a bit nicer way to make a defaultdict from an existing dictionary: `default_dictionary = defaultdict(list, **dictionary)`. At the end, it also makes sense to just assign the column, like Bill did in his answer – Marat Aug 14 '20 at 22:10
  • Oh this is much better, indeed. Thank you – mathfux Aug 14 '20 at 22:11
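
Putting the suggestion from the comments together with the question's data might look like the sketch below. It passes the dictionary positionally, `defaultdict(list, dictionary)`, rather than with `**`, on the assumption that this is preferable because it also works when the keys are not strings.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 2, 5], 'grade': [5, 2, 2, 1, 3]})
dictionary = {'1': [5, 8, 6, 3], '2': [1, 2], '5': [8, 6, 2]}

# Build the defaultdict directly from the existing dict (no copy loop needed).
default_dictionary = defaultdict(list, dictionary)

# Keys are strings here, so each id is converted before the lookup.
# Looking up a missing key returns (and stores) a new empty list.
df['new_column'] = [default_dictionary[str(n)] for n in df['id']]
```

A side effect of `defaultdict` is that each missing key is inserted on lookup, so `default_dictionary` gains a `'3'` entry after this runs.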