I have a dataframe with two columns and I intend to convert it to a dictionary. The first column will be the key and the second will be the value.
Dataframe:
id value
0 0 10.2
1 1 5.7
2 2 7.4
How can I do this?
If lakes is your DataFrame, you can do something like:
area_dict = dict(zip(lakes.id, lakes.value))
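For example, on a small frame shaped like the one in the question (a minimal sketch; the column names id and value are assumed):
import pandas as pd

lakes = pd.DataFrame({'id': [0, 1, 2], 'value': [10.2, 5.7, 7.4]})

# zip pairs each id with its value and dict() builds the mapping
area_dict = dict(zip(lakes.id, lakes.value))
# area_dict maps 0 -> 10.2, 1 -> 5.7, 2 -> 7.4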
See the docs for to_dict. You can use it like this:
df.set_index('id').to_dict()
And if you only want a single column, to avoid the column name becoming an extra level in the dict (in that case you are actually using Series.to_dict()):
df.set_index('id')['value'].to_dict()
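To make the difference concrete, here is a minimal sketch on the question's data showing both shapes (columns id and value assumed):
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2], 'value': [10.2, 5.7, 7.4]})

# Whole-DataFrame to_dict(): the remaining column name becomes an outer key
df.set_index('id').to_dict()            # {'value': {0: 10.2, 1: 5.7, 2: 7.4}}

# Selecting the column first gives a Series, so Series.to_dict() is flat
df.set_index('id')['value'].to_dict()   # {0: 10.2, 1: 5.7, 2: 7.4}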
mydict = dict(zip(df.id, df.value))
If you want a simple way to preserve duplicates, you could use groupby
:
>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value'])
>>> ptest
id value
0 a 1
1 a 2
2 b 3
>>> {k: g["value"].tolist() for k,g in ptest.groupby("id")}
{'a': [1, 2], 'b': [3]}
The answers by joris in this thread and by punchagan in the duplicated thread are very elegant; however, they will not give correct results if the column used for the keys contains any duplicated values.
For example:
>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value'])
>>> ptest
id value
0 a 1
1 a 2
2 b 3
# note that in both cases the association a->1 is lost:
>>> ptest.set_index('id')['value'].to_dict()
{'a': 2, 'b': 3}
>>> dict(zip(ptest.id, ptest.value))
{'a': 2, 'b': 3}
If you have duplicated entries and do not want to lose them, you can use this ugly but working code:
>>> mydict = {}
>>> for x in range(len(ptest)):
... currentid = ptest.iloc[x,0]
... currentvalue = ptest.iloc[x,1]
... mydict.setdefault(currentid, [])
... mydict[currentid].append(currentvalue)
>>> mydict
{'a': [1, 2], 'b': [3]}
Here is what I think is the simplest solution:
df.set_index('id').T.to_dict('records')
Example:
df= pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
df.set_index('id').T.to_dict('records')
If you have multiple value columns, like val1, val2, val3, etc., and you want them as lists, then use the code below:
df.set_index('id').T.to_dict('list')
Read more about the records orientation in the to_dict documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html
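Applied to the frame from the question (a rough sketch, columns id and value assumed), note that the records orientation returns a list, here with a single dict inside:
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2], 'value': [10.2, 5.7, 7.4]})

# Transposing turns the ids into columns, so 'records' yields a one-element list
records = df.set_index('id').T.to_dict('records')
# records -> [{0: 10.2, 1: 5.7, 2: 7.4}]
mydict = records[0]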
With pandas it can be done as:
If lakes is your DataFrame:
area_dict = lakes.to_dict('records')
You can use 'dict comprehension'
my_dict = {row[0]: row[1] for row in df.values}
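This assumes the key column is the first column and the value column is the second; a rough sketch on the question's data:
import pandas as pd

df = pd.DataFrame({'id': [0, 1, 2], 'value': [10.2, 5.7, 7.4]})

# df.values yields each row as an array: row[0] is the id, row[1] the value
my_dict = {row[0]: row[1] for row in df.values}
# the keys come out as 0.0, 1.0, 2.0 here, because .values upcasts the mixed int/float columns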
In some setups the code below might not work:
mydict = dict(zip(df.id, df.value))
so make it explicit:
id_ = df.id.values
value = df.value.values
mydict = dict(zip(id_, value))
Note: I used id_ because id shadows the built-in Python function id().
Here is an example for converting a dataframe with three columns A, B, and C (let's say A and B are the geographical coordinates of longitude and latitude and C the country region/state/etc., which is more or less the case).
I want a dictionary with each pair of A,B values (dictionary key) matching the value of C (dictionary value) in the corresponding row (each pair of A,B values is guaranteed to be unique due to previous filtering, but it is possible to have the same value of C for different pairs of A,B values in this context), so I would do:
mydict = dict(zip(zip(df['A'],df['B']), df['C']))
Using pandas to_dict() also works:
mydict = df.set_index(['A','B']).to_dict(orient='dict')['C']
(neither column A nor B is used as an index before executing the line that creates the dictionary)
Both approaches are fast (less than one second on a dataframe with 85k rows on a ~2015 fast dual-core laptop).
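A minimal sketch of both lines above, with made-up coordinate data (column names A, B, C come from the description; the values are hypothetical):
import pandas as pd

df = pd.DataFrame({'A': [10.0, 10.5, 11.0],
                   'B': [50.0, 50.5, 51.0],
                   'C': ['Bavaria', 'Bavaria', 'Saxony']})

# The tuple (A, B) becomes the key and C the value
mydict = dict(zip(zip(df['A'], df['B']), df['C']))
# e.g. (10.0, 50.0) -> 'Bavaria', (11.0, 51.0) -> 'Saxony'

# The MultiIndex variant gives the same mapping
mydict2 = df.set_index(['A', 'B']).to_dict(orient='dict')['C']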
Another (slightly shorter) solution for not losing duplicate entries:
>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
>>> ptest
id value
0 a 1
1 a 2
2 b 3
>>> pdict = dict()
>>> for i in ptest['id'].unique().tolist():
... ptest_slice = ptest[ptest['id'] == i]
... pdict[i] = ptest_slice['value'].tolist()
...
>>> pdict
{'b': [3], 'a': [1, 2]}
You can also do this if you want to play around with pandas. However, I like punchagan's way.
# replicating your dataframe
lake = pd.DataFrame({'co tp': ['DE Lake', 'Forest', 'FR Lake', 'Forest'],
'area': [10, 20, 30, 40],
'count': [7, 5, 2, 3]})
lake.set_index('co tp', inplace=True)
# to get key value using pandas
area_dict = lake.set_index('area').T.to_dict('records')[0]
print(area_dict)
output: {10: 7, 20: 5, 30: 2, 40: 3}
If 'lakes' is your DataFrame, you can also do something like:
# Your dataframe
lakes = pd.DataFrame({'co tp': ['DE Lake', 'Forest', 'FR Lake', 'Forest'],
'area': [10, 20, 30, 40],
'count': [7, 5, 2, 3]})
lakes.set_index('co tp', inplace=True)
area_dict = lakes.set_index("area")["count"].to_dict()
or @punchagan's solution (which I prefer). Note the bracket access: lakes.count would refer to the DataFrame.count method rather than the column, so use lakes['count'] instead.
area_dict = dict(zip(lakes.area, lakes['count']))
Both should work.
You just need this:
area_dict = lakes.to_dict(orient='records')
You need a list as a dictionary value. This code will do the trick.
from collections import defaultdict
mydict = defaultdict(list)
for k, v in zip(df.id.values, df.value.values):
    mydict[k].append(v)
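Run on the same toy frame used in other answers here (ids 'a', 'a', 'b'), this keeps both values for 'a'; a self-contained sketch:
from collections import defaultdict
import pandas as pd

df = pd.DataFrame([['a', 1], ['a', 2], ['b', 3]], columns=['id', 'value'])

mydict = defaultdict(list)
for k, v in zip(df.id.values, df.value.values):
    mydict[k].append(v)

# mydict now maps 'a' -> [1, 2] and 'b' -> [3]; wrap it in dict(mydict) for a plain dict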
If you set the index, the resulting dictionary will have unique key-value pairs:
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
df['airline_enc'] = encoder.fit_transform(df['airline'])
dictAirline = df[['airline_enc', 'airline']].set_index('airline_enc').to_dict()
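A minimal sketch with a made-up airline column (the data values are hypothetical; the column names come from the code above). Note that DataFrame.to_dict() keeps the remaining column name as an outer key:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'airline': ['Delta', 'United', 'Delta']})

encoder = LabelEncoder()
df['airline_enc'] = encoder.fit_transform(df['airline'])

dictAirline = df[['airline_enc', 'airline']].set_index('airline_enc').to_dict()
# dictAirline -> {'airline': {0: 'Delta', 1: 'United'}}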
Many answers here use the dict(zip(...)) syntax. It's also possible without zip.
mydict = dict(df.values) # {0.0: 10.2, 1.0: 5.7, 2.0: 7.4}
# or for faster code, convert to a list
mydict = dict(df.values.tolist()) # {0.0: 10.2, 1.0: 5.7, 2.0: 7.4}
If one column is int and the other is float, as in the OP, then cast to object dtype and call dict().
mydict = dict(df.astype('O').values) # {0: 10.2, 1: 5.7, 2: 7.4}
mydict = dict(df.astype('O').values.tolist()) # {0: 10.2, 1: 5.7, 2: 7.4}
If the index is meant to be the keys, it's even simpler.
mydict = df['value'].to_dict() # {0: 10.2, 1: 5.7, 2: 7.4}
Edit:
The same result can be reached with the following:
filter_list = df[df.Col.isin(criteria)][['Col1','Col2']].values.tolist()
Original Post:
I had a similar issue, where I was looking to filter a dataframe into a resulting list of lists.
This was my solution:
filter_df = df[df.Col.isin(criteria)][['Col1','Col2']]
filter_list = filter_df.to_dict(orient='tight')
filter_list = filter_list['data']
Result: list of lists
Source: pandas.DataFrame.to_dict
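A self-contained sketch (the frame contents and the criteria list are made up for illustration; the Col/Col1/Col2 names come from the code above). Note that orient='tight' needs pandas 1.4 or newer:
import pandas as pd

df = pd.DataFrame({'Col': ['x', 'y', 'z'],
                   'Col1': [1, 2, 3],
                   'Col2': [10, 20, 30]})
criteria = ['x', 'z']

filter_df = df[df.Col.isin(criteria)][['Col1', 'Col2']]

# The 'tight' orientation stores the row data under the 'data' key
filter_list = filter_df.to_dict(orient='tight')['data']
# filter_list -> [[1, 10], [3, 30]]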
If there are duplicate keys and you want to keep all of their values in the dictionary, the code below can help:
df = pd.DataFrame([['a',1],['a',2],['a',4],['b',3],['b',4],['c',5]], columns=['id', 'value'])
df.groupby('id')['value'].apply(list).to_dict()
output : {'a': [1, 2, 4], 'b': [3, 4], 'c': [5]}
def get_dict_from_pd(df, key_col, row_col):
    result = dict()
    for i in set(df[key_col].values):
        is_i = df[key_col] == i
        result[i] = list(df[is_i][row_col].values)
    return result
This is my solution; a basic loop.
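For example, on the duplicated-key toy frame used elsewhere in this thread, it keeps every value per key:
import pandas as pd

ptest = pd.DataFrame([['a', 1], ['a', 2], ['b', 3]], columns=['id', 'value'])

get_dict_from_pd(ptest, 'id', 'value')
# result: 'a' -> [1, 2], 'b' -> [3]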
This is my solution:
import pandas as pd
df = pd.read_excel('dic.xlsx')
df_T = df.set_index('id').T
dic = df_T.to_dict('records')
print(dic)