2

I'd like to read an excel table with Panda, and create a list of tuples. Then, I want to convert the list into a dictionary which has a tuple as key. How can I do that?

Here is the table that I am reading;

A B 0.6

A C 0.7

C D 1.0

C A 1.2

D B 0.7

D C 0.6

Here is how I read my table;

import pandas as pd

df= pd.read_csv("my_file_name.csv", header= None)  

my_tuple = [tuple(x) for x in df.values]

Now, I want to have the following structure.

my_data =  {("A", "B"): 0.6,

            ("A", "C"): 0.7,

            ("C", "D"): 1,

            ("C", "A"): 1.2,

            ("D", "B"): 0.7,

            ("D", "C"): 0.6}
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
sergey_208
  • 614
  • 3
  • 21
  • 1
    See the relevant post by Martjin on [dictionary ordering](https://stackoverflow.com/a/39537308/4333359) and why you're seeing a different ordering in Python 3.5 – ALollz Jan 14 '19 at 21:13

5 Answers5

3

Set_index and to_dict

df.set_index(['a', 'b']).c.to_dict()

{('A', 'B'): 0.6,
 ('A', 'C'): 0.7,
 ('C', 'A'): 1.2,
 ('C', 'D'): 1.0,
 ('D', 'B'): 0.7,
 ('D', 'C'): 0.6}

Option2: Another solution using zip

dict(zip(df[['A', 'B']].apply(tuple, 1), df['C']))

Option 3:

k = df[['A', 'B']].to_records(index=False).tolist()
dict(zip(k, df['C']))
Vaishali
  • 37,545
  • 5
  • 58
  • 86
  • Thanks for the answer. So I updated the way I read the table as df= pd.read_csv("my_file_name.csv", header= None, names= ["A","B","C"]) . Then, I call the following function print(df.set_index(['A', 'B']).C.to_dict()). My output looks like {('D', 'C'): 0.6, ('A', 'C'): 0.7, ('D', 'B'): 0.7, ('C', 'A'): 1.2, ('A', 'B'): 0.6, ('C', 'D'): 1.0}. Do you know why the order is changed? – sergey_208 Jan 14 '19 at 20:49
  • Can you update your post with df.head() after you read your data with column names? – Vaishali Jan 14 '19 at 20:54
  • So I added df.head() but now I am getting the following order {('C', 'A'): 1.2, ('D', 'B'): 0.7, ('A', 'C'): 0.7, ('D', 'C'): 0.6, ('C', 'D'): 1.0, ('A', 'B'): 0.6}. What are the lower case letters that you used in your answer? Those are also column names, aren't they? – sergey_208 Jan 14 '19 at 21:00
  • @ball_jan what version of python are you using? – ALollz Jan 14 '19 at 21:03
  • a, b, and c are column names. With your code, df.set_index(['A', 'B']).C.to_dict() – Vaishali Jan 14 '19 at 21:06
  • I am using Python 3.5. All the solution methods recommended here seem like working fine but the order changes in all the cases. – sergey_208 Jan 14 '19 at 21:07
  • 1
    Python dictionary is inherently unordered. Also if you have another instance of same combinations of columns, the second one will supercede the first to replace the value – Vaishali Jan 14 '19 at 21:10
  • Vaishali, I really appreciate your help and answers. Can I use the column index rather than the name to do the same operation? – sergey_208 Jan 14 '19 at 21:13
  • You can try another solution I posted. It works index-wise – Vaishali Jan 14 '19 at 21:22
1

Jan - here's one idea: just create a key column using the pandas apply function to generate a tuple of your first 2 columns, then zip them up to a dict.

import pandas as pd
df = pd.read_clipboard()
df.columns = ['first', 'second', 'value']
df.head()

def create_key(row):
    return (row['first'], row['second'])

df['key'] = df.apply(create_key, axis=1)

dict(zip(df['key'], df['value']))

{('A', 'C'): 0.7,
 ('C', 'A'): 1.2,
 ('C', 'D'): 1.0,
 ('D', 'B'): 0.7,
 ('D', 'C'): 0.6}
jrjames83
  • 901
  • 2
  • 9
  • 22
1

This is less concise than @Vaishali's answer but gives you more of an idea of the steps.

vals1 = df['A'].values
vals2 = df['B'].values
vals3 = df['C'].values

dd = {}
for i in range(len(vals1)):
    key = (vals1[i], vals2[i])
    value = vals3[i]
    dd[key] = value

{('A', 'B'): '0.6',
('A', 'C'): '0.7',
('C', 'D'): '1.0',
('C', 'A'): '1.2',
('D', 'B'): '0.7',
('D', 'C'): '0.6'}
m13op22
  • 2,168
  • 2
  • 16
  • 35
1

If you would use simple code:

this one would not use any importing something like panda :

def change_csv(filename):
    file_pointer = open(filename, 'r')
    data = file_pointer.readlines()
    dict = {}
    file_pointer.close()
    for each_line in data:
        a, b, c =  each_line.strip().split(" ")
        dict[a, b] = c
    return dict

so out put of this yours.

and out put is :

{('A', 'B'): '0.6', ('A', 'C'): '0.7', ('C', 'D'): '1.0', ('C', 'A'): '1.2', ('D', 'B'): '0.7', ('D', 'C'): '0.6'}
1

A comprehension will work well for smaller frames:

dict((tuple((a, b)), c) for a,b,c in df.values)
#{('A', 'B'): 0.6,
# ('A', 'C'): 0.7,
# ('C', 'A'): 1.2,
# ('C', 'D'): 1.0,
# ('D', 'B'): 0.7,
# ('D', 'C'): 0.6}

If having issues with ordering:

from collections import OrderedDict

d = OrderedDict((tuple((a, b)), c) for a,b,c in df.values)
#OrderedDict([(('A', 'B'), 0.6),
#             (('A', 'C'), 0.7),
#             (('C', 'D'), 1.0),
#             (('C', 'A'), 1.2),
#             (('D', 'B'), 0.7),
#             (('D', 'C'), 0.6)])
ALollz
  • 57,915
  • 7
  • 66
  • 89