
I have a large dataset of flight legs that I want to build a graph out of, where the weight of each edge is the number of times a particular leg was flown. The pair of cities involved in each leg is stored as a set, so the legs form a list of sets. I am having trouble creating a count/frequency dictionary because sets are "unhashable".

my_test_list = [{'DOHA', 'ROME'},{'DOHA', 'JAKARTA'},{'DOHA', 'JAKARTA'},{'DOHA', 'ROME'},{'MAURITIUS','ROME'},{'MAURITIUS', 'ROME'},{'DOHA', 'ROME'},{'DOHA', 'JAKARTA'},{'JAKARTA', 'ROME'}, {'DOHA', 'ROME'},{'NEW YORK   NY', 'WASHINGTON, DC'},{'ACCRA', 'WASHINGTON, DC'}]
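For reference, here is a minimal reproduction of the error. It uses a shortened copy of the list above. Passing the sets straight to Counter fails because a set is mutable and therefore unhashable, so it cannot be used as a dictionary key. The exact wording of the message may vary slightly between Python versions.

```python
from collections import Counter

# shortened copy of my_test_list; each leg is a (mutable) set
legs = [{'DOHA', 'ROME'}, {'DOHA', 'JAKARTA'}, {'ROME', 'DOHA'}]

error_message = ""
try:
    Counter(legs)  # Counter stores elements as dict keys, which must be hashable
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # the message mentions: unhashable type: 'set'
```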

Ideally, I would like to have an output like this that I can feed into networkx:

edge_list = [('DOHA', 'ROME', {'frequency': 4}), ('DOHA', 'JAKARTA', {'frequency': 3}),('MAURITIUS', 'ROME', {'frequency': 2}), ('ROME', 'JAKARTA', {'frequency': 1}),('NEW YORK   NY', 'WASHINGTON, DC', {'frequency': 1}),('ACCRA', 'WASHINGTON, DC', {'frequency': 1}) ]

This is what I have done, and it seems ghastly:

my_concat_list=[]
for item in my_test_list:
    out=""
    while len(item) !=0:
        out=out+";"+item.pop()
    my_concat_list.append(out)

my_concat_list winds up looking like this:

 [';DOHA;ROME',
 ';JAKARTA;DOHA',
 ';JAKARTA;DOHA',
 ';DOHA;ROME',
 ';ROME;MAURITIUS',
 ';ROME;MAURITIUS',
 ';DOHA;ROME',
 ';JAKARTA;DOHA',
 ';JAKARTA;ROME',
 ';DOHA;ROME',
 ';WASHINGTON, DC;NEW YORK   NY',
 ';ACCRA;WASHINGTON, DC']
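The loop above has two side effects worth noting. First, item.pop() empties every set in my_test_list. Second, the order in which pop() returns the cities follows set iteration order, which is not guaranteed to be identical for equal sets. The following is a minimal non-destructive sketch that builds the same kind of ";"-joined key from a sorted copy of each set, so equal legs always produce the identical string. It assumes no city name contains ";".

```python
# shortened copy of my_test_list
legs = [{'DOHA', 'ROME'}, {'ROME', 'DOHA'}, {'DOHA', 'JAKARTA'}]

# sorted() returns a new list, so the original sets are left intact,
# and alphabetical order makes the key independent of set iteration order
my_concat_list = [";".join(sorted(item)) for item in legs]

print(my_concat_list)  # ['DOHA;ROME', 'DOHA;ROME', 'DOHA;JAKARTA']
print(legs[0])         # the set still holds both cities
```

There is no leading ";" here, so a later item.split(";") returns the two cities at indices 0 and 1 rather than 1 and 2.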

I use Counter to get the frequency.

from collections import Counter
my_out = Counter(my_concat_list)

The output I get is:

Counter({';DOHA;ROME': 4,
         ';JAKARTA;DOHA': 3,
         ';ROME;MAURITIUS': 2,
         ';JAKARTA;ROME': 1,
         ';WASHINGTON, DC;NEW YORK   NY': 1,
         ';ACCRA;WASHINGTON, DC': 1})

From here, I can get the final format I want:

my_final_list=[]
for item in my_out.keys():
    temp_list = item.split(";")
    weight = my_out[item]
    my_new_tuple = (temp_list[1],temp_list[2],{'frequency':weight})
    my_final_list.append(my_new_tuple)
my_final_list

This is what my_final_list looks like:

[('DOHA', 'ROME', {'frequency': 4}),
 ('JAKARTA', 'DOHA', {'frequency': 3}),
 ('ROME', 'MAURITIUS', {'frequency': 2}),
 ('JAKARTA', 'ROME', {'frequency': 1}),
 ('WASHINGTON, DC', 'NEW YORK   NY', {'frequency': 1}),
 ('ACCRA', 'WASHINGTON, DC', {'frequency': 1})]

But there's got to be a better way of doing this. This seems really clunky.

Amatya
  • Python [frozen sets](https://stackoverflow.com/questions/22359664/frozenset-example-of-when-one-might-use-them) are hashable. Since a set is unordered, why is a leg a set rather than a tuple of (from, to)? Tuples are also hashable. – DarrylG Mar 26 '21 at 01:15
  • @DarrylG I converted it into a set because I wanted (DC, NY) to be the same as (NY, DC). I guess I could create tuples instead of sets and put the tuple elements in alphabetical order at the time of creation. Secondly, I think I did try frozen sets but I had gotten the same error. Let me try again. – Amatya Mar 26 '21 at 01:31
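This sketch follows the idea in the comment above. It builds each leg as an alphabetically ordered tuple and counts those tuples directly. The raw_legs list of (origin, destination) pairs is a hypothetical stand-in for however the legs are actually read from the dataset.

```python
from collections import Counter

# hypothetical raw input: (origin, destination) pairs in the flown direction
raw_legs = [('DOHA', 'ROME'), ('ROME', 'DOHA'), ('JAKARTA', 'DOHA'),
            ('WASHINGTON, DC', 'NEW YORK   NY')]

# sorting each pair makes (NY, DC) and (DC, NY) the same hashable key
counts = Counter(tuple(sorted(pair)) for pair in raw_legs)

# unpack each key into the (city, city, {'frequency': n}) edge format
edge_list = [(a, b, {'frequency': n}) for (a, b), n in counts.items()]
print(edge_list)
# [('DOHA', 'ROME', {'frequency': 2}), ('DOHA', 'JAKARTA', {'frequency': 1}),
#  ('NEW YORK   NY', 'WASHINGTON, DC', {'frequency': 1})]
```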

2 Answers


If you convert the sets into tuples, you can then use a Counter directly on the input data. You can then use a list comprehension to convert the Counter into the format you desire:

from collections import Counter

my_test_list = [{'DOHA', 'ROME'},{'DOHA', 'JAKARTA'},{'DOHA', 'JAKARTA'},{'DOHA', 'ROME'},{'MAURITIUS','ROME'},{'MAURITIUS', 'ROME'},{'DOHA', 'ROME'},{'DOHA', 'JAKARTA'},{'JAKARTA', 'ROME'}, {'DOHA', 'ROME'},{'NEW YORK   NY', 'WASHINGTON, DC'},{'ACCRA', 'WASHINGTON, DC'}]

# sort each set so equal legs always produce the same tuple,
# whatever order the set happens to iterate in
counts = Counter(tuple(sorted(s)) for s in my_test_list)

result = [k + ({'frequency': v},) for k, v in counts.items()]
print(result)

Output:

[
 ('DOHA', 'ROME', {'frequency': 4}),
 ('DOHA', 'JAKARTA', {'frequency': 3}),
 ('MAURITIUS', 'ROME', {'frequency': 2}),
 ('JAKARTA', 'ROME', {'frequency': 1}),
 ('NEW YORK   NY', 'WASHINGTON, DC', {'frequency': 1}),
 ('ACCRA', 'WASHINGTON, DC', {'frequency': 1})
]
Nick

Use frozenset to get a hashable set.

  • Most likely you can modify the code that generates them; replacing set with frozenset will probably do the trick.

  • If you can't generate frozenset directly, you can convert them:

    from collections import Counter
    my_out = Counter(frozenset(leg) for leg in my_test_list)
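Here is a complete sketch of the frozenset route applied to the original my_test_list. It converts the counted frozensets back into the (city, city, {'frequency': n}) tuples from the question. Each frozenset is turned into a pair with sorted(). This assumes every leg has exactly two distinct cities; a one-city frozenset would yield a malformed two-element tuple.

```python
from collections import Counter

my_test_list = [{'DOHA', 'ROME'}, {'DOHA', 'JAKARTA'}, {'DOHA', 'JAKARTA'},
                {'DOHA', 'ROME'}, {'MAURITIUS', 'ROME'}, {'MAURITIUS', 'ROME'},
                {'DOHA', 'ROME'}, {'DOHA', 'JAKARTA'}, {'JAKARTA', 'ROME'},
                {'DOHA', 'ROME'}, {'NEW YORK   NY', 'WASHINGTON, DC'},
                {'ACCRA', 'WASHINGTON, DC'}]

# frozenset is an immutable, hashable set, so it can be a Counter key;
# frozenset({'A', 'B'}) == frozenset({'B', 'A'}), so direction is ignored
counts = Counter(frozenset(leg) for leg in my_test_list)

# sorted() gives a deterministic (city, city) order for each edge
edge_list = [(*sorted(leg), {'frequency': n}) for leg, n in counts.items()]
print(edge_list)
```

Because Counter keeps keys in first-seen order, the edges come out in the order each leg first appears in my_test_list.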
    
Jiří Baum