So, I have a dictionary like this:

dic_parsed_sentences = {'religion': {'david': 1, 'joslin': 1, 'apolog': 5, 'jim': 1, 'meritt': 2}, 
 'sport': {'sari': 1, 'basebal': 1, 'kolang': 5, 'footbal': 1, 'baba': 2},
 'education': {'madrese': 1, 'kelas': 1, 'yahyah': 5, 'dars': 1},
 'computer': {'net': 1, 'internet': 1},
 'windows': {'copy': 1, 'right': 1}}

I want to loop through it in groups, based on the length of the nested dictionaries.

For example, it has two nested dictionaries of length 5, one of length 4, and two of length 2. I want to process the items of the same length together (something like a group-by in pandas).
So the output of the first iteration would look like this (only the items of length 5 appear here):

[[david, joslin, apolog, jim, meritt],
 [sari, basebal, kolang, footbal, baba]]

The next iteration yields the items of the next length:

[[madrese, kelas, yahyah, dars]]

And the last iteration:

[[net, internet],
 [copy, right]]

Why are there only three iterations? Because the nested dictionaries in dic_parsed_sentences have only three distinct lengths. I have written something like this, but I don't know how to iterate over the items of the same length:

for i in dic_parsed_sentences.groupby(dic_parsed_sentences.same_length_items):  # pseudocode: I don't know how to iterate over the same-length nested dicts
    for index_file in dic_parsed_sentences:
        temp_sentence = dic_parsed_sentences[index_file]
        keys_words = list(temp_sentence.keys())
        for index_word in range(len(keys_words)):
            arr_sent_wids[index_sentence, index_word] = keys_words[index_word]
    index = index + 1
    index_sentence = index_sentence + 1

Update:

for length, dics in itertools.groupby(dic_parsed_sentences, len):
    for index_file in dics:
        temp_sentence = dics[index_file]
        keys_words = list(temp_sentence.keys())
        for index_word in range(len(keys_words)):
            test_sent_wids[index_sentence, index_word] = lookup_word2id(keys_words[index_word])
        index = index + 1
        index_sentence = index_sentence + 1
Akaisteph7
sariii
  • What do you mean by "length of an item"? – sobek Aug 09 '19 at 22:34
  • I mean the length of the nested dictionaries. For example, two of the dictionaries inside the outer one have length 5. – sariii Aug 09 '19 at 22:36
  • @sobek I also included an example, please see my update – sariii Aug 09 '19 at 22:39
  • Please make a minimal reproducible example including expected output and valid code. (`for in` is invalid, for example) – wjandrea Aug 09 '19 at 22:46
  • @wjandrea Sure, I will make a minimal reproducible example. I know that line is incorrect; it is the part I don't know how to write, so I put it in as pseudocode. I will update the question. – sariii Aug 09 '19 at 22:48

3 Answers

You can use itertools.groupby after sorting the nested dictionaries by length. The sort is required because groupby only groups consecutive items that share the same key.

import itertools

items = sorted(dic_parsed_sentences.values(), key=len, reverse=True)
for length, dics in itertools.groupby(items, len):
    # dics is all the nested dictionaries with this length
    for temp_sentence in dics:
        keys_words = list(temp_sentence.keys())
        for index_word in range(len(keys_words)):
            test_sent_wids[index_sentence, index_word] = lookup_word2id(keys_words[index_word])
        index = index + 1
        index_sentence = index_sentence + 1
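
For readers who want to try the sort-then-group idea in isolation, here is a minimal self-contained sketch. It uses the dictionary from the question and leaves out test_sent_wids and lookup_word2id, which are the asker's own objects. The name batches is a hypothetical helper introduced only for this illustration. It assumes Python 3.7+, where dictionaries keep insertion order.

```python
import itertools

dic_parsed_sentences = {
    'religion': {'david': 1, 'joslin': 1, 'apolog': 5, 'jim': 1, 'meritt': 2},
    'sport': {'sari': 1, 'basebal': 1, 'kolang': 5, 'footbal': 1, 'baba': 2},
    'education': {'madrese': 1, 'kelas': 1, 'yahyah': 5, 'dars': 1},
    'computer': {'net': 1, 'internet': 1},
    'windows': {'copy': 1, 'right': 1},
}

# groupby only merges adjacent items, so sort by length first.
items = sorted(dic_parsed_sentences.values(), key=len, reverse=True)

# batches is a hypothetical name: one (length, word lists) pair per distinct length.
batches = []
for length, group in itertools.groupby(items, key=len):
    batch = [list(d) for d in group]  # list(d) yields the keys of each nested dict
    batches.append((length, batch))
    print(length, batch)
```

Each pass of the loop sees every word list of one length, which matches the three iterations described in the question.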
Barmar
  • Thank you so much for your help. I have a small problem getting it to work. It says `TypeError: 'itertools._grouper' object is not subscriptable` – sariii Aug 09 '19 at 23:03
  • `dics` is not a dictionary. Just use `for temp_sentence in dics:` – Barmar Aug 09 '19 at 23:07
  • Actually, I ran the same script as yours and got `'str' object has no attribute 'keys'`. I think `temp_sentence` has no `keys` – sariii Aug 09 '19 at 23:12
  • Needed to use `dic_parsed_sentences.values()` to get the values rather than the keys. – Barmar Aug 09 '19 at 23:21
  • @Barmar Thank you so much – sariii Aug 09 '19 at 23:27
  • One last question: is there any way I can get the number of items inside `dics`? – sariii Aug 09 '19 at 23:46
  • I don't think so. It's an iterator, so you can't call `len(dics)`. – Barmar Aug 09 '19 at 23:50
  • Not a good way, but at least it gives the number: `for x in dics: a.append(x) a = np.array(a) print(a.shape[0])` – sariii Aug 10 '19 at 00:13
  • @sariii But that will use up the `dics` iterator so `for temp_sentence in dics:` won't work. – Barmar Aug 10 '19 at 00:14
  • Can't you just increment a counter in the `for` loop? – Barmar Aug 10 '19 at 00:15
  • Yeah :)) I think that works as well. I was overcomplicating it – sariii Aug 10 '19 at 00:21
  • This is among the weirdest things I've seen! After adding a `for` loop to keep the count, execution never reaches `keys_words` in the last loop. It seems as if `dics` has already reached its end, which is not how a normal variable behaves! – sariii Aug 10 '19 at 23:15
  • @sariii You shouldn't add a new `for` loop, you should increment the count in the same loop. You can't use an iterator twice. – Barmar Aug 12 '19 at 04:13
  • See https://stackoverflow.com/questions/10866134/how-to-prevent-iterator-getting-exhausted-in-python3-x – Barmar Aug 12 '19 at 04:14
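
The point made in these comments, that each groupby group is a one-shot iterator, can be sketched as follows. One possible way to get a count and still loop over the items is to copy the group into a list first. The sample data and the name counts are hypothetical, chosen only for this illustration.

```python
import itertools

# Hypothetical sample data standing in for the nested dictionaries,
# already sorted by length in descending order.
items = [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}, {'e': 5}]

counts = {}  # hypothetical name: number of dictionaries per length
for length, group in itertools.groupby(items, key=len):
    dics = list(group)  # materialize once; the grouper itself can be consumed only once
    counts[length] = len(dics)  # len() works on the list, not on the grouper
    for temp_sentence in dics:  # the list can be iterated as often as needed
        print(length, list(temp_sentence))
```

The trade-off is that the whole group is held in memory; incrementing a counter inside the single existing loop avoids the copy.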
bylen = {}
for v in dic_parsed_sentences.values():
    l = len(v)
    if l not in bylen:
        bylen[l] = []
    bylen[l].append(list(v.keys()))

for k in sorted(bylen, reverse=True):
    print(bylen[k])  # process the group of same-length word lists here
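
A variant of the same bucketing idea, sketched with collections.defaultdict so the membership check is not needed. The name by_len is introduced only for this illustration, and the data is the dictionary from the question.

```python
from collections import defaultdict

dic_parsed_sentences = {
    'religion': {'david': 1, 'joslin': 1, 'apolog': 5, 'jim': 1, 'meritt': 2},
    'sport': {'sari': 1, 'basebal': 1, 'kolang': 5, 'footbal': 1, 'baba': 2},
    'education': {'madrese': 1, 'kelas': 1, 'yahyah': 5, 'dars': 1},
    'computer': {'net': 1, 'internet': 1},
    'windows': {'copy': 1, 'right': 1},
}

# Map each length to the word lists of the nested dicts with that length.
by_len = defaultdict(list)
for words in dic_parsed_sentences.values():
    by_len[len(words)].append(list(words))

# Visit the buckets from the longest length to the shortest.
for length in sorted(by_len, reverse=True):
    print(length, by_len[length])
```

Unlike itertools.groupby, this needs no prior sort, and each bucket is an ordinary list, so len(by_len[length]) gives the number of items directly.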
krisz

You can do it using the following method:

finds = [[key, len(dic_parsed_sentences[key])] for key in dic_parsed_sentences]
finds.sort(reverse=True, key=lambda x: x[1])

previous = finds[0][1]
res = []
for elem in finds:
    current = elem[1]
    if current != previous:
        previous = current
        print(res)
        res = []
    res.append(list(dic_parsed_sentences[elem[0]]))
print(res)
Akaisteph7