I'm trying to build an inverted index for some NLP work to see how many times a word appears in a document. I'm doing this with a dictionary, but my output looks like this (here the word man appears in documents 1 and 11):

{'man': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
 1, 1, 1, 1, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11],
 'upon': [1, 1, 1, 3, 3, 3, 1539, 1539, 1539]}

How do I get rid of these duplicate values so I just have

{'man': [1,11], 'upon': [1,3,1539]}
Shane Bishop
David R
Does this answer your question? https://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-whilst-preserving-order BTW, perhaps the best approach is not to create these lists with duplicates in the first place. – Dani Mesejo Oct 24 '21 at 23:13
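
A rough sketch of what avoiding the duplicates up front could look like, assuming the index is built from a mapping of document ID to text. The names `docs` and `build_index` are hypothetical (the question does not show how the index is built), and plain whitespace splitting stands in for whatever tokenization is actually used:

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the sorted list of document IDs containing it."""
    postings = defaultdict(set)  # a set ignores repeated document IDs
    for doc_id, text in docs.items():
        for word in text.lower().split():  # naive tokenizer (assumption)
            postings[word].add(doc_id)
    # Convert the sets to sorted lists once, at the end.
    return {word: sorted(ids) for word, ids in postings.items()}

# Hypothetical sample documents, keyed by document ID.
docs = {1: "man upon man", 3: "upon the hill", 11: "a man"}
index = build_index(docs)
print(index['man'])   # [1, 11]
print(index['upon'])  # [1, 3]
```

Collecting IDs in a set means each document is recorded once per word no matter how often the word occurs in it.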

1 Answer

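Just convert the values to sets and then back to lists (a set does not guarantee order, so use sorted(set(v)) instead of list(set(v)) if the document numbers should stay sorted):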

my_dict = {k: list(set(v)) for k, v in my_dict.items()}
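
If the original order of first appearance matters, an order-preserving variant is possible. This is a minimal sketch, assuming Python 3.7+ (where dicts keep insertion order); the sample `my_dict` is a shortened version of the data in the question, and `deduped` is just an illustrative name:

```python
# Sample data shaped like the question's output (duplicates shortened).
my_dict = {
    'man': [1, 1, 1, 11, 11],
    'upon': [1, 1, 3, 3, 1539, 1539],
}

# dict.fromkeys keeps only the first occurrence of each value, in order.
deduped = {k: list(dict.fromkeys(v)) for k, v in my_dict.items()}
print(deduped)  # {'man': [1, 11], 'upon': [1, 3, 1539]}
```

Both approaches require the list items to be hashable, which is the case for integer document IDs.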
NotAName