
I am text mining documents, trying to collect company names from annual reports using BeautifulSoup in Python. I store the collected names in a list, but the list also ends up containing duplicates. I want to remove the duplicates so that only unique company names remain. Each name is 3-4 words long. I tried set() and a loop like the one below, but both give me a list of unique characters rather than whole names. Please suggest a way to solve this issue.

    newlist = []
    for i in etfname:
        if i not in newlist:
            newlist.append(i)
    print(newlist)

(Screenshot)

  • What did you try with the sets? It sounds like you stepped one level too deep while iterating. – Klaus D. Sep 08 '17 at 05:07
  • I have added a screenshot showing the list I extracted from the file. I tried the for loop and got a list of unique letters, not whole names. The same happens with set(). Please suggest a way to solve this issue. – Shoumik Goswami Sep 14 '17 at 08:42
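
Following up on the "one level too deep" comment: the loop in the question is fine when `etfname` is a list of strings, so one plausible cause (an assumption, since the code that builds `etfname` is not shown) is that `etfname` is a single string, or that names were added with `+=` instead of `append()`, which splits a string into characters. Below is a minimal sketch of de-duplicating whole names while keeping their order. The helper `unique_names` and the sample names are hypothetical, made up for illustration.

```python
def unique_names(names):
    """Return the distinct names in first-seen order (hypothetical helper)."""
    if isinstance(names, str):
        # A bare string iterates character by character, which would
        # reproduce the "unique letters" symptom; treat it as one name.
        names = [names]
    # Collapse runs of whitespace so "Alpha  Growth Fund " matches
    # "Alpha Growth Fund" (assumes scraped text may carry stray spaces).
    cleaned = (" ".join(name.split()) for name in names)
    # dict keys keep insertion order (Python 3.7+) and drop repeats.
    return list(dict.fromkeys(name for name in cleaned if name))


# Made-up sample data standing in for the scraped list.
etfname = [
    "Alpha Growth Fund",
    "Beta Income Trust",
    "Alpha Growth Fund",
    "Alpha  Growth Fund ",
]

print(unique_names(etfname))

# For comparison: calling set() on one string yields its characters.
print(sorted(set("Alpha Growth Fund")))
```

If the order of the names does not matter, `list(set(etfname))` is enough, provided `etfname` really is a list of whole names. Checking `type(etfname)` and `etfname[:3]` before de-duplicating should show whether that is the case.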

0 Answers