I am writing some Python code that extracts the words from a text file and builds a list of them.

The text comes from a .txt file containing some lines from Romeo and Juliet.

I read in the file, trimmed the whitespace, split the text into words, and added these words to a list.

I am now trying to create a second list that contains each word only once, with no repeats.

I know that I need to create a loop of some sort to go through the list, add the words, and then discard the repeated words.

This is the code I currently have:

fname = input("Enter file name: ")

#Here we check that the file can be opened.
#If it cannot, we print a personalized error message
#and quit the programme.

try:
    fh = open(fname)
except OSError:
    print("Please enter a valid file name.")
    quit()
#Here I read in the file so that it returns as a complete 
#string.

fread = fh.read()
#print(fread)

#Here we are stripping the file of any unnecessary 
#whitespace 

fstrip = fread.rstrip()
#print(fstrip)

#Here we are splitting all the words into individual values
#in a list. This will allow us to write a loop to check
#for repeating words.

fsplit = fstrip.split()
lst = list(fsplit)
#print(lst)

#This is going to be our for loop.

wdlst = list()

Any help would be greatly appreciated. I am new to Python and cannot figure out what combination of statements is needed to create this new list.

Many thanks,

  • https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists ? – CoffeeTableEspresso Nov 23 '22 at 22:56
  • Please share the contents of the list so the issue can be reproduced. Also give an example of the input list and the output you want. – Roxy Nov 23 '22 at 23:00
  • Comments help -- to a point. Too many comments detract from readability. Also -- `lst = list(fsplit)` is pointless since `split` already returns a list. – John Coleman Nov 23 '22 at 23:02

2 Answers

A set can contain only unique elements. To remove repeated elements from a list, you can convert it into a set and then back into a list. Note that a set is unordered, so this does not keep the words in their original order.

list_without_duplicates = list(set(lst))

Here's a simple way to do it while preserving the order of the words:

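new_list = []
added_words = set()  # set() creates an empty set; {} would create a dict
for word in lst:
    # Keep only the first occurrence of each word
    if word not in added_words:
        new_list.append(word)
        added_words.add(word)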

new_list will then contain every word from lst in its original order, with duplicates removed.
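A shorter alternative, not part of the loop approach above, is dict.fromkeys. This is a minimal sketch that assumes Python 3.7 or later, where dicts keep insertion order. The sample list is made up for illustration and stands in for the words read from the file.

```python
# Hypothetical sample data standing in for the words read from the file.
lst = ["but", "soft", "what", "light", "but", "soft"]

# Dict keys are unique and (in Python 3.7+) keep insertion order,
# so this keeps the first occurrence of each word.
unique_words = list(dict.fromkeys(lst))

print(unique_words)  # ['but', 'soft', 'what', 'light']
```

This gives the same result as the loop, at the cost of being a little less obvious to someone new to Python.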

siIverfish

You do not need a for loop to remove duplicates from a list. Converting the list to a set removes them.

E.g.:

l1 = ['hello', 'world', 'hello', 'people']
print(list(set(l1)))

Result (the order of the elements may differ between runs, because sets are unordered):

['hello', 'world', 'people']
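If a predictable order is wanted from the set approach, one option is to sort the result. This is only a sketch: it gives alphabetical order, not the original order of the words. The name unique_sorted is introduced here for illustration.

```python
l1 = ['hello', 'world', 'hello', 'people']

# set() drops the duplicates; sorted() returns a new list in alphabetical order,
# so the output is the same on every run.
unique_sorted = sorted(set(l1))

print(unique_sorted)  # ['hello', 'people', 'world']
```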
Roxy