Assuming you are only interested in a list of unique words where order does not matter:
# Option A1
import csv

with open("adj.csv", "r") as f:
    seen = set()
    reader = csv.reader(f)
    for line in reader:
        for word in line:
            seen.add(word)
list(seen)
# ['cheerful', 'colorful', 'horrible', 'happy', 'sad']
More concisely:
# Option A2
with open("adj.csv", "r") as f:
    reader = csv.reader(f)
    unique_words = {word for line in reader for word in line}
list(unique_words)
The with statement safely opens and closes the file. We simply add every word to a set, then cast that set to a list with list() to get a list of unique (unordered) words.
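To try the set-based approach without a file on disk, here is a minimal sketch. The sample rows are hypothetical (the real contents of adj.csv are not shown), and io.StringIO stands in for the open file. Sorting is only there to make the printed order reproducible, since sets have no defined order.

```python
import csv
import io

# Hypothetical stand-in for adj.csv; the real file's contents are an assumption.
sample = "happy,sad,colorful\nhorrible,sad,cheerful\nhappy,colorful,horrible\n"

reader = csv.reader(io.StringIO(sample))
# Same comprehension as Option A2: flatten every row into one set of words.
unique_words = {word for row in reader for word in row}

# Sets are unordered, so sort when a reproducible order is needed.
print(sorted(unique_words))
# ['cheerful', 'colorful', 'happy', 'horrible', 'sad']
```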
Alternatives
If order does matter, implement the unique_everseen itertools recipe.
From itertools recipes:
import itertools as it

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        # Skip anything already in the set; record and yield the rest
        for element in it.filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
You can implement this manually or install a third-party library that implements it for you, such as more_itertools, e.g. pip install more_itertools
# Option B
import csv
import more_itertools as mit

with open("adj.csv", "r") as f:
    reader = csv.reader(f)
    words = (word for line in reader for word in line)
    unique_words = list(mit.unique_everseen(words))
unique_words
# ['happy', 'sad', 'colorful', 'horrible', 'cheerful']
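If you would rather avoid a third-party dependency, a different technique from the recipe above is dict.fromkeys, which keeps the first occurrence of each key in insertion order (guaranteed for dicts since Python 3.7). This is only a sketch under that assumption, and the sample rows are again a hypothetical stand-in for adj.csv.

```python
import csv
import io

# Hypothetical stand-in for adj.csv; the real file's contents are an assumption.
sample = "happy,sad,colorful\nhorrible,sad,cheerful\nhappy,colorful,horrible\n"

reader = csv.reader(io.StringIO(sample))
words = (word for row in reader for word in row)

# dict keys are unique and preserve insertion order, so duplicates are
# dropped while the first-seen order is kept.
ordered_unique = list(dict.fromkeys(words))
print(ordered_unique)
# ['happy', 'sad', 'colorful', 'horrible', 'cheerful']
```

Unlike unique_everseen, this builds the whole dict before returning anything and has no key argument, so the recipe is still the better fit for lazy iteration or case-insensitive matching.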