Extracting data from csv

Question

I have a CSV file in which each row contains a list of adjectives.

For example, the first 2 rows are as follows:

["happy","sad","colorful"]
["horrible","sad","cheerful","happy"]

I want to extract all the data from this file to get a list containing each adjective only once. Here, it would be a list as follows:

["happy","sad","colorful","horrible","cheerful"]

I am doing this using Python.

import csv

with open('adj.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    adj_list = list(reader)   # a list of rows; each row is itself a list
    filtered_list = []
    for l in adj_list:        # l is a whole row, not a single word
        if l not in filtered_list:
            filtered_list.append(l)

Possible duplicate of [Combining two lists and removing duplicates, without removing duplicates in original list](https://stackoverflow.com/questions/1319338/combining-two-lists-and-removing-duplicates-without-removing-duplicates-in-orig) — OneCricketeer, Sep 01 '17 at 03:05
You can't just `list(reader)`. That gives you a list of lists. You'll need to extract each row in a loop, then put all of its columns into one list — OneCricketeer, Sep 01 '17 at 03:16
Every row has a different number of elements. How would you suggest I go about it? — floralmural, Sep 01 '17 at 03:48
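A minimal sketch of the approach suggested in the comments: loop over each row, then over each word inside it, which works no matter how many fields a row has. It assumes adj.csv holds plain comma-separated words (no brackets or quotes); io.StringIO stands in for the file so the snippet runs on its own.

```python
import csv
import io

# Stand-in for open("adj.csv", newline=""): two rows of different lengths,
# assumed to be plain comma-separated words.
sample = io.StringIO("happy,sad,colorful\nhorrible,sad,cheerful,happy\n")

filtered_list = []
reader = csv.reader(sample)
for row in reader:          # each row is a list of strings
    for word in row:        # so loop over the words inside it
        if word not in filtered_list:
            filtered_list.append(word)

print(filtered_list)
# ['happy', 'sad', 'colorful', 'horrible', 'cheerful']
```

This keeps first-seen order, but `word not in filtered_list` scans the whole list each time, so for large files a set-based check scales better.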

score 1 · Answer 1 · 2017-09-01T03:51:51.527


Assuming memory is not a concern and that a one-liner is what you are looking for:

from itertools import chain
from csv import reader

print(list(set(chain(*reader(open('file.csv'))))))

Given a file.csv with this content:

happy, sad, colorful
horrible, sad, cheerful, happy

OUTPUT:

['horrible', ' colorful', ' sad', ' cheerful', ' happy', 'happy']

You can remove the list() part if you don't mind receiving a Python set instead of a list.
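The output above contains both 'happy' and ' happy', because each field after a comma keeps its leading space. A sketch of one way around that, assuming the same space-after-comma layout: the csv module's skipinitialspace option drops whitespace right after each delimiter. It also uses chain.from_iterable instead of chain(*...), so the rows are not unpacked into arguments; io.StringIO stands in for file.csv, and with a real file you would read it inside `with open("file.csv", newline="") as f:` so it gets closed.

```python
import csv
import io
from itertools import chain

# Stand-in for file.csv, whose fields have a space after each comma.
sample = io.StringIO("happy, sad, colorful\nhorrible, sad, cheerful, happy\n")

# skipinitialspace=True ignores whitespace that follows each delimiter,
# so ' happy' and 'happy' collapse into one entry.
reader = csv.reader(sample, skipinitialspace=True)
unique_words = set(chain.from_iterable(reader))

print(sorted(unique_words))
# ['cheerful', 'colorful', 'happy', 'horrible', 'sad']
```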

edited Sep 01 '17 at 03:51

answered Sep 01 '17 at 03:16

This too gives a set of lists. I want just a single list with no word repetition. – floralmural Sep 01 '17 at 03:47
@Newbie It's not a set of lists, just a set with no repeated strings. If you want a list rather than a set, just wrap it in list(). I will edit to clarify. – Sep 01 '17 at 03:50

pylang · Answer 2 · 2017-09-01T04:49:49.047

Assuming you are only interested in a list of unique words where order does not matter:

# Option A1
import csv


with open("adj.csv", "r") as f:
    seen = set()
    reader = csv.reader(f)
    for line in reader:
        for word in line:
            seen.add(word)
list(seen)
# ['cheerful', 'colorful', 'horrible', 'happy', 'sad']  (order may vary)

More concisely:

# Option A2
import csv

with open("adj.csv", "r") as f:
    reader = csv.reader(f)
    unique_words = {word for line in reader for word in line}

list(unique_words)

The with statement opens the file and closes it automatically afterwards. Every word is simply added to a set, which ignores duplicates. Calling list() on the set then gives a list of unique (unordered) words.
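A variant of Option A1, sketched as a small helper (the name `distinct_words` is hypothetical): set.update adds every element of a row at once, replacing the inner loop, and sorted() turns the unordered set into a repeatable list. It accepts any iterable of lines, such as an open file or, as here, an io.StringIO stand-in.

```python
import csv
import io


def distinct_words(lines):
    """Return the distinct fields across all CSV rows, sorted alphabetically."""
    seen = set()
    for row in csv.reader(lines):
        # set.update adds every word in the row; repeats are ignored.
        seen.update(row)
    # A set has no reliable order, so sort it for a stable result.
    return sorted(seen)


sample = io.StringIO("happy,sad,colorful\nhorrible,sad,cheerful,happy\n")
print(distinct_words(sample))
# ['cheerful', 'colorful', 'happy', 'horrible', 'sad']
```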

Alternatives

If order does matter, implement the unique_everseen recipe from the itertools documentation:

import itertools as it


def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in it.filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

You can implement this yourself or install a third-party library that provides it, such as more_itertools (pip install more_itertools).

# Option B
import csv

import more_itertools as mit


with open("adj.csv", "r") as f:
    reader = csv.reader(f)
    words = (word for line in reader for word in line)
    unique_words = list(mit.unique_everseen(words))

unique_words
# ['happy', 'sad', 'colorful', 'horrible', 'cheerful']
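If you would rather avoid both the recipe and the extra dependency, a different technique also preserves first-seen order: dict keys are unique and, from Python 3.7 on, keep insertion order, so dict.fromkeys drops repeats without reordering. A sketch assuming Python 3.7+, with io.StringIO standing in for adj.csv:

```python
import csv
import io
from itertools import chain

# Stand-in for open("adj.csv", newline="").
sample = io.StringIO("happy,sad,colorful\nhorrible,sad,cheerful,happy\n")

reader = csv.reader(sample)
# Flatten the rows into one stream of words, then let dict.fromkeys
# keep only the first occurrence of each word, in order.
unique_words = list(dict.fromkeys(chain.from_iterable(reader)))

print(unique_words)
# ['happy', 'sad', 'colorful', 'horrible', 'cheerful']
```

Unlike unique_everseen, this builds the whole result at once rather than yielding lazily, and it has no key parameter.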
