Generate string set from csv file in Python

Question

Please don't flag my question right away: I searched several other questions, like this one, and they didn't solve my problem.

I'm trying to generate a python set of strings from a csv file. The printed pandas dataframe of the loaded csv file has the following structure:

   0
0  me
1  yes
2  it

For a project, I need this to be formatted like this:

STOPWORDS = {'me', 'yes', 'it'}

I tried to do this with the following code:

import pandas as pd

df_stopwords = pd.read_csv("C:/Users/Jakob/stopwords.csv", encoding = 'iso8859-15', header=-1)

STOPWORDS = {}
for index, row in df_stopwords.iterrows():
    STOPWORDS.update(str(row))

print(STOPWORDS)

However, I get this error:

dictionary update sequence element #0 has length 1; 2 is required

When I use STOPWORDS.add(str(row)) instead, I get this error:

'dict' object has no attribute 'add'

Thank you all in advance!

@nixon please post this as an answer. I want to give you the credit for this! Solved my problem. At all: Please upvote! ;-P — Mike_H, Dec 13 '18 at 19:08
@YOLO Your solution worked perfectly as well. I want to accept both your answers! :D Big Thank You! — Mike_H, Dec 13 '18 at 19:09
the reason why your original code did not work: `STOPWORDS = {}` initializes a dictionary. What you want is: `STOPWORDS=set()` — Ricky Kim, Dec 13 '18 at 19:12
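The point made in the last comment can be checked in isolation. The following is a minimal sketch, not the asker's actual script: the variable names `empty_braces`, `empty_set`, `update_error` and `add_error` are made up for illustration. It shows that `{}` is a dict and reproduces both error messages from the question.

```python
# {} is the empty dict literal; an empty set has no literal and needs set().
empty_braces = {}
empty_set = set()
print(type(empty_braces).__name__)
print(type(empty_set).__name__)

# dict.update() expects key/value pairs, so a plain string is rejected.
try:
    empty_braces.update("me")
    update_error = None
except ValueError as exc:
    update_error = str(exc)

# dict has no add() method at all.
try:
    empty_braces.add("me")
    add_error = None
except AttributeError as exc:
    add_error = str(exc)

print(update_error)
print(add_error)

# The same call works on a real set.
empty_set.add("me")

# Non-empty braces without colons do create a set.
STOPWORDS = {"me", "yes", "it"}
```

Only the empty case is ambiguous, which is why `set()` is needed when starting from nothing.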

score 3 · Accepted Answer · answered Dec 13 '18 at 19:18

You can directly create a set from the values in the dataframe with:

set(df_stopwords.values.ravel())
# {'me', 'yes', 'it'}
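For a self-contained check of this approach, here is a small sketch that swaps the file on disk for an in-memory string. The `csv_text` content is an assumption about the file layout (one word per line, no header row), based on the dataframe printed in the question.

```python
import io

import pandas as pd

# Stand-in for stopwords.csv: one word per line, no header row (assumed layout).
csv_text = "me\nyes\nit\n"
df_stopwords = pd.read_csv(io.StringIO(csv_text), header=None)

# .values is a 2-D array; ravel() flattens it to 1-D so set() sees the words.
STOPWORDS = set(df_stopwords.values.ravel())

# Sets are unordered, so compare by content instead of relying on print order.
print(STOPWORDS == {"me", "yes", "it"})
```

Because `ravel()` flattens every column, this also works unchanged if the CSV has several columns of words.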

yatu


Jaden Baptista · Answer 2 · 2018-12-13T19:16:31.117

`{}` creates an empty dictionary, which is a mapping of keys to values (like an object in many other languages), not a set. Since you need a set, define it as one with `set()` from the start instead of converting it later.

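import pandas as pd

# header=None: the file has no header row, so the single column is labeled 0
df_stopwords = pd.read_csv("C:/Users/Jakob/stopwords.csv", encoding='iso8859-15', header=None)

STOPWORDS = set()  # set() creates an empty set; {} would create a dict
for index, row in df_stopwords.iterrows():
    # add the cell value itself; str(row) would add the printed form of the whole row
    STOPWORDS.add(str(row.iloc[0]))

print(STOPWORDS)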
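The question's code called `update` with a string, so it may help to see how `add` and `update` differ once the container really is a set. This is an illustrative sketch with made-up variable names (`words`, `chars`, `merged`), not part of the original answer.

```python
# add() inserts its argument as a single element.
words = set()
words.add("yes")

# update() iterates over its argument, so a string is split into characters.
chars = set()
chars.update("yes")

# update() with a list of strings adds each string as one element.
merged = set()
merged.update(["me", "yes", "it"])

print(words)
print(sorted(chars))
print(sorted(merged))
```

In a row-by-row loop `add` is the right call; `update` fits when a whole list of words is already available.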

edited Dec 13 '18 at 19:16

answered Dec 13 '18 at 19:09


I need the curly brackets. That is the reason why I can't use a list. Thank you! – Mike_H Dec 13 '18 at 19:11
Why would you need the curly brackets? Python only uses them for specific things. – Jaden Baptista Dec 13 '18 at 19:12
I use this set as an input to the wordcloud library that requires a set of words that shouldn't be shown in the word cloud. It is not accepting lists. – Mike_H Dec 13 '18 at 19:13
If you are formatting it into a string, you could do this: `with_curly_brackets = str(STOPWORDS).replace("[", "{").replace("]", "}")` – Jaden Baptista Dec 13 '18 at 19:14
Ah... I see. I'm updating my answer with the more Pythonic way to do it. – Jaden Baptista Dec 13 '18 at 19:15

score 1 · Answer 3 · answered Dec 13 '18 at 19:20

It looks like you need to convert the values in your column to a list and then use the list as your stop words. Since the file was read without a header row, the column label is the integer 0, not the string '0':

stopwords = df_stopwords[0].tolist()
# ['me', 'yes', 'it']

cyrus24


score 1 · Answer 4 · answered Dec 13 '18 at 19:28

As mentioned in the accepted answer here, you might want to use itertuples(), since it is faster than iterrows().

STOPWORDS = set()
# each tuple is (index, value); this unpacking works because there is a single column
for index, word in df_stopwords.itertuples():
    STOPWORDS.add(word)

print(STOPWORDS)
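A variant of the loop above, sketched under the assumption that the CSV has one headerless column (simulated here with an in-memory string instead of the real file). Passing `index=False, name=None` to `itertuples` yields plain tuples without the index; `STOPWORDS_FROM_COLUMN` is a hypothetical name showing the loop-free equivalent.

```python
import io

import pandas as pd

# In-memory stand-in for the stopwords file (assumed: one word per line, no header).
df_stopwords = pd.read_csv(io.StringIO("me\nyes\nit\n"), header=None)

# index=False drops the index; name=None yields plain tuples, here of length 1.
STOPWORDS = set()
for (word,) in df_stopwords.itertuples(index=False, name=None):
    STOPWORDS.add(word)

# Without an explicit loop: iterate over the column labeled 0 directly.
STOPWORDS_FROM_COLUMN = set(df_stopwords[0])

print(STOPWORDS == STOPWORDS_FROM_COLUMN)
```

For a single column, the loop-free form is usually the simplest; the explicit loop is mainly useful when each word needs extra processing before it is added.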

Rabin Lama Dong

