1

Please don't flag my question right away: I searched several other questions, like this one, and they didn't solve my problem.

I'm trying to generate a Python set of strings from a CSV file. The printed pandas DataFrame of the loaded CSV file has the following structure:

   0
0  me
1  yes
2  it

For a project, I need this to be formatted like this:

STOPWORDS = {'me', 'yes', 'it'}

I tried to do this with the following code:

import pandas as pd

df_stopwords = pd.read_csv("C:/Users/Jakob/stopwords.csv", encoding = 'iso8859-15', header=-1)

STOPWORDS = {}
for index, row in df_stopwords.iterrows():
    STOPWORDS.update(str(row))

print(STOPWORDS)

However, I get this error:

dictionary update sequence element #0 has length 1; 2 is required

When I use STOPWORDS.add(str(row)) instead, I get this error:

'dict' object has no attribute 'add'

Thank you all in advance!

Mike_H

4 Answers

3

You can directly create a set from the values in the dataframe with:

set(df_stopwords.values.ravel())
{'me', 'yes', 'it'}
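
The one-liner above can be tried end to end with a small self-contained sketch. The in-memory CSV text is a hypothetical stand-in for stopwords.csv, assumed to hold one word per line with no header row; to_numpy() is used here as the equivalent of .values.

```python
import io

import pandas as pd

# Hypothetical stand-in for stopwords.csv: one word per line, no header row.
csv_text = "me\nyes\nit\n"
df_stopwords = pd.read_csv(io.StringIO(csv_text), header=None)

# Flatten all cells into a 1-D array, then build a set from them
# (duplicates, if any, collapse automatically).
STOPWORDS = set(df_stopwords.to_numpy().ravel())
print(STOPWORDS)
```

Because sets are unordered, the printed order of the words may differ from the order in the file.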
yatu
1

A dictionary is a mapping of keys to values, like an object in many other languages, and {} creates an empty dictionary, not an empty set. Since you need a set, define it as one with set() from the start instead of converting it later.

import pandas as pd

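df_stopwords = pd.read_csv("C:/Users/Jakob/stopwords.csv", encoding='iso8859-15', header=None)  # header=None: the file has no header row

STOPWORDS = set()  # set(), not {}: empty braces create a dict
for index, row in df_stopwords.iterrows():
    STOPWORDS.add(str(row.iloc[0]))  # add the cell value, not the string form of the whole row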

print(STOPWORDS)
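
To see where both error messages in the question come from, here is a minimal sketch in plain Python, with no pandas involved. The variable names are made up for illustration.

```python
empty_braces = {}   # empty braces create a dict
empty_set = set()   # set() creates an empty set
print(type(empty_braces).__name__, type(empty_set).__name__)

# dict.update() expects key/value pairs, so a plain string is rejected:
# each character is a sequence of length 1, not 2.
message = None
try:
    empty_braces.update("me")
except ValueError as exc:
    message = str(exc)
print(message)

# A dict has no add() method at all; a set does.
print(hasattr(empty_braces, "add"), hasattr(empty_set, "add"))

empty_set.add("me")                # add() inserts one element
empty_set.update(["yes", "it"])    # update() inserts every element of an iterable
print(empty_set == {"me", "yes", "it"})
```

Note that calling update() on a set with a bare string such as "me" would add the individual characters 'm' and 'e', so add() is the safer choice for single words.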
Jaden Baptista
1

It looks like you need to convert the values in your column to a list and then use that list as your stop words.

stopwords = df_stopwords[0].tolist()  # assuming the file was read without a header row, the column label is the integer 0
--> ['me', 'yes', 'it']
cyrus24
1

As mentioned in the accepted answer here, you might want to use itertuples(), since it is faster than iterrows().

STOPWORDS = set()
for index, row in df_stopwords.itertuples():
    STOPWORDS.add(row)

print(STOPWORDS)
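
A possible variant of the loop above, as a sketch: passing index=False and name=None to itertuples() makes it yield plain tuples that hold only the cell values, so the index does not need to be unpacked. The in-memory CSV text is a hypothetical stand-in for stopwords.csv, assumed to have a single column and no header row.

```python
import io

import pandas as pd

# Hypothetical stand-in for stopwords.csv: one word per line, no header row.
df_stopwords = pd.read_csv(io.StringIO("me\nyes\nit\n"), header=None)

STOPWORDS = set()
# index=False drops the row index; name=None yields plain tuples such as ('me',).
for (word,) in df_stopwords.itertuples(index=False, name=None):
    STOPWORDS.add(word)

print(STOPWORDS)
```

The single-element unpacking `(word,)` only works while the file has exactly one column; with more columns the tuple pattern would need one name per column.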
Rabin Lama Dong