I can't use nltk because of download issues at work, so I wanted to write a function that removes Dutch stopwords. I have a text file with Dutch stopwords that I want to read in and use to find stopwords in a pandas DataFrame column. I saved the data file as a .txt file, but I get duplicates. Could someone help me with this issue? I wrote the function below.
import pandas as pd
import numpy as np
import re
# Map accented characters to their plain equivalents.
dictionary = {'í': 'i', 'á': 'a', 'ö': 'o', 'ë': 'e'}
pd.set_option('display.max_colwidth', None)  # show the full column contents
df = pd.read_csv('Map1.csv', error_bad_lines=False, encoding='latin1')
df.replace(dictionary, regex=True, inplace=True)
# I want to remove the stopwords from df['omschrijving skill']
stopwords = ['de', 'een', 'van', 'ik', 'te', 'dat', 'die', 'in', 'hij', 'het', 'niet', 'zijn', 'is', 'was', 'of', 'aan']

def remove_stopwords(query):
    # Split the text, drop every stopword (case-insensitive), and join the rest.
    querywords = str(query).split()
    resultwords = [word for word in querywords if word.lower() not in stopwords]
    return ' '.join(resultwords)

df['omschrijving skill'] = df['omschrijving skill'].apply(remove_stopwords)
print(df['omschrijving skill'])
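
This is roughly how I planned to read the stopwords from the .txt file instead of hard-coding the list; 'stopwords_nl.txt' is just a placeholder for my actual file name (one word per line, read with the same latin1 encoding as the CSV), and I assumed reading the words into a set would drop the duplicates:

# Read the Dutch stopwords from a text file with one word per line.
# A set removes duplicate entries automatically.
with open('stopwords_nl.txt', encoding='latin1') as f:
    stopwords = {line.strip().lower() for line in f if line.strip()}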