
I am using Python, but I have a problem.

Ideally there would be no duplicates, but when I write the results to a CSV file, the same words are written out again and again. How can I avoid this duplication? I am a beginner in programming, so please be gentle with me. Thanks.


Here is my code.

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
from tqdm import tqdm
from csv import writer

all_data = []
meanings = []
words = []

while True:

    spell = input("spell: ")
    r = requests.get(
        "http://www.urbandictionary.com/define.php?term={}".format(spell))
    r.encoding = r.apparent_encoding
    data = BeautifulSoup(r.content, features="lxml")
    explanation_list = data.find("div", attrs={"class": "meaning"})
    explanation_list = explanation_list.get_text()
    print(explanation_list)

    meanings.append(explanation_list)
    words.append(spell)
    all_data = ({'words': words, 'meanings': meanings})

    df = pd.DataFrame(data=all_data)
    filepath = 'C:/Users/dict1.csv'
    df.to_csv(filepath, mode='a', index=False, header=None)
```
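To see where the repeated rows come from, here is a minimal offline reproduction of the loop above. It is only a sketch: the network lookup is replaced by hypothetical placeholder definitions, and the file goes to a temporary directory instead of `C:/Users/dict1.csv`. Because `words` and `meanings` keep growing and the whole table is appended with `mode='a'` on every pass, each earlier word is written to the file again.

```python
import os
import tempfile

import pandas as pd

# Offline stand-in for the question's loop: hypothetical placeholder
# definitions replace the scraped text, so no network access is needed.
lookups = [("apple", "meaning of apple"), ("banana", "meaning of banana")]

words, meanings = [], []
path = os.path.join(tempfile.mkdtemp(), "dict1.csv")

for spell, meaning in lookups:
    words.append(spell)
    meanings.append(meaning)
    # Same pattern as the question: the *whole* accumulated table is
    # appended to the file on every iteration.
    df = pd.DataFrame({"words": words, "meanings": meanings})
    df.to_csv(path, mode="a", index=False, header=False)

with open(path) as f:
    rows = f.read().splitlines()
print(rows)
# ['apple,meaning of apple', 'apple,meaning of apple', 'banana,meaning of banana']
```

After two lookups, "apple" is already in the file twice; with a third lookup it would appear three times.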

Takahiro
  • In your `while` loop you append each word to your `words` list, and then you use `mode='a'` (append) when saving the whole `DataFrame` to the CSV file. You might want to change one or both of these. – marcel h Oct 02 '21 at 07:14
  • So what exactly should I do? If I change `mode='a'` to `mode='w'`, there are obviously no duplicates, but I want to append `all_data` every time I run the code. @marcel – Takahiro Oct 02 '21 at 14:52
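One way to keep `mode='a'` across runs without rewriting old rows is to append only the newest entry, and to skip a word the file already contains. The sketch below assumes the same header-less two-column layout the question writes; `save_word` and `COLUMNS` are hypothetical names, and a temporary file stands in for `C:/Users/dict1.csv`.

```python
import os
import tempfile

import pandas as pd

COLUMNS = ["words", "meanings"]  # hypothetical column names for the header-less file


def save_word(path, word, meaning):
    """Hypothetical helper: append one (word, meaning) row to a header-less
    CSV, skipping the word if the file already contains it."""
    if os.path.exists(path) and os.path.getsize(path) > 0:
        # dtype=str and keep_default_na=False keep words such as "123" or
        # "null" as plain strings instead of numbers or NaN.
        existing = pd.read_csv(path, header=None, names=COLUMNS,
                               dtype=str, keep_default_na=False)
        if word in set(existing["words"]):
            return False  # already saved in this or an earlier run
    # Build a one-row DataFrame for the new entry only, not the whole history.
    row = pd.DataFrame({"words": [word], "meanings": [meaning]})
    row.to_csv(path, mode="a", index=False, header=False)
    return True


# Usage: "apple" is looked up twice, but written only once.
path = os.path.join(tempfile.mkdtemp(), "dict1.csv")
saved = [save_word(path, spell, "meaning of " + spell)
         for spell in ["apple", "banana", "apple"]]
print(saved)  # [True, True, False]

words_in_file = pd.read_csv(path, header=None, names=COLUMNS, dtype=str)["words"].tolist()
print(words_in_file)  # ['apple', 'banana']
```

Inside the question's loop, a call such as `save_word(filepath, spell, explanation_list)` would then take the place of the lines that build `all_data` and `df` and call `to_csv`, so the `words` and `meanings` lists are no longer needed.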

1 Answer


After you create the DataFrame with

```python
df = pd.DataFrame(data=all_data)
```

you can check whether it contains duplicate rows with

```python
df.duplicated()
```
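As a small illustration with hypothetical data, `duplicated()` does not print a yes/no answer. It returns one boolean per row, marking rows that repeat an earlier row (all columns are compared by default), and `.any()` then tells you whether there is at least one duplicate.

```python
import pandas as pd

# Hypothetical data in which "apple" was looked up twice.
df = pd.DataFrame({
    "words": ["apple", "banana", "apple"],
    "meanings": ["a fruit", "a yellow fruit", "a fruit"],
})

# True marks a row that repeats an earlier row; the first occurrence is False.
mask = df.duplicated()
print(mask.tolist())  # [False, False, True]
print(mask.any())     # True -> the frame contains at least one duplicate
```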

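If there are duplicates, you can remove them with `drop_duplicates`, where `name1` and `name2` stand for the columns to compare (in the question, `words` and `meanings`). Note that `drop_duplicates` returns a new DataFrame rather than modifying `df`, so assign the result back:

```python
df = df.drop_duplicates(subset=['name1', 'name2'])
```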
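The difference between calling `drop_duplicates` on its own and assigning its result is easy to miss, so here is a short demonstration with hypothetical data. By default, pandas returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "words": ["apple", "banana", "apple"],
    "meanings": ["a fruit", "a yellow fruit", "a fruit"],
})

# Calling drop_duplicates() without using the result leaves df unchanged.
df.drop_duplicates(subset=["words", "meanings"])
print(len(df))  # 3

# Assign the result back (or pass inplace=True) to keep the deduplicated rows.
df = df.drop_duplicates(subset=["words", "meanings"])
print(df["words"].tolist())  # ['apple', 'banana']
```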
Nipuna Upeksha
  • I think the problem is caused by the last line, `df.to_csv`. – Takahiro Oct 02 '21 at 14:48
  • I tried `df = pd.DataFrame(data=all_data); df.duplicated(); df.drop_duplicates(subset=['words','meanings']); filepath = 'C:/Users/dict1.csv'; df.to_csv(filepath, mode='w', index=False, header=None)`, but nothing changes. – Takahiro Oct 02 '21 at 14:54
  • Try `header=False`, or could you paste the error you are getting, if any? – Nipuna Upeksha Oct 04 '21 at 07:15
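Since the CSV written by the original loop may already contain repeated rows, one way to clean it up is to read it back, drop the duplicates, and overwrite it with `mode='w'`. This is a sketch assuming the header-less two-column layout the question's `to_csv` call produces; `deduplicate_csv` is a hypothetical name, and a temporary file stands in for `C:/Users/dict1.csv`.

```python
import os
import tempfile

import pandas as pd

COLUMNS = ["words", "meanings"]  # hypothetical column names for the header-less file


def deduplicate_csv(path):
    """Hypothetical helper: rewrite a header-less CSV without duplicate rows."""
    df = pd.read_csv(path, header=None, names=COLUMNS, dtype=str,
                     keep_default_na=False)
    df = df.drop_duplicates()  # keeps the first occurrence of each row
    # mode="w" overwrites the file instead of appending to it.
    df.to_csv(path, mode="w", index=False, header=False)
    return len(df)


# Usage with a file polluted the way the question's loop pollutes it.
path = os.path.join(tempfile.mkdtemp(), "dict1.csv")
with open(path, "w") as f:
    f.write("apple,a fruit\napple,a fruit\nbanana,a yellow fruit\n")

remaining = deduplicate_csv(path)
print(remaining)  # 2
with open(path) as f:
    lines = f.read().splitlines()
print(lines)  # ['apple,a fruit', 'banana,a yellow fruit']
```

Running this once on the existing file, and then appending only new rows on later runs, keeps the file free of repeats without giving up `mode='a'` in the loop.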