Hello everyone, I'm having an issue with the pandas Python library. I'm reading a CSV file with pandas and want to remove duplicate rows. I've tried several approaches, but the problem is still there.
import sqlite3
import pandas as pd
import numpy
connection = sqlite3.connect("test.db")
# pandas dataframe
dataframe = pd.read_csv('Countries.csv')
# dataframe.head(3)
# select only the two columns of interest
countries = dataframe.loc[:, ['Retailer country', 'Continent']]
countries.head(7)
The output is:
   Retailer country      Continent
----------------------------------
0     United States  North America
1            Canada  North America
2             Japan           Asia
3             Italy         Europe
4            Canada  North America
5     United States  North America
6            France         Europe
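Countries.csv isn't included here, so as a minimal reproducible sketch the seven rows shown above can be rebuilt from an in-memory string. The CSV_TEXT contents are an assumption that mirrors the displayed output (only the two selected columns), not the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for Countries.csv: only the two columns shown
# in the output above, with the same seven rows in the same order.
CSV_TEXT = """Retailer country,Continent
United States,North America
Canada,North America
Japan,Asia
Italy,Europe
Canada,North America
United States,North America
France,Europe
"""

dataframe = pd.read_csv(io.StringIO(CSV_TEXT))

# Select the two columns of interest by label, keeping every row.
countries = dataframe.loc[:, ["Retailer country", "Continent"]]
print(countries.head(7))
```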
I want to drop the duplicate rows from the dataframe above, based on the 'Retailer country' and 'Continent' columns, so that each country/continent pair appears only once. The desired output is:
   Retailer country      Continent
----------------------------------
0     United States  North America
1            Canada  North America
2             Japan           Asia
3             Italy         Europe
4            France         Europe
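One way to get the desired result, sketched here on a hypothetical in-memory copy of the same seven rows, is DataFrame.drop_duplicates with the two columns passed as subset, followed by reset_index(drop=True) so the surviving rows are renumbered 0 to 4 as in the table above.

```python
import io

import pandas as pd

# Hypothetical stand-in for Countries.csv (same seven rows as the question).
CSV_TEXT = """Retailer country,Continent
United States,North America
Canada,North America
Japan,Asia
Italy,Europe
Canada,North America
United States,North America
France,Europe
"""

countries = pd.read_csv(io.StringIO(CSV_TEXT))

# Keep the first occurrence of each (country, continent) pair, then
# renumber the remaining rows from 0 so the index has no gaps.
unique_countries = (
    countries.drop_duplicates(subset=["Retailer country", "Continent"], keep="first")
    .reset_index(drop=True)
)
print(unique_countries)
```

With keep="first" (the default), rows 4 and 5 are dropped as repeats and France moves up to index 4 after the reset. In pandas 1.0 and later, passing ignore_index=True to drop_duplicates should do the renumbering in the same call.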
I have tried some of the methods from the question "Using pandas for duplicate values", and after looking around online I realized I could use the df.drop_duplicates() function. However, when I run the code below and then call df.head(3), it displays only one row. What can I do to get the unique rows and then loop through them?
countries.head(4)
country = countries['Retailer country']
continent = countries['Continent']
df = pd.DataFrame({'a':[country], 'b':[continent]})
df.head(3)
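A likely cause of the single row is the construction pd.DataFrame({'a':[country], 'b':[continent]}): wrapping each Series in a list makes every column a one-element list, so pandas builds one row whose cells each hold an entire Series. The sketch below, again on a hypothetical in-memory copy of the data, passes the Series directly, drops the duplicates, and loops over the unique rows with itertuples.

```python
import io

import pandas as pd

# Hypothetical stand-in for Countries.csv (same seven rows as the question).
CSV_TEXT = """Retailer country,Continent
United States,North America
Canada,North America
Japan,Asia
Italy,Europe
Canada,North America
United States,North America
France,Europe
"""

countries = pd.read_csv(io.StringIO(CSV_TEXT))
country = countries["Retailer country"]
continent = countries["Continent"]

# The question's construction: each value is a one-element list holding a
# whole Series, so the result has a single row of Series-valued cells.
broken = pd.DataFrame({"a": [country], "b": [continent]})
print(len(broken))  # 1

# Passing the Series themselves gives one row per CSV row instead.
df = pd.DataFrame({"a": country, "b": continent})

# Drop repeated (a, b) pairs and renumber the remaining rows from 0.
unique_df = df.drop_duplicates().reset_index(drop=True)

# Loop over the unique rows as plain (country, continent) tuples.
pairs = []
for country_name, continent_name in unique_df.itertuples(index=False, name=None):
    print(country_name, "-", continent_name)
    pairs.append((country_name, continent_name))
```

name=None makes itertuples yield plain tuples rather than namedtuples, which is also convenient when iterating the original columns directly, since a name containing a space such as 'Retailer country' cannot be a namedtuple field.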