I have tried to specify the encoding for the Unicode characters in the file that I am loading into a pandas DataFrame. However, the number of unique values I get from df.column1.value_counts() in a Jupyter notebook does not match the number of rows Excel reports for the same file after removing duplicate values.
How do I fix the issue?
I loaded a tab-separated text file into a pandas DataFrame using encoding='ISO-8859-1'. In the resulting DataFrame, one of the columns has 66370 unique values.
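A minimal sketch of that loading step, assuming the export is tab separated as described. The in-memory sample and the column name `column1` are hypothetical stand-ins for the real file. `nunique()` returns the number of distinct non-null values directly, which is the figure to compare with Excel; `value_counts()` returns one row per distinct value, so its length gives the same number:

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real tab-separated export
raw = "id\tcolumn1\n1\tApple\n2\tapple\n3\tApple\n4\t\n".encode("ISO-8859-1")

# sep="\t" because the file is tab separated; encoding as in the question
df = pd.read_csv(io.BytesIO(raw), sep="\t", encoding="ISO-8859-1")

# Number of distinct non-null values in the column
distinct = df["column1"].nunique()
# One row per distinct value, so its length equals nunique()
per_value = df["column1"].value_counts()

print(distinct)        # 2
print(len(per_value))  # 2
# Counting a missing value as its own category
print(df["column1"].nunique(dropna=False))  # 3
```

Both calls drop missing values by default, so `dropna=False` is worth trying if the column has blank cells.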
When I applied 'Remove duplicates' to the same column in the original CSV file (I was using MS Excel to open the export file), the number of unique values was 66368.
That is a difference of 2 between the pandas unique count in the Jupyter notebook (66370) and the Excel count (66368).
I suspect this is an encoding issue, but I have not been able to fix it.
Can anyone help, please?
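One point worth checking about the encoding hypothesis: ISO-8859-1 maps every byte to a character, so decoding with it never raises an error, even when the file was written in another encoding such as UTF-8. It produces mis-decoded text instead. A minimal illustration using only the standard library (the sample word is hypothetical):

```python
# UTF-8 bytes for a word with an accented character
data = "café".encode("utf-8")

# Decoding as ISO-8859-1 succeeds but yields mojibake
wrong = data.decode("ISO-8859-1")
print(wrong)  # cafÃ©

# Decoding as UTF-8 recovers the original text
right = data.decode("utf-8")
print(right)  # café
```

Because ISO-8859-1 decoding is one-to-one on bytes, distinct byte sequences stay distinct after decoding. A wrong encoding guess would therefore show up as garbled characters, but would not by itself be expected to change the number of distinct values pandas reports.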
```python
import pandas as pd
df = pd.read_csv('csv_file.csv', sep='\t', encoding='ISO-8859-1')  # file is tab separated
df.column1.value_counts()
```
I expect the number of unique values reported by Excel to equal the number of entries in df.column1.value_counts().
Instead, the two methods differ by 2.
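One way to narrow down the two extra values is to group the pandas values under a looser comparison and list the groups that contain more than one original spelling. The sketch below rests on an assumption to verify, not a confirmed explanation: that Excel's 'Remove duplicates' treats values differing only in letter case (and possibly leading/trailing spaces) as the same, while pandas treats them as distinct. The sample values are hypothetical; in practice use `df["column1"]`.

```python
import pandas as pd

# Hypothetical column values; in practice use df["column1"]
s = pd.Series(["Apple", "apple", "Pear", "Pear ", "Plum", "Plum"])

# Normalised key: an ASSUMPTION about what Excel might treat as equal
# (case-insensitive, ignoring leading/trailing whitespace)
key = s.str.strip().str.casefold()

print(s.nunique())    # 5 distinct values as pandas sees them
print(key.nunique())  # 3 distinct values after normalising

# Original spellings that collapse into the same normalised key
groups = s.groupby(key).unique()
collisions = groups[groups.map(len) > 1]
print(collisions)
```

If `collisions` on the real column accounts for exactly two extra spellings, the gap between 66370 and 66368 would come from how the two tools compare values rather than from the file encoding.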