I have a UTF-8 encoding problem in a huge data frame (millions of rows). I tried the approach from this question, but it did not fix the issue.
The column (of type character) is very simple:
Start date
12/01/2019
12/01/2019
12/02/2019
I am trying to convert it to a date with mdy() from lubridate:
taxi_2020_test$`Start Date` <- mdy(taxi_2020_test$`Start Date`)
and I get this error:
Error in gsub(reg$alpha_exact[["A"]], "%A", x, ignore.case = T, perl = T) : input string 1 is invalid UTF-8
I am sure it is a UTF-8 problem: in Python I cannot even import this dataset into Jupyter, and the error there also mentions UTF-8.
How can I fix this, or at least drop the offending rows? I have millions of rows, so losing a small number of bad ones is fine.
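One way to confirm the diagnosis in R before changing anything is to count the values that are not valid UTF-8. The sketch below uses base R's validUTF8(), which inspects the raw bytes of each string. The three-row data frame is a hypothetical stand-in for taxi_2020_test, with a 0xFF byte (never legal in UTF-8) injected into the second row.

```r
# Hypothetical toy stand-in for taxi_2020_test: the second value contains
# the byte 0xFF, which can never appear in valid UTF-8.
bad_value <- paste0("12/0", rawToChar(as.raw(0xff)), "1/2019")
taxi_2020_test <- data.frame(
  `Start Date` = c("12/01/2019", bad_value, "12/02/2019"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# validUTF8() (base R >= 3.3.0) returns FALSE for strings whose bytes
# are not a valid UTF-8 sequence.
ok <- validUTF8(taxi_2020_test$`Start Date`)
n_bad <- sum(!ok)
cat("Rows with invalid UTF-8:", n_bad, "\n")

# Indices of the offending rows, handy for eyeballing a few of them
bad_rows <- which(!ok)
```

On the real data, a small n_bad would support simply dropping those rows.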
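A minimal sketch of the "drop the bad rows" option: keep only rows whose Start Date is valid UTF-8, then parse the rest. To avoid depending on lubridate, it uses base as.Date() with a month/day/year format; on the real data, mdy() should behave the same once the invalid strings are gone (an assumption, not verified against the full dataset). The toy data frame is again hypothetical.

```r
# Hypothetical toy stand-in for taxi_2020_test with one corrupted value
bad_value <- paste0("12/0", rawToChar(as.raw(0xff)), "1/2019")
taxi_2020_test <- data.frame(
  `Start Date` = c("12/01/2019", bad_value, "12/02/2019"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Keep only rows whose Start Date is valid UTF-8
ok <- validUTF8(taxi_2020_test$`Start Date`)
clean <- taxi_2020_test[ok, , drop = FALSE]

# Parse month/day/year (equivalent in intent to lubridate::mdy())
clean$`Start Date` <- as.Date(clean$`Start Date`, format = "%m/%d/%Y")
```

If you would rather keep every row, a commonly used alternative is to strip the invalid bytes in place with iconv(x, from = "UTF-8", to = "UTF-8", sub = ""), which removes bytes that cannot be converted. If the invalid bytes come from the file actually being in another encoding (for example Latin-1, which is only a guess here), declaring that encoding when reading the file, e.g. read.csv(..., fileEncoding = "latin1"), may fix the problem at the source.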