I have an issue with UTF-8 encoding in a huge data frame (millions of rows). I tried the suggestions in this question, but they did not fix the issue.

My column (character) is very simple:

Start date
12/01/2019
12/01/2019
12/02/2019

I am trying to convert it to a date with lubridate:

taxi_2020_test$`Start Date` <- mdy(taxi_2020_test$`Start Date`)

and I get this error:

Error in gsub(reg$alpha_exact[["A"]], "%A", x, ignore.case = T, perl = T) : input string 1 is invalid UTF-8

I am sure this is an encoding issue, because I cannot even import this dataset in Python (Jupyter): the import fails with an error that also mentions UTF-8.

How can I fix the invalid strings, or at least drop them? I have millions of rows, so if only a small number of rows are bad, I am fine with losing them.
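To make the goal concrete, here is a minimal sketch of what I am after, using only base R (`validUTF8()` and `iconv()`). The vector `start_date` is a made-up stand-in for my real column, and I am assuming the bad rows contain stray bytes rather than the whole file being in a different encoding (such as Latin-1), in which case re-reading the file with the correct encoding would probably be the better fix.

```r
# Hypothetical sample standing in for taxi_2020_test$`Start Date`;
# "\xff" is a byte that can never appear in valid UTF-8.
start_date <- c("12/01/2019", "12/0\xff1/2019", "12/02/2019")

# Flag entries that are not valid UTF-8 (base R, available since R 3.3.0).
bad <- !validUTF8(start_date)
sum(bad)    # how many entries are affected
which(bad)  # which positions they are in

# Option 1: drop the offending entries.
# For a data frame this would be something like df[!bad, ].
kept <- start_date[!bad]

# Option 2: keep every row and strip the invalid bytes instead.
# sub = "" asks iconv() to delete bytes it cannot convert.
cleaned <- iconv(start_date, from = "UTF-8", to = "UTF-8", sub = "")

# Either result can then be parsed, e.g. with lubridate::mdy() or base R:
parsed <- as.Date(cleaned, format = "%m/%d/%Y")
```

Option 1 matches "drop the bad rows", while Option 2 keeps them, but a repaired string may still fail to parse as a date if the stray bytes replaced real characters.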

Anakin Skywalker