I am using the as.Date function as follows:
x$time_date <- as.Date(x$time_date, format = "%H:%M - %d %b %Y")
This worked fine until I saw a lot of NA values in the output, which I traced back to some of the dates being written in a different language: German.
My English dates look like this: 18:00 - 10 Dec 2014
The German equivalent is: 18:00 - 10 Dez 2014
The month December is abbreviated the German way, which the as.Date function does not recognise. I have the same problem for five other months:
Mar - März
May - Mai
Jun - Juni
Jul - Juli
Oct - Okt
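To make the failure concrete, here is a minimal sketch that reproduces it. It assumes the session parses dates with English month names, so the snippet sets the `LC_TIME` locale to "C" explicitly to keep the result independent of the machine; the two sample strings are the ones quoted above.

```r
# Force the C locale for date parsing so %b only matches English month
# abbreviations (assumption: this mirrors the session that produced the NAs).
Sys.setlocale("LC_TIME", "C")

fmt <- "%H:%M - %d %b %Y"

english <- as.Date("18:00 - 10 Dec 2014", format = fmt)
german  <- as.Date("18:00 - 10 Dez 2014", format = fmt)

print(english)        # parses to a Date
print(is.na(german))  # "Dez" is not matched by %b here, so the result is NA
```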
This looks like it would be of use, but I am unsure of how to implement it for 'unrecognised' formats: How to change multiple Date formats in same column
I attempted to just go through and use gsub to replace all the occurrences of German months, but without luck. Below, x is the data.table and I work on just the time_date column:
x$time_date <- gsub("(März)?", "Mar", x$time_date) %>%
gsub("(Mai)?", "May", .) %>%
gsub("(Juni)?", "Jun", .) %>%
gsub("(Juli)?", "Jul", .) %>%
gsub("(Okt)?", "Oct", .) %>%
gsub("(Dez)?", "Dec", .)
Not only did this not work, but it is also a very slow process and I have nearly 20 GB of pure .csv files to work through.
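One likely reason the attempt above misbehaves is that a pattern such as `(Dez)?` is optional, so it also matches the empty string between characters and the replacement gets inserted at every position. Below is a minimal sketch of a literal replacement instead, assuming the six German spellings listed earlier are the only ones in the data. The names `german`, `english` and `normalise_months` are made up for this illustration.

```r
# Hypothetical lookup: German spellings and their English abbreviations,
# in matching order. "\u00e4" is the letter ä, escaped so the script does
# not depend on the encoding of the source file.
german  <- c("M\u00e4rz", "Mai", "Juni", "Juli", "Okt", "Dez")
english <- c("Mar", "May", "Jun", "Jul", "Oct", "Dec")

# Replace each German spelling literally (fixed = TRUE, so no regex is used).
normalise_months <- function(x) {
  for (i in seq_along(german)) {
    x <- gsub(german[i], english[i], x, fixed = TRUE)
  }
  x
}

# Parse with English month abbreviations regardless of the system locale.
Sys.setlocale("LC_TIME", "C")

raw <- c("18:00 - 10 Dec 2014",
         "18:00 - 10 Dez 2014",
         "09:30 - 03 M\u00e4rz 2014")
parsed <- as.Date(normalise_months(raw), format = "%H:%M - %d %b %Y")
print(parsed)
```

Because `gsub` is vectorised, this makes six passes over the whole column rather than looping row by row. Whether that is fast enough for around 20 GB of input has not been measured here.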
In the as.Date documentation there is mention of different locales / languages, but not of how to work with several simultaneously. I also found instructions on how to use different languages. However, my data is all mixed, so I can only think of a conditional loop that uses the correct language for each file, and that would also be slow.
Is there a known workaround for this, which I can't find?