I have scraped data from a Danish newspaper and the dates are like this:

"08. Maj 2012"

The values are of class character and I want to convert them to class Date.

I tried as.Date(dates, "%d. %b %Y")

and I got:

Error in as.Date.default(allarticles.dr, "%d. %b %Y") : do not know how to convert 'allarticles.dr' to class “Date”

How can I do this? I need to convert the character strings to dates, but they are not recognized in the usual way.

I also tried

Sys.setlocale("LC_TIME", "da_DK.UTF-8")
as.Date(dates, "%d. %b %Y")

and I am getting a lot of NAs

Using dput, this is a sample of the dates that come out as NA:

"10. Feb. 2018", "13. Feb. 2018", "18. Feb. 2018", "21. Feb. 2018", "27. Feb. 2018", "01. Mar. 2018", "01. Mar. 2018", "09. Mar. 2018", "14. Mar. 2018", "24. Mar. 2018", "26. Mar. 2018", "07. Apr. 2018", "12. Apr. 2018", "15. Apr. 2018", "28. Apr. 2018", "04. Jun. 2018", "05. Jun. 2018", "05. Jun. 2018", "12. Jun. 2018", "14. Jun. 2018", "16. Jun. 2018", "17. Jun. 2018", "19. Jun. 2018", "21. Jun. 2018", "29. Jun. 2018", "12. Jul. 2018", "13. Jul. 2018", "15. Jul. 2018", "22. Jul. 2018", "07. Aug. 2018", "08. Aug. 2018", "20. Aug. 2018", "21. Aug. 2018", "25. Aug. 2018", "28. Aug. 2018", "31. Aug. 2018", "31. Aug. 2018", "02. Sep. 2018", "02. Sep. 2018", "06. Sep. 2018", "20. Sep. 2018", "27. Sep. 2018", "01. Okt. 2018", "06. Okt. 2018", "09. Okt. 2018", "11. Okt. 2018", "13. Okt. 2018", "13. Okt. 2018", "13. Okt. 2018", "13. Okt. 2018", "15. Okt. 2018", "17. Okt. 2018", "18. Okt. 2018", "18. Okt. 2018", "18. Okt. 2018", "20. Okt. 2018", "22. Okt. 2018", "23. Okt. 2018", "24. Okt. 2018", "27. Okt. 2018", "27. Okt. 2018", "27. Okt. 2018", "27. Okt. 2018", "29. Okt. 2018", "08. Nov. 2018", "08. Nov. 2018", "08. Nov. 2018", "08. Nov. 2018", "13. Nov. 2018", "15. Nov. 2018", "16. Nov. 2018", "27. Nov. 2018", "27. Nov. 2018", "28. Nov. 2018", "29. Nov. 2018", "02. Dec. 2018", "05. Dec. 2018", "05. Dec. 2018", "05. Dec. 2018", "06. Dec. 2018", "07. Dec. 2018", "08. Dec. 2018", "12. Dec. 2018", "13. Dec. 2018", "19. Dec. 2018", "20. Dec. 2018", "01. Jan. 2019", "06. Jan. 2019", "04. Feb. 2019", "06. Feb. 2019", "07. Feb. 2019", "18. Feb. 2019", "21. Feb. 2019", "07. Mar. 2019", "21. Mar. 2019", "27. Mar. 2019", "28. Mar. 2019"

Maria
  • Please check the system locale and set it to relevant language. This [link](https://stackoverflow.com/questions/16347731/how-to-change-the-locale-of-r) may help you – akrun May 27 '19 at 14:13
  • Hi. I changed it with `Sys.setlocale("LC_TIME", "da_DK.UTF-8")` but I am still getting NAs for some dates – Maria May 27 '19 at 14:23
  • Please post the dates where it is wrong, preferably in `dput` format. Try `dput(dates[is.na(as.Date(dates, "%d. %b %Y"))])`. – Rui Barradas May 27 '19 at 14:44
  • I got the dates in dput format. How do I add them to the rest? I got them in a list – Maria May 27 '19 at 14:50
  • I am getting NAs for nearly all dates, roughly 90% of my dataset – Maria May 27 '19 at 14:53

1 Answer

Assuming Windows, set it to Danish, perform the operations and then set it back.

Sys.setlocale("LC_TIME", "Danish")

date <- c("08. Maj 2012", "09. Okt 2012")
fmt <- "%d. %b %Y"
as.Date(date, fmt)
## [1] "2012-05-08" "2012-10-09"

Sys.setlocale("LC_TIME")  # empty locale argument resets LC_TIME to the system default
G. Grothendieck
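
The set-and-restore pattern in the answer can be wrapped so that the previous locale is restored even if parsing stops with an error. This is only a minimal sketch: `with_time_locale` is a hypothetical helper, not part of base R, and the locale name to pass depends on the system ("Danish" on Windows as in the answer, "da_DK.UTF-8" as in the question on macOS/Linux).

```r
# Hypothetical helper (not base R): evaluate `expr` under a temporary
# LC_TIME locale and restore the previous one afterwards.
with_time_locale <- function(locale, expr) {
  old <- Sys.getlocale("LC_TIME")
  # Restore the old locale when the function exits, even on error
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the locale is unavailable
  new <- suppressWarnings(Sys.setlocale("LC_TIME", locale))
  if (identical(new, "")) {
    stop("locale not available on this system: ", locale)
  }
  # `expr` is a lazy argument, so it is evaluated only now,
  # after the locale has been switched
  force(expr)
}
```

It would be called as, for example, `with_time_locale("da_DK.UTF-8", as.Date(dates, "%d. %b %Y"))`, assuming that locale is installed.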
  • It does not work. I am still getting NAs with this option. – Maria May 27 '19 at 14:52
  • I am not using Windows, and the output you gave is correct. However, for other dates in my data frame I still get NAs. I am using a Mac – Maria May 27 '19 at 14:56
  • May is the only month that works like this. For all the other dates I get NA :( – Maria May 27 '19 at 14:57
  • Hey! It worked the way you said; the only thing I had to do was create 2 formats: "%d. %b %Y" and "%d. %b. %Y" → the difference is the small dot after %b. Is there a way I can combine both formats into one expression? Thanks! – Maria May 27 '19 at 15:04
  • I can't test it out since it works for me on Windows, but perhaps something like this: `d1 <- as.Date(dates, format = fmt1); d2 <- as.Date(dates, format = fmt2); d1[is.na(d1)] <- d2[is.na(d1)]` – G. Grothendieck May 27 '19 at 15:08
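
As an alternative to combining two formats, both variants ("08. Maj 2012" and "10. Feb. 2018") could be handled in one pass by making the dot after the month optional and mapping the Danish month abbreviation to a month number by hand, which also avoids depending on the system locale. The following is a sketch under those assumptions, not the approach from the answer; `parse_danish_date` and `months_da` are hypothetical names introduced for illustration.

```r
# Hypothetical helper: parse dates such as "08. Maj 2012" or "10. Feb. 2018"
# without relying on the LC_TIME locale.
parse_danish_date <- function(x) {
  # Danish month abbreviations in calendar order (lower case for matching)
  months_da <- c("jan", "feb", "mar", "apr", "maj", "jun",
                 "jul", "aug", "sep", "okt", "nov", "dec")
  # day, dot, month name, optional dot, four-digit year
  pattern <- paste0("^[[:space:]]*([0-9]{1,2})\\.[[:space:]]*",
                    "([[:alpha:]]+)\\.?[[:space:]]*([0-9]{4})[[:space:]]*$")
  day   <- sub(pattern, "\\1", x)
  # Only the first three letters are compared, so full names also match
  month <- tolower(substr(sub(pattern, "\\2", x), 1, 3))
  year  <- sub(pattern, "\\3", x)
  month_num <- match(month, months_da)
  # Anything that does not fit the pattern or has an unknown month stays NA
  ok <- grepl(pattern, x) & !is.na(month_num)
  out <- rep(as.Date(NA), length(x))
  out[ok] <- as.Date(sprintf("%s-%02d-%s", year[ok], month_num[ok], day[ok]),
                     format = "%Y-%m-%d")
  out
}

parse_danish_date(c("08. Maj 2012", "10. Feb. 2018", "01. Okt. 2018"))
```

Entries that cannot be parsed come back as NA, so `dates[is.na(parse_danish_date(dates))]` can be used to inspect any remaining problem strings.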