I have a text file with special characters. I want to replace every instance of an escaped sequence such as `<U+00FB>` with the character it stands for (`û`). This is how I read my file:

donnees <- read.table("Shanghaifr.txt", sep="\t", header = TRUE)

and I get this sample of my table

I tried this, but it did not work:

    donnees <- read.table("Shanghaifr.txt", sep="\t", header = TRUE)
    datest <- donnees$datesr[[15]]
    sub("ao<U+00FB>","ao\\U00FBt",datest)

I'm supposed to get 17août2017, for example, so that I can later easily do

as.Date("17août2017", "%d%b%Y") # to get the numeric date

  • maybe do a sub? ie `sub("","Aug",datatest$date)` then convert to data format – Onyambu Jan 26 '18 at 10:38
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Sotos Jan 26 '18 at 10:39
  • Thank you for your quick answer. I just tried it, and it didn't change anything. Maybe I did it wrong. – Deborah Houngbedji Jan 26 '18 at 10:57
  • This might be helpful: https://stackoverflow.com/questions/37703291/how-to-convert-special-characters-into-unicode-in-r. I would suggest re-framing the question (e.g. "converting special characters to plain text") and provide some sample data so people can try on more than just one instance e.g. other accents, chapeau etc. – Gautam Jan 26 '18 at 14:23
  • R would understand the following format / escaping: `as.Date("21ao\u00FBt2017", "%d%b%Y")` – RolandASc Jan 26 '18 at 15:47
  • @Gautam I'll try to edit with more details. I read the link you suggested, but it didn't really help. I was not able to change the special character – Deborah Houngbedji Jan 29 '18 at 06:49
  • @RolandASc thank you. I know. But my problem is actually how to edit to \U00FB – Deborah Houngbedji Jan 29 '18 at 06:52

1 Answer

Using `sub` seems a bit tricky here because of the encoding conversion it might perform. E.g.:

sub("ao<U+00FB>t", "ao\u00FBt", "21ao<U+00FB>t2017", fixed = TRUE)
# [1] "21août2017"

A possible workaround (there must be a more elegant way?!):

sub("<U\\+00FB>", enc2native("\u00FB"), "21ao<U+00FB>t2017")
# [1] "21août2017"
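
To apply the same idea to every escape in a column rather than to one hard-coded character, the workaround can be generalized. The following is only a sketch: `decode_u_escapes` is a hypothetical helper name, it assumes the escapes always have the four-hex-digit form `<U+XXXX>`, and it assumes the session can represent the decoded characters (a UTF-8 locale, or a matching code page with the replacement wrapped in `enc2native()` as above).

```r
# Hypothetical helper (not from the original answer): replace every
# "<U+XXXX>" escape in a character vector with the character it encodes.
decode_u_escapes <- function(x) {
  x <- as.character(x)  # read.table() may have returned a factor
  # Collect the distinct escapes that actually occur in the data
  tokens <- unique(unlist(regmatches(x, gregexpr("<U\\+[0-9A-Fa-f]{4}>", x))))
  for (tok in tokens) {
    # "<U+00FB>" -> "00FB" -> 251 -> "û"
    code <- strtoi(substr(tok, 4, nchar(tok) - 1), base = 16L)
    # fixed = TRUE so that "+" and "<" are matched literally
    x <- gsub(tok, intToUtf8(code), x, fixed = TRUE)
  }
  x
}

decoded <- decode_u_escapes("21ao<U+00FB>t2017")
```

With the data frame from the question this would be called as `donnees$datesr <- decode_u_escapes(donnees$datesr)`.
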
RolandASc
  • Merci, thank you. The one with enc2native worked very well. I first had to change my locale's LC_CTYPE to "French_France.1252", and then everything was OK. Thanks a lot – Deborah Houngbedji Jan 30 '18 at 08:20
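
The locale matters twice here: `LC_CTYPE` decides whether `û` can be represented natively, and `%b` in `as.Date` matches month names according to `LC_TIME`. A minimal sketch of parsing under a temporarily switched `LC_TIME` follows. `parse_with_locale` is a hypothetical helper, and the locale name is platform-specific: "French_France.1252" is the Windows name from the comment above, while a name such as "fr_FR.UTF-8" is an assumption about what a Linux or macOS system might provide.

```r
# Hypothetical helper: parse dates with LC_TIME switched temporarily,
# restoring the previous setting when the function exits.
parse_with_locale <- function(x, format, locale) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the request fails
  if (identical(suppressWarnings(Sys.setlocale("LC_TIME", locale)), "")) {
    stop("locale not available on this system: ", locale)
  }
  as.Date(x, format = format)
}

# Locale-independent check that the helper itself works
parse_with_locale("17 08 2017", "%d %m %Y", "C")
```

On the asker's Windows setup the call would presumably look like `parse_with_locale("17ao\u00FBt2017", "%d%b%Y", "French_France.1252")`; whether that locale name is accepted depends on the system.
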