
Edit: Thanks to R Yoda, I was finally able to create a reproducible example of the issue I am facing:

x = rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))
trimws(x)

=> Question: How can I trim x?

Old text of the question:
Please see the attached screenshot. Unfortunately, I am not able to create a reproducible example, as dput affects the result...

Does anyone have an idea how to investigate what's going wrong with x? The leading whitespace doesn't seem to be a standard one!


charToRaw(x) gives a0 31 31 2e 31 33 32 35 39 32
dput(charToRaw(x)) gives as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32))
Encoding(x) gives "unknown" (same as Encoding(" 11.132592"))
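
As a minimal sketch of how the odd leading character can be pinned down, the bytes of x can be compared against an ordinary space. The variable names `bytes` and `non_ascii` are only illustrative:

```r
# Rebuild x from the raw bytes given in the question
x <- rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))

bytes <- charToRaw(x)

# An ordinary space is the single byte 0x20
is_plain_space <- bytes[1] == charToRaw(" ")

# Positions of bytes outside the ASCII range (values above 0x7f)
non_ascii <- which(as.integer(bytes) > 127L)
```

Here `is_plain_space` is FALSE and `non_ascii` points at position 1, which confirms that the first byte is not a regular space.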

RockScience
  • Not able to reproduce the problem `x <- " 11.132592"; trimws(x) # [1] "11.132592"` in `R 3.4.0` – akrun Jul 12 '17 at 07:02
  • @akrun I know, frustrating! I get the same as you in my R `3.3.2`. How can I investigate why x is different from its value in `dput`? – RockScience Jul 12 '17 at 07:06
  • so which R version is having this problem – akrun Jul 12 '17 at 07:07
  • What is the encoding of x? – Hong Ooi Jul 12 '17 at 07:10
  • Please include the output of `dput(x)` in your question. That will make it easier for others to help you. – Jaap Jul 12 '17 at 07:11
  • @akrun, R 3.2.2 for sure. But your code doesn't reproduce the bug. Check my screenshot: x gives "11.132592", but is now defined like this. – RockScience Jul 12 '17 at 07:12
  • @Jaap I know this link to reproducible code. Please check the content of my screenshot. `dput(x)` gives " 11.132592". But this bug is not reproducible with `dput` because `trimws(x)` and `trimws(" 11.132592")` return different values.... – RockScience Jul 12 '17 at 07:14
  • Could be an encoding problem... If you cannot use `dput` (why?), please post at least the output of `charToRaw(x)` and `Encoding(x)` here; maybe this shows the reason... – R Yoda Jul 12 '17 at 07:16
  • Possibly related: https://stackoverflow.com/questions/28433056/remove-non-printable-white-spaces-from-unknown-to-me-encoding – Jaap Jul 12 '17 at 07:17
  • `charToRaw(' ')` results in `20` while the first character of `x` is `a0`; it is therefore probably not recognized as a space – Jaap Jul 12 '17 at 07:22
  • @Jaap how can I transform `a0` into `20` ? or trim `20` characters? – RockScience Jul 12 '17 at 07:24

2 Answers


0xa0 encodes a different kind of space, the non-breaking space (in Latin-1), whereas 0x20 is the ordinary space.
trimws searches for spaces, tabs, line breaks, or carriage returns (represented by [ \t\r\n]+) but not for non-breaking spaces, hence it does not work here.
You can use sub (to remove either leading or trailing spaces) or gsub (to remove both leading and trailing spaces) to remove any kind of leading or trailing space(s), including the one represented by 0xa0:

sub("^\\s+", "", x)
[1] "11.132592"

And for removing leading and trailing spaces:

gsub("(^\\s+)|(\\s+$)", "", x)
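
Whether `\\s` matches the byte 0xa0 can depend on the locale and on how the string's encoding is declared. As a hedged sketch, assuming the byte really comes from Latin-1 data, one option is to convert x to UTF-8 first and name the non-breaking space (U+00A0) explicitly in the pattern. The names `x_utf8`, `trimmed`, and `trimmed_ws` are only illustrative, and the `whitespace` argument of trimws requires R 3.6.0 or later:

```r
x <- rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))

# Assumption: the input is Latin-1, where 0xa0 is the non-breaking space
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")

# Strip leading and trailing whitespace, listing U+00A0 explicitly
trimmed <- gsub("^[\\s\u00a0]+|[\\s\u00a0]+$", "", x_utf8, perl = TRUE)

# R >= 3.6.0 only: trimws() accepts a custom whitespace class;
# "[\\h\\v]" covers Unicode horizontal and vertical whitespace
trimmed_ws <- trimws(x_utf8, whitespace = "[\\h\\v]")
```

Both `trimmed` and `trimmed_ws` should then hold "11.132592".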
Cath
  • Thank you! For the record, `stringr::str_trim` also works great – RockScience Jul 12 '17 at 10:36
  • @RockScience there is a `C` routine under `stringr::str_trim`, which I didn't check, but I'm guessing it removes any kind of space, contrary to `trimws` ;-) – Cath Jul 12 '17 at 10:52
  • Future readers who come here and find that the above `gsub` solution does not work because the space is a UTF-8 encoded non-breaking space (`0xc2 0xa0`), see this [perl regex](https://stackoverflow.com/a/43734965/1422451) or this [remove all/any space](https://stackoverflow.com/a/27237551/1422451). – Parfait Apr 20 '19 at 16:16

A possible solution is to replace the wrongly encoded spaces with the right ones:

trimws(rawToChar(replace(x1, x1 == as.raw(0xa0), as.raw(0x20))))

which gives:

[1] "11.132592"

For conversion to numeric, just wrap the above code in as.numeric.


Used data:

x1 <- as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32))
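
When starting from the character string x rather than from raw bytes, the same idea could be wrapped in a small helper. This is only a sketch: `trim_nbsp_bytes` is a hypothetical name, and it assumes a single-byte encoding such as Latin-1, because in UTF-8 the byte 0xa0 can also occur as part of other multi-byte characters:

```r
# Hypothetical helper: swap every 0xa0 byte for an ordinary space, then trim.
# Assumption: s is in a single-byte encoding (e.g. Latin-1), not UTF-8.
trim_nbsp_bytes <- function(s) {
  b <- charToRaw(s)
  b[b == as.raw(0xa0)] <- as.raw(0x20)
  trimws(rawToChar(b))
}

x <- rawToChar(as.raw(c(0xa0, 0x31, 0x31, 0x2e, 0x31, 0x33, 0x32, 0x35, 0x39, 0x32)))

cleaned <- trim_nbsp_bytes(x)
value <- as.numeric(cleaned)
```

With the data from the question, `cleaned` is "11.132592" and `value` is the corresponding number.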
Jaap