
I have two observations: one contains non-breaking spaces and the other contains regular spaces. I want to make them the same. How should I replace one with the other? Is there a way to take care of this kind of issue? I spent an hour finding out that these two are in fact not the SAME. :(

The bytes of the string in `old` are: [1] 45 72 79 2e 20 4d 65 61 6e 20 43 6f 72 70 75 73 63 75 6c 61 72 20 56 6f 6c 75 6d 65

The bytes of the string in `new` are: [1] 45 72 79 2e c2 a0 4d 65 61 6e c2 a0 43 6f 72 70 75 73 63 75 6c 61 72 c2 a0 56 6f 6c 75 6d 65
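To see where two strings that print the same actually differ, one option is to look at the bytes (the dumps above look like `charToRaw()` output) and list every non-ASCII character with its position. A minimal base-R sketch, assuming the strings are valid UTF-8; `find_non_ascii()` is a hypothetical helper name, not something from the thread:

```r
# Two strings that look alike but use different space characters
x_old <- "Ery. Mean Corpuscular Volume"
x_new <- "Ery.\u00a0Mean\u00a0Corpuscular\u00a0Volume"  # U+00A0 = non-breaking space

identical(x_old, x_new)  # FALSE

# Byte-level view, like the dumps in the question: each U+00A0 appears as c2 a0
charToRaw(x_new)

# Hypothetical helper: report every non-ASCII character with its position
# and Unicode code point, so invisible differences become visible.
find_non_ascii <- function(x) {
  cp  <- utf8ToInt(enc2utf8(x))  # one integer code point per character (x must be length 1)
  pos <- which(cp > 127L)        # anything outside plain ASCII
  data.frame(position = pos, code_point = sprintf("U+%04X", cp[pos]))
}

find_non_ascii(x_new)  # positions 5, 10 and 22, all U+00A0
find_non_ascii(x_old)  # zero rows
```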

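df <- structure(list(
  # `new` is written with \u00a0 escapes so the non-breaking spaces survive copy/paste
  new = c("Ery.\u00a0Mean\u00a0Corpuscular\u00a0Volume", "Ery.\u00a0Mean\u00a0Corpuscular\u00a0Volume"),
  old = c("Ery. Mean Corpuscular Volume", "Ery. Mean Corpuscular Volume")
), row.names = c(NA, -2L), class = "data.frame")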

Is there a way to standardize all of these spaces in one go? It is very hard to spot that they are actually different.
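One way to standardize the spaces across a whole data frame in one pass is to apply the same replacement to every character column. Below is a minimal base-R sketch; `normalize_spaces()` is a hypothetical helper, and it assumes the only offending character is the non-breaking space U+00A0 (bytes `c2 a0`, as in the dump of `new`). Using `fixed = TRUE` matches that character literally, so the result does not depend on how a regex engine defines `\\s`.

```r
# Example data mirroring the question: `new` uses U+00A0, `old` uses U+0020
df <- data.frame(
  new = c("Ery.\u00a0Mean\u00a0Corpuscular\u00a0Volume", "Ery.\u00a0Mean\u00a0Corpuscular\u00a0Volume"),
  old = c("Ery. Mean Corpuscular Volume", "Ery. Mean Corpuscular Volume"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace every non-breaking space with an ordinary space
normalize_spaces <- function(x) {
  x <- enc2utf8(x)                      # make sure the text is handled as UTF-8
  gsub("\u00a0", " ", x, fixed = TRUE)  # literal replacement, no regex classes involved
}

# Apply it to every character column, leaving other column types untouched
df[] <- lapply(df, function(col) if (is.character(col)) normalize_spaces(col) else col)

identical(df$new, df$old)  # TRUE
```

If other Unicode spaces turn up as well, they would need to be added to the replacement; this sketch only handles U+00A0.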


Stataq
  • Does this answer your question? [remove (non-breaking) space character in string](https://stackoverflow.com/questions/43734293/remove-non-breaking-space-character-in-string) – andschar Feb 23 '22 at 14:04
  • Sorry, that doesn't work. I tried `gsub`, `string.replace` and `bytes.replace`. – Stataq Feb 23 '22 at 14:06
  • I found [this post from Tony from 2017](https://blog.tonytsai.name/blog/2017-12-04-detecting-non-breaking-space-in-r/) to solve my problem with whitespace. I combed through the data column-wise until the data was clean. Ideally this would be solved on the database level, if that's where you get your data. – Roman Luštrik Feb 23 '22 at 14:10
  • Thanks. I saw this one too, but I can't get it to work. I tried `gsub("\u00A0", " ", x, fixed = TRUE)` and I also tried `str_replace("\xc2\xa0", " ",` – Stataq Feb 23 '22 at 14:27
  • Sorry, not working. – Stataq Feb 23 '22 at 14:34
  • Would [this post](https://stackoverflow.com/a/62290560/4137985) help (i.e. does the solution work for you if you change the replacement to a space)? – Cath Feb 23 '22 at 14:58
  • Thanks @Cath, but it doesn't work. My problem is that `new` uses `c2 a0` and `old` uses `20` for the space. – Stataq Feb 23 '22 at 15:04
  • I tried using `dput` to build the example `df` with the same data, but maybe it did not work. :( – Stataq Feb 23 '22 at 15:12
  • I converted it to raw bytes. It seems `dput` cannot reproduce the same data, so it is very hard to reproduce the problem. – Stataq Feb 23 '22 at 15:27
  • Could you try the `gsub` line the other way around (i.e. `gsub("\\s", " ", df$new)`, then compare the result to `df$old`)? You're replacing a regular space with a regular space and then comparing it to the string with the non-breaking space (and, just for the record, `\u00a0` is converted to `c2 a0` in raw, which is converted back to `Â ` with `rawToChar`). – Cath Feb 23 '22 at 15:46
  • Sorry, not working. I am giving up. :( – Stataq Feb 23 '22 at 16:18
  • I reproduced your raw vector and tried several `gsub` calls, which worked, so either you don't have what you think you have or there is another typo in your code. Maybe try in a fresh session and just apply `gsub` to the right string (or to both strings, just to be sure) without doing any raw/char conversion in between. – Cath Feb 23 '22 at 16:31
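Following up on the raw/char conversion point in the comments: `rawToChar()` returns a string whose encoding is `"unknown"`, so in a non-UTF-8 locale the bytes `c2 a0` may be read as two separate characters (which would be consistent with the `Â ` mentioned above), and a `\u00a0` pattern then no longer matches. A minimal sketch, assuming the bytes really are UTF-8: rebuild the `new` string from the byte dump in the question, declare its encoding with `Encoding<-`, then do the literal replacement on `new` (not `old`) and compare.

```r
# Byte dump of `new`, copied from the question
hex <- "45 72 79 2e c2 a0 4d 65 61 6e c2 a0 43 6f 72 70 75 73 63 75 6c 61 72 c2 a0 56 6f 6c 75 6d 65"
bytes <- as.raw(strtoi(strsplit(hex, " ")[[1]], 16L))

x_new <- rawToChar(bytes)
Encoding(x_new)             # "unknown": R has not been told these bytes are UTF-8
Encoding(x_new) <- "UTF-8"  # declare the encoding before any matching

x_old <- "Ery. Mean Corpuscular Volume"

# Replace the non-breaking spaces in `new` (not `old`), then compare
new_fixed <- gsub("\u00a0", " ", x_new, fixed = TRUE)
identical(new_fixed, x_old)  # TRUE
```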

1 Answer


This should work with `gsub`:

x <- "I\u00A0am\u000Aa\u000Btotal\u0020mess"
y <- "I am a total mess"

identical(x, y)
# [1] FALSE

identical(gsub("\\s", " ", x), y)
# [1] TRUE
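A variant of the same idea, in case `\\s` does not catch the non-breaking space in your setup (whether the default regex engine counts U+00A0 as whitespace may depend on the platform's locale tables; treat that as an assumption to check rather than a given). With `perl = TRUE`, PCRE's `\\h` (horizontal whitespace) explicitly includes U+00A0, and `\\v` (vertical whitespace) covers the newline and vertical tab in `x`:

```r
x <- "I\u00A0am\u000Aa\u000Btotal\u0020mess"
y <- "I am a total mess"

# [\h\v] in PCRE: any horizontal or vertical whitespace character,
# including U+00A0 (no-break space), \n and the vertical tab
z <- gsub("[\\h\\v]", " ", x, perl = TRUE)

identical(z, y)  # TRUE
```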
Merijn van Tilborg