Problem

I have been merging and standardizing several survey datasets. One problem I keep running into is inconsistent punctuation. Sometimes the data are coded with a straight apostrophe ('), and other times with a curly one (’).

For example, the name of the Ivory Coast in French is Côte d'Ivoire. Unfortunately, the data are not coded uniformly across time. As a result, when I run a crosstab, I get this:

country         2008      2009
-------         ----      ----
Cote d'Ivoire    498        0
Cote d’Ivoire     0        502

What I want to get is this:

country         2008      2009
-------         ----      ----
Cote d'Ivoire    498       502

When I try to standardize these to use ' rather than ’, I have absolutely no luck. The replacement just doesn't seem to do anything. Here is the code I use:

data$country[data$country == "Cote d’Ivoire"] <- "Cote d'Ivoire"

For some reason, I can't seem to figure this out, and it's driving me nuts. Does anyone know what I'm doing wrong?

Thank you!

Yasha
  • First, what does `sum(data$country == "Cote d’Ivoire")` return? – Onyambu Jan 31 '18 at 04:21
  • Well, I think I figured it out! I used `trimws()` to see whether there was perhaps some extra blank space in there, and it seems to have fixed the issue :) – Yasha Feb 25 '18 at 15:08

1 Answer

You can replace one character with another using `gsub()`:

data$country=gsub("’","'",data$country)

If that doesn't work, you may need to escape the special character with a double backslash:

data$country=gsub("\\’","'",data$country)
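The asker's follow-up comment suggests that stray whitespace was also involved, so it may help to combine both fixes. Below is a minimal sketch on made-up data: the data frame, the trailing space in the second row, and the use of `fixed = TRUE` (which treats the pattern as a literal string rather than a regular expression) are assumptions for illustration, not something confirmed in the question.

```r
# Toy data mirroring the question. The trailing space in row 2 is an
# assumption, based on the asker's comment that trimws() resolved the issue.
data <- data.frame(
  country = c("Cote d'Ivoire", "Cote d\u2019Ivoire ", "Cote d\u2019Ivoire"),
  year    = c(2008, 2009, 2009),
  stringsAsFactors = FALSE
)

# Replace the curly apostrophe (U+2019) with a straight one.
# fixed = TRUE matches the pattern literally, so no escaping is needed.
data$country <- gsub("\u2019", "'", data$country, fixed = TRUE)

# Strip leading and trailing whitespace so otherwise identical labels match.
data$country <- trimws(data$country)

# Crosstab: all rows should now fall under a single country label.
table(data$country, data$year)
```

Writing the apostrophe as `"\u2019"` avoids depending on how the script file or console encodes the literal character. If `country` is stored as a factor, converting it with `as.character()` first is probably the safer route.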

See

Remove pattern from string with gsub

Ajay Ohri