
Little bit of backstory:

I am reading a SAV file into R using read_sav() from haven and taking the labels stored in it (accessed with attr(sav_file, "label")). I would like to use these labels as section headers in a LaTeX document.

Here's the issue: LaTeX does not accept certain characters. Rendering the R Markdown document produces the error "Package inputenc Error: Unicode char € (U+80) (inputenc) not set up for use with LaTeX."

Here's a small sample string that causes the problem, along with some of the things I have tried:

unencoded_string <- "following statement? “Tourism is good"
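To see exactly which characters LaTeX is rejecting, it can help to list the code points of the string. Below is a minimal base-R sketch. It assumes the garbled opening quote consists of the three characters displayed above (U+00E2, U+20AC, U+0153) and writes them as `\u` escapes so the script stays pure ASCII. The error message reports U+80, so the bytes in the real labels may not match what is printed, and checking them directly is worthwhile.

```r
# Same sample as above; ASSUMPTION: the garbled quote is U+00E2, U+20AC, U+0153.
txt <- "following statement? \u00e2\u20ac\u0153Tourism is good"

# utf8ToInt() returns one integer code point per character.
cp <- utf8ToInt(txt)

# Anything above 127 falls outside 7-bit ASCII.
bad <- cp[cp > 127]
sprintf("U+%04X", bad)
```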

Others have fixed this problem using methods like:

Encoding(unencoded_string) <- "UTF-8"

and

iconv(unencoded_string, to = "UTF-8")

These calls remove some of the unwanted characters, but I am still left with characters I do not want:

"following statement? “Tourism is good"

The other regular-expression approaches I have tried do not work either.

Does anyone have something that might help, or can point me in the right direction? I've run into this kind of problem before, but have always found a workaround.
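A likely explanation, assuming this is ordinary mojibake: "“" is what the UTF-8 bytes of a left double quotation mark (U+201C, bytes E2 80 9C) look like when they are decoded as Windows-1252. `Encoding<-` only changes the declared encoding and never touches the bytes, and `iconv(x, to = "UTF-8")` leaves characters that are already valid Unicode in place, which would explain why neither call helps. If that assumption holds, the quote can be recovered instead of deleted, as in the sketch below. This is a different technique from the answer, which strips the characters. If `iconv()` returns `NA` (for example, because the text really contains U+0080), the assumption does not hold.

```r
# Hypothetical repair, ASSUMING the label is UTF-8 text that was once
# misread as Windows-1252 (classic mojibake: "â€œ" for a left double quote).
txt <- "following statement? \u00e2\u20ac\u0153Tourism is good"

# Step 1: map each character back to the single CP1252 byte it came from.
# iconv() gives NA if a character has no CP1252 equivalent.
raw_bytes <- iconv(txt, from = "UTF-8", to = "CP1252")

# Step 2: declare those bytes to be UTF-8 (only the mark changes, not the bytes).
fixed <- raw_bytes
Encoding(fixed) <- "UTF-8"

fixed
```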

detroyejr
  • You can take a look [here](http://stackoverflow.com/questions/2124010/grep-regex-to-match-non-ascii-characters) at how to remove non-ASCII characters. – agstudy Mar 15 '17 at 21:06
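In the same spirit as that comment, base R can also drop non-ASCII characters without a regular expression, using `iconv()` with `sub = ""`. The `sub` argument replaces every byte that cannot be converted, so an empty string makes each multi-byte character vanish entirely. Here is a minimal sketch on the sample string, written with `\u` escapes:

```r
txt <- "following statement? \u00e2\u20ac\u0153Tourism is good"

# Convert to plain ASCII; sub = "" replaces every non-convertible byte
# with nothing, so each non-ASCII character is removed.
ascii_only <- iconv(txt, from = "UTF-8", to = "ASCII", sub = "")
ascii_only
```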

1 Answer


This seems to work. Try:

txt <- "following statement? “Tourism is good"
# Delete every run of characters outside the ASCII range (code points 0 to 127)
gsub("[^\\x00-\\x7F]+", "", txt, perl = TRUE)

> gsub("[^\\x00-\\x7F]+", "",txt, perl = TRUE)
[1] "following statement? Tourism is good"
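To connect this back to the original goal of cleaning every label before it becomes a LaTeX header, the same pattern can be applied to all columns at once. Because haven may not be installed, the sketch below builds a small stand-in data frame by hand. It assumes, as haven does, that each column keeps its variable label in a "label" attribute. The names `survey`, `q1`, and `q2` are hypothetical.

```r
# Hypothetical stand-in for a data frame returned by haven::read_sav():
# each column carries its question text in a "label" attribute.
survey <- data.frame(q1 = c(1, 2), q2 = c(3, 4))
attr(survey$q1, "label") <- "following statement? \u00e2\u20ac\u0153Tourism is good"
attr(survey$q2, "label") <- "Age"

# Pull each column's label (NA when a column has none).
labels <- vapply(survey, function(col) {
  lab <- attr(col, "label", exact = TRUE)
  if (is.null(lab)) NA_character_ else as.character(lab)
}, character(1))

# Same pattern as the answer: delete every run of non-ASCII characters.
clean_labels <- gsub("[^\\x00-\\x7F]+", "", labels, perl = TRUE)
clean_labels
```

`gsub()` keeps the names of its input, so `clean_labels` stays keyed by column name.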
Kristofersen
  • This worked beautifully! Thank you. But I don't really understand the Perl syntax. If you happen to have any recommended reading on that subject, that would be great. – detroyejr Mar 16 '17 at 14:50
  • @jonathande4 Check out HackerRank's regex course. It goes into a lot of detail and is very easy to follow. – Kristofersen Mar 16 '17 at 14:52
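Since the follow-up comment asks about the syntax, here is a commented breakdown of the pattern with a quick check on a few made-up sample strings. `perl = TRUE` makes R use PCRE (Perl-compatible regular expressions), which supports the `\xhh` hex escapes used here.

```r
# Pieces of "[^\\x00-\\x7F]+" (the R string literal doubles each backslash):
#   [ ... ]     a character class: matches one character from the set
#   ^           as the first thing inside the brackets: negate the set
#   \x00-\x7F   a range given by hex codes, 0 to 127: the ASCII range
#   +           one or more of the preceding item, so a whole run goes at once
pattern <- "[^\\x00-\\x7F]+"

samples <- c("plain ascii", "caf\u00e9", "\u00e2\u20ac\u0153quoted")

# Which strings contain at least one non-ASCII character?
grepl(pattern, samples, perl = TRUE)

# Remove those characters.
gsub(pattern, "", samples, perl = TRUE)
```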