3

I have a .R file that contains several functions, one of which is defined as:

get_entry_detail <- function(con, vec_of_entryids){

  query <- paste0("select entryId, fieldName, fieldValue
                         from `hthu-eligibility`.entry_detail
                  where entryId in (", paste(vec_of_entryids, collapse = ","), ");")

  dbGetQuery(con, query) %>%
    mutate(fieldName = ifelse(fieldName == "firstName",
                              gsub(paste(c(""), collapse = "|"), "", fieldName),
                              fieldName))

}

Note that there is a mutate() that strips off a  when fieldName == "firstName".

I source() this file at the top of another .R file but when I view the function after sourcing the file, the function has changed to:

> source("R/get_join_employee_data_userid.R")
> get_entry_detail
function(con, vec_of_entryids){

  query <- paste0("select entryId, fieldName, fieldValue
                         from `hthu-eligibility`.entry_detail
                  where entryId in (", paste(vec_of_entryids, collapse = ","), ");")

  dbGetQuery(con, query) %>%
    mutate(fieldName = ifelse(fieldName == "firstName",
                              gsub(paste(c(""), collapse = "|"), "", fieldName),
                              fieldName))

}

And the  has now changed to . This causes later functions to fail because there is no  that needs to be removed thus later joins fail.

How do I prevent this from happening? I can't adjust the database structure.

Steven
  • 3,238
  • 21
  • 50

2 Answers2

1

The encoding of files is OS dependent. On my Linux machine your example runs without problems. Linux uses UTF-8 as default encoding. However, Windows uses the system's default encoding which may differ from UTF-8.

Hence, explicitly specifying encoding="UTF-8" in source() should fix the problem:

source("R/get_join_employee_data_userid.R", encoding="UTF-8")
Nairolf
  • 2,418
  • 20
  • 34
0

This answer provided a solution. Now my function look like:

get_entry_detail <- function(con, vec_of_entryids){

  dbSendQuery(con, 'set character set "utf8"')

  query <- paste0("select entryId, fieldName, fieldValue
                         from `hthu-eligibility`.entry_detail
                  where entryId in (", paste(vec_of_entryids, collapse = ","), ");")

  dbGetQuery(con, query)
}

although I still don't know why the characters were changed when I sourced the file rather than reading it in directly.

Steven
  • 3,238
  • 21
  • 50
  • It looks like your script is encoded in UTF-8 and `source` reads it as if it was encoded in windows-1252. If I'm right then you just need to use the `encoding` argument: `source("R/get_join_employee_data_userid.R", encoding = "UTF-8")`. – Scarabee Oct 30 '18 at 15:31