I've been struggling with this for a while now and can't seem to google my way out of it. I'm trying to read a .sql file into R, which I always do to avoid putting 100+ lines of SQL in my R scripts. I usually do this:
library(tidyverse)
library(DBI)
con <- dbConnect(<CONNECTION ARGUMENTS>)
query <- read_file("path/to/script.sql")
df <- as_tibble(dbGetQuery(con, query))
dbDisconnect(con)
However, this time my sql script has some Spanish characters in it. Say something like this:
select tree_id, tree
from forest.trees
where species = 'árbol'
When I read this script into R and run the query, it returns no rows, but if I copy and paste the SQL into an R string it works! So the problem seems to be in the line where I read the script into R.
I tried changing the string's encoding in a couple of ways:
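# none of these work

# attempt 1: read the whole file, then mark the string as latin1
query <- read_file("path/to/script.sql")
Encoding(query) <- "latin1"

# attempt 2: read line by line declaring latin1, then collapse into one string
query <- readLines("path/to/script.sql", encoding = "latin1")
query <- paste0(query, collapse = " ")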
Unfortunately I don't have a public database to offer to anyone reading this. I'm connecting to a PostgreSQL 11 database.
--- UPDATE ----
I'm on a Windows 10 machine with a US locale.
When I use the read_file function, the contents of query look OK and the Spanish characters print as they should, but when I pass it to dbGetQuery it just doesn't fetch anything.
I tried forcing the encoding to "latin1" because I found online that this tends to fix Spanish characters in R. When I do this, the Spanish characters print out wrong, so I didn't expect it to work, and it didn't.
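To illustrate why the characters print wrong after that step, here is a minimal sketch in base R. It assumes the string is already UTF-8, which may or may not match what read_file gave me; the variable names are only for illustration. Encoding<- only relabels the existing bytes, while iconv actually converts them:

```r
# "\u00e1" is 'á'; a \u escape gives a string that R marks as UTF-8
x <- "\u00e1rbol"
charToRaw(x)                 # c3 a1 72 62 6f 6c: 'á' takes two bytes in UTF-8

# Relabelling: the bytes stay the same, only the declared encoding changes
y <- x
Encoding(y) <- "latin1"
identical(charToRaw(x), charToRaw(y))   # same bytes before and after
nchar(x)                     # 5 characters
nchar(y)                     # 6: c3 and a1 are now read as two latin1 characters

# Converting: iconv rewrites the bytes into the target encoding
z <- iconv(x, from = "UTF-8", to = "latin1")
charToRaw(z)                 # e1 72 62 6f 6c: 'á' is the single byte e1 in latin1
```

So marking UTF-8 text as latin1 turns the two bytes of 'á' into two separate characters, which would explain the garbled printout.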
The character values in my database are UTF-8 encoded.
Just to be completely clear, none of my attempts to read the .sql script have worked; however, this does work:
library(tidyverse)
library(DBI)
con <- dbConnect(<CONNECTION ARGUMENTS>)
query <- "select tree_id, tree from forest.trees where species = 'árbol'"
# df actually has results
df <- as_tibble(dbGetQuery(con, query))
dbDisconnect(con)
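In case it helps narrow this down, here is a minimal sketch of how the string read from a file and the pasted string could be compared byte by byte. It is a diagnostic under stated assumptions, not a fix. The temporary file stands in for my real script.sql, base R's readLines stands in for read_file, and the byte-order-mark check covers only one possible cause that I have not confirmed:

```r
# Stand-in for the pasted query; the \u escape avoids depending on how this script is saved
query_literal <- "select tree_id, tree from forest.trees where species = '\u00e1rbol'"

# Stand-in for script.sql: write the same text to a temporary file as UTF-8 bytes
sql_path <- tempfile(fileext = ".sql")
writeBin(charToRaw(enc2utf8(query_literal)), sql_path)

# Read the file back, declaring its contents as UTF-8
sql_lines <- readLines(sql_path, encoding = "UTF-8", warn = FALSE)
query_file <- paste(sql_lines, collapse = " ")

# Declared encoding of each string ("UTF-8", "latin1" or "unknown")
Encoding(query_file)
Encoding(query_literal)

# Do both strings hold the same bytes once both are expressed in UTF-8?
same_bytes <- identical(charToRaw(enc2utf8(query_file)),
                        charToRaw(enc2utf8(query_literal)))

# Does the file start with a UTF-8 byte-order mark (ef bb bf)?
has_bom <- identical(readBin(sql_path, "raw", n = 3),
                     as.raw(c(0xef, 0xbb, 0xbf)))

same_bytes
has_bom
```

Running the same checks against the real script.sql would show whether the string passed to dbGetQuery differs from the pasted one in its bytes, in its declared encoding, or in neither.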