Here are several different approaches to try. Please comment as to whether they assist.
1. Query the database as to what its encoding is
This question starts with the following snippet for getting the client encoding of a PostgreSQL database (it has several answers which might also assist):
dbGetQuery(con, "SHOW CLIENT_ENCODING")
# client_encoding
# 1 UTF8
If you are not using a PostgreSQL database there will likely be an equivalent command for the database you are using.
Reading the documentation of ?odbc::dbConnect, it appears encoding is the text encoding used by your database (if it is not UTF-8), and that strings are always returned to R encoded as UTF-8.
Given your note that the character set of the database is AR8ISO8859P6, I guess the client encoding will return something like "ar8-iso8859"
and that this is the term to put into the connection. E.g.:
db_conn <- dbConnect(odbc::odbc(), "my_db", encoding = "ar8-iso8859")
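To see why the encoding argument matters, you can reproduce the symptom locally with base R's iconv(), no database needed. The bytes below are my own example, not data from your database: they spell the Arabic word مرحبا as an ISO 8859-6 column would store it (AR8ISO8859P6 is Oracle's name for ISO 8859-6). Decoding them with the wrong encoding gives mojibake; the right one recovers the Arabic:

```r
# Bytes assumed to come from an ISO-8859-6 (AR8ISO8859P6) column
bytes <- as.raw(c(0xE5, 0xD1, 0xCD, 0xC8, 0xC7))

# Wrong encoding: latin1 turns the bytes into accented mojibake
iconv(rawToChar(bytes), from = "latin1", to = "UTF-8")
# [1] "åÑÍÈÇ"

# Right encoding: ISO-8859-6 recovers the Arabic text
iconv(rawToChar(bytes), from = "ISO-8859-6", to = "UTF-8")
# [1] "مرحبا"
```

If the database column really is ISO 8859-6, the value to pass as encoding is whatever name your platform's iconv accepts for it (run iconvlist() to see the available names).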
2. Test every available encoding
You mentioned in a comment there are 232 possible encodings. This link shows a function for testing how outputs differ with two different encodings.
If you cannot get the encoding from the database, then iterating through and testing every encoding might be the best option. Perhaps something like this:
library(odbc)
library(dplyr)

testEncoding <- function(encoding){
  # Connect to the database using the candidate encoding
  db_conn <- dbConnect(odbc::odbc(), "my_db", encoding = encoding)
  # Extract a record known to contain Arabic characters
  result <- tbl(db_conn, "my_table") %>%
    filter(ID == 100010456) %>%
    select(ADDRESS_1) %>%
    collect()
  # Disconnect & return the single selected value
  dbDisconnect(db_conn)
  return(result[1, 1])
}
list_of_encodings <- stringi::stri_enc_list()
for(encoding in list_of_encodings){
  # Each list element is a vector of aliases; use the first
  print(paste(encoding[1], " | ", testEncoding(encoding[1])))
}
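Many of the names returned by stri_enc_list() will not be valid conversion names on a given platform, so it may be worth pre-screening the candidates locally before opening 232 database connections. Here is a database-free sketch using base R's iconvlist() and iconv(); the sample bytes are my own assumption (the word مرحبا as ISO 8859-6 would store it), not data from your table:

```r
# Sample raw bytes assumed to represent Arabic text
# as an AR8ISO8859P6 / ISO-8859-6 column would store it
sample_bytes <- as.raw(c(0xE5, 0xD1, 0xCD, 0xC8, 0xC7))

# Try every encoding the platform's iconv supports; keep those
# that decode the bytes without error to a valid string
candidates <- character(0)
for (enc in iconvlist()) {
  decoded <- tryCatch(
    iconv(rawToChar(sample_bytes), from = enc, to = "UTF-8"),
    error = function(e) NA_character_
  )
  if (!is.na(decoded)) {
    candidates <- c(candidates, enc)
  }
}

# Only the surviving names need to be passed to testEncoding()
head(candidates)
```

Inspecting the decoded output for each surviving candidate (rather than just connecting blindly) also lets you spot which ones produce plausible Arabic.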
3. Run a test writing out an intermediate text file
The database should allow you to export data as a csv or equivalent file. This file can then be read into R using standard approaches:
- Export subset of data containing Arabic characters to plain text file (most likely UTF-8).
- Examine contents of file to confirm Arabic characters appear.
- Import plain text file into R. Examine imported data frame.
- Export imported data frame from R back to plain text (UTF-8).
- Examine contents of second file to confirm Arabic characters appear.
This approach should give you some additional insight into where the communication between R and the database is failing. Do you require specific settings to export Arabic characters from the database? To import them into R? Does R read and write the Arabic characters correctly, but fail to display them?
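The R side of the round trip above can be sketched as follows, using a temporary UTF-8 file in place of the database export (the Arabic string is my own placeholder, not your data):

```r
# Placeholder Arabic text standing in for the exported column
arabic <- "\u0645\u0631\u062D\u0628\u0627"  # "marhaba"

# Write it to a UTF-8 text file, as the database export step would
path <- tempfile(fileext = ".csv")
writeLines(c("ADDRESS_1", arabic), path)

# Import the file, declaring its encoding explicitly
df <- read.csv(path, encoding = "UTF-8")

# Export the imported data frame back out as UTF-8
path2 <- tempfile(fileext = ".csv")
write.csv(df, path2, row.names = FALSE, fileEncoding = "UTF-8")

# Check whether the round trip preserved the text
df$ADDRESS_1[1] == arabic
```

If this local round trip succeeds but the characters still look wrong in the console, the problem is display (locale/font), not the data; if it fails, the problem is in R's reading or writing rather than the ODBC connection.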