
I'm trying to convert a Markdown file to a .docx file with pandoc. Unfortunately, pandoc stubbornly complains that the file is not encoded as "UTF-8":

[Screenshot: pandoc error message about invalid UTF-8 in the Markdown file]

When creating the Markdown file, I'm using text data from an Excel file written in English. According to Encoding(), as suggested in "How to identify/delete non-UTF-8 characters in R", two of the columns have an "unknown" encoding. Please see an example vector from one of these columns (holding data categories) below:

exampleVector
 [1] "other wards"  "organisation" "other wards"  "Trystview"    "break"        "other wards" 
 [7] "Trystview"    "other"        "break"        "other"  

exampleVector %>% Encoding()
 [1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"

exampleVector %>% dput()
c("other wards", "organisation", "other wards", "Trystview", 
"break", "other wards", "Trystview", "other", "break", "other"
)
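A note on the output above, with a minimal sketch: base R's Encoding() reports "unknown" for any string made only of ASCII characters, so "unknown" by itself does not indicate a problem with this vector. The variable nonAscii is a made-up example, not part of the original data.

```r
# The example vector from the question (pure ASCII text).
exampleVector <- c("other wards", "organisation", "other wards", "Trystview",
                   "break", "other wards", "Trystview", "other", "break", "other")

# ASCII-only strings never carry an encoding mark, so all are "unknown".
Encoding(exampleVector)

# ASCII is a subset of UTF-8, so every element is already valid UTF-8.
all(validUTF8(exampleVector))

# Hypothetical non-ASCII string: only strings like this can carry a mark.
nonAscii <- enc2utf8("caf\u00e9")
Encoding(nonAscii)
```

If validUTF8() returns TRUE for every element, the invalid bytes are more likely introduced elsewhere, for instance in other columns or when the file is written.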

I've tried all the suggestions in "How to identify/delete non-UTF-8 characters in R" and "Force character vector encoding from "unknown" to "UTF-8" in R" without success, including the functions in the "stringi" library for converting the above vector to UTF-8. I'm not sure what I'm missing, and I wonder why a fairly mundane Excel file poses such challenges for pandoc. I used read_excel from the "readxl" library to import the Excel data. I would be grateful for any suggestions.

Yozef
  • did you try changing the default encoding in Tools >> Global Options to UTF-8 and then saving the Markdown file? – Mohanasundaram Apr 18 '20 at 10:13
  • Your error points you towards two lines with invalid characters, lines 45 and 313 - can you try to replace those characters? It might be due to a copy-paste. – mhovd Apr 18 '20 at 10:54
  • @Mohanasundaram: Thanks for your comment. I'm using RStudio version 1.2.5042. I couldn't find any references to UTF-8 under Tools --> Global Options. Do you mean somewhere under Tools --> Global Options --> Code? I can't find it there, either ... Can you be more specific? – Yozef Apr 18 '20 at 11:06
  • @Yozef Tools --> Global Options --> Code --> under saving tab – Mohanasundaram Apr 18 '20 at 11:10
  • @mhh: Thanks for your comment. The Markdown file looks pristine and clear in Notepad. I can't see any dodgy characters there. I deleted the first couple of hundred characters for a laugh and ran pandoc again. Now the error message says the (new) file has invalid UTF-8 characters at position 255. I suspect there are a number of those offending characters in the file, but they don't seem to be visually identifiable in Notepad. – Yozef Apr 18 '20 at 11:13
  • @Mohanasundaram: I've found the option above now and changed it to UTF-8. I still have the same error message as before. No changes when switching off RStudio and re-starting it ... – Yozef Apr 18 '20 at 11:19
  • @Yozef did you open the Rmd file and save it once again? Meanwhile, I'll look for another possible solution. – Mohanasundaram Apr 18 '20 at 11:21
  • Has anyone tried what `Encoding` returns with the exampleVector above. Do you get "unknown", too? – Yozef Apr 18 '20 at 11:22
  • @Mohanasundaram: I'm not sure how to open and save the .md file once again. I'm not very good with Markdown. I had my code save a Markdown file with a different name and tried to use `pandoc` on that. Same result as above ... – Yozef Apr 18 '20 at 11:28
  • When you say Notepad, are you implying that you are editing this file in Notepad, or RStudio? – mhovd Apr 18 '20 at 12:02
  • @mhh: The Markdown file is created by R code. I only viewed its contents in Notepad; I didn't create the Markdown file with Notepad. – Yozef Apr 18 '20 at 13:51
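
Since the offending characters discussed in the comments are not visible in Notepad, one way to locate them is to check each line of the file with base R's validUTF8(). The following is only a sketch: find_invalid_utf8 is a hypothetical helper, the temporary file stands in for the real Markdown file, and the conversion from "latin1" for display is an assumption about the actual source encoding.

```r
# Hypothetical helper: list the lines of a file that are not valid UTF-8.
find_invalid_utf8 <- function(path) {
  # Read the lines as-is, without declaring or converting an encoding.
  lines <- readLines(path, warn = FALSE)
  bad <- which(!validUTF8(lines))
  data.frame(
    line = bad,
    # Assumption: the stray bytes are Latin-1; convert only for display.
    text = iconv(lines[bad], from = "latin1", to = "UTF-8"),
    stringsAsFactors = FALSE
  )
}

# Demo on a temporary file whose second line contains a raw Latin-1 byte (0xE9).
tmp <- tempfile(fileext = ".md")
writeBin(c(charToRaw("plain line\ncaf"), as.raw(0xe9),
           charToRaw("\nlast line\n")), tmp)
res <- find_invalid_utf8(tmp)
print(res)
```

The returned line numbers can then be compared with the positions reported in the pandoc error message.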

1 Answer


I found the answer to my frustrations! I only had to add the argument encoding = "UTF-8" to the file() call that creates the Markdown file in the R code:

fileConn <- file("C:/projects/use of time/report1.md", encoding = "UTF-8")
close(fileConn)
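
For context, here is a minimal end-to-end sketch of the same idea. The output path, the reportLines content, and the final read-back check are illustrative assumptions, not the original report code.

```r
# Hypothetical output path and content, used only for illustration.
outPath <- file.path(tempdir(), "report_sketch.md")
reportLines <- c("# Use of time", "", "Caf\u00e9 break: 10 min")

# Open the connection for writing and declare the target encoding,
# so the text is converted to UTF-8 as it is written.
fileConn <- file(outPath, open = "w", encoding = "UTF-8")
writeLines(reportLines, fileConn)
close(fileConn)

# Read the file back and confirm that every line is valid UTF-8.
check <- readLines(outPath, encoding = "UTF-8")
all(validUTF8(check))
```

Without the encoding argument, the connection writes text in the session's native encoding, which on Windows is often not UTF-8; that would be consistent with the pandoc error.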
Yozef