0

So I need to create a csv file which needs to be read by a piece of third party software (pxjob). For some reason, it refuses to read my scandinavian characters if I don't use UTF-8-BOM. Since I must also use semicolons as delimiters I tend to use the write_excel_csv-function, since that takes care of both of these problems.

The csv files look like this when they're exported:

"Deltagande kommuner vårdtagare";;;
"Län";"Enhet";"År";"value"
"10 Blekinge län";"Totalt antal kommuner i län";"2015";"5"
"10 Blekinge län";"Antal deltagande kommuner";"2015";"4"
"20 Dalarnas län";"Totalt antal kommuner i län";"2015";"15"
"20 Dalarnas län";"Antal deltagande kommuner";"2015";"4"
"09 Gotlands län";"Totalt antal kommuner i län";"2015";"1"

However, this format causes pxjob to read the file incorrectly:

enter image description here

I have however found a strange workaround. If I open Excel, do a random change, reverse the change and save/overwrite the csv file (as a csv file, still), the file will look like this instead:

Deltagande kommuner vårdtagare;;;
Län;Enhet;År;value
10 Blekinge län;Totalt antal kommuner i län;2015;5
10 Blekinge län;Antal deltagande kommuner;2015;4
20 Dalarnas län;Totalt antal kommuner i län;2015;15
20 Dalarnas län;Antal deltagande kommuner;2015;4
09 Gotlands län;Totalt antal kommuner i län;2015;1

So the double quotes have disapperad. And now the files are read correctly into pxjob:

enter image description here

There's probably a more precise terminology for all of this, but is it possible to create the csv file in this format directly from R, and preferrably using the write_excel_csv function (or something equivalent)?

Repex:

structure(list(Län = c("10 Blekinge län", "10 Blekinge län", 
"20 Dalarnas län", "20 Dalarnas län", "09 Gotlands län", "09 Gotlands län"
), Enhet = c("Totalt antal kommuner i län", "Antal deltagande kommuner", 
"Totalt antal kommuner i län", "Antal deltagande kommuner", 
"Totalt antal kommuner i län", "Antal deltagande kommuner"), 
    År = c(2015, 2015, 2015, 2015, 2015, 2015), value = c(5, 
    4, 15, 4, 1, 0)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

UPDATE:

If I use the following export statement:

write_excel_csv2(as.data.frame(df2),
                     paste0("G:\\Data\\Folkhälsodata\\_3b px-filer (arbetsmaterial2023)\\Filer för Pxjob\\In_headers\\",fillista[i]),col_names=FALSE, na="", quote = c("none"))

I end up with a file looking like this:

Deltagande kommuner vårdtagare;;;
Län;Enhet;År;value
10 Blekinge län;Totalt antal kommuner i län;2015;5
10 Blekinge län;Antal deltagande kommuner;2015;4
20 Dalarnas län;Totalt antal kommuner i län;2015;15
20 Dalarnas län;Antal deltagande kommuner;2015;4

But with a file looking like this after I run it through pxjob:

enter image description here

(so the first character in the variable name of the first observation has been ignored, thus creating a new variable)

user2165379
  • 445
  • 4
  • 20
Magnus
  • 728
  • 4
  • 17

1 Answers1

1

Im not familiar with the readr lib but if you want to use the write_excel_csv function, you could specify the options in the options argument. Here's an example:

library(readr)

write_excel_csv(df, "file.csv", quote = none, sep = ";", fileEncoding = "UTF-8-BOM")

This should produce a CSV file with Scandinavian characters properly encoded and without double quotes around the values.

Eirik Fesker
  • 328
  • 1
  • 12
  • Just a hint on `UTF-8` and `UTF-8-BOM`: https://stackoverflow.com/questions/2223882/whats-the-difference-between-utf-8-and-utf-8-with-bom – Martin Gal Apr 11 '23 at 10:43
  • If I use the second solution it kinda works. It doesn't allow for a list of options to be specified, and I have to set quote to NULL instead of FALSE (I write it as its own argument). It doesn't let med specify row names as false either. But if I set quote=NULL then at least the extra columns disapperar, the first row still looks weird though :/ – Magnus Apr 11 '23 at 10:48
  • I tried using the write.csv function, but it just throws an error, claiming: unsupported conversion from 'UTF-8-BOM' to '' – Magnus Apr 11 '23 at 10:49
  • The manual says "UTF-8-BOM" is only for reading, not writing. "The encoding "UTF-8-BOM" is accepted for reading and will remove a Byte Order Mark if present (which it often is for files and webpages generated by Microsoft applications). If a BOM is required (it is not recommended) when writing it should be written explicitly, e.g. by `writeChar("\ufeff", con, eos = NULL)` or `writeBin(as.raw(c(0xef, 0xbb, 0xbf)), binary_con)`" – user2554330 Apr 11 '23 at 11:02
  • Edit: I apologize for the confusion. It seems that the write.csv function does not support writing CSV files with a byte order mark (BOM) in R. However, im glad you can still use the write_excel_csv function to write a CSV file with the desired format and encoding. I removed the first solution from my answer. i see u uploaded a data sample. I'll try this later when ive got more time. Can you try the argument "quote = none"? – Eirik Fesker Apr 11 '23 at 11:41
  • @EirikFesker I tried that as my first solution, with the same result. I went over to the "needed"-option instead, hoping I had just removed one quote too many. Irritatingly enough it still works if I open the file in excel and overwrite it. – Magnus Apr 11 '23 at 12:32
  • @EirikFesker When I had saved/overwritten the file in excel (yielding a file that works) I opened it with Notepad++ and found the encoding to be ANSI and not UTF-BOM. The plot thickens... – Magnus Apr 11 '23 at 12:44
  • @EirikFesker I think I might have solved this using the windows-1252-format instead, so my assumptions were incorrect (it turns out excel converted the UTF-8-BOM-encoding to windows-1252). In the end I had to resort to the write.table-function since the write.csv-functions did not allow me to alter the encoding nor to omit the column names. – Magnus Apr 11 '23 at 13:48
  • write.csv is base R and write_excel_csv is from readr package. "fileEncoding" is not a parameter from write_excel_csv. See also https://readr.tidyverse.org/reference/write_delim.html in paragraph Output for an explanation of the different column separators. – user2165379 Apr 15 '23 at 10:14