I have a vector of raw text that includes English and French words, so there are French words that have accented characters like this:

1 entretien ménager               
2 concepteur réseaux              
3 service à la clientèle          
4 sécurité                        
5 infirmière auxiliaire           
6 opérateur de machinerie en usine
7 consultant stratégique          
8 ménage                          
9 ingénieur civil, gérant projet  
10 éducatrice

The command Encoding(variable) tells me that there's a mix of 'unknown' and UTF-8 encodings. All of the ones above are coded as UTF-8.
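
A minimal sketch of why such a mix can appear, assuming the strings are marked as UTF-8 in the session. The `\u` escapes stand in for the accented letters so the snippet does not depend on how the script file itself is saved, and `is_ascii` is a name introduced here only for illustration. R marks only strings that contain non-ASCII characters, so pure-ASCII entries report 'unknown' even after `enc2utf8()`:

```r
# Same four strings as the reproducible example in this question, written with
# \u escapes so the result does not depend on the script file's own encoding.
vec <- c("s\u00e9curit\u00e9", "service \u00e0 la client\u00e8le",
         "assembleur", "labour")

# Declared encoding of each element.
enc <- Encoding(vec)

# TRUE for elements made only of ASCII code points (illustrative helper name).
is_ascii <- vapply(vec, function(s) all(utf8ToInt(s) < 128L), logical(1),
                   USE.NAMES = FALSE)

# enc2utf8() converts to UTF-8, but ASCII-only strings stay 'unknown'.
enc_after <- Encoding(enc2utf8(vec))

print(data.frame(vec, enc, is_ascii, enc_after))
```

If this holds for the real data, the 'unknown' entries would simply be the ones without accents, not a sign of a second encoding.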

This code replicates the problem on my Mac:

library(foreign)
vec <- c('sécurité', 'service à la clientèle', 'assembleur', 'labour')
write.csv(data.frame(vec), file = '~/Desktop/test.csv')

I have tried the same with write_excel_csv() and I get the same results.

I can only assume this is some kind of problem with the UTF-8 encoding, but I can't see how to figure it out.

Thank you.

Results of sessionInfo()

R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] readr_1.1.1      bindrcpp_0.2.2   labelled_1.0.0   haven_1.1.1.9000
 [5] survey_3.32-1    survival_2.41-3  Matrix_1.2-10    car_2.1-5       
 [9] stargazer_5.2    foreign_0.8-69   tidyr_0.8.0      dplyr_0.7.4     
[13] ggplot2_2.2.1   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16       pillar_1.2.1       compiler_3.4.1     nloptr_1.0.4      
 [5] plyr_1.8.4         bindr_0.1.1        forcats_0.3.0      tools_3.4.1       
 [9] lme4_1.1-13        tibble_1.4.2       gtable_0.2.0       nlme_3.1-131      
[13] lattice_0.20-35    mgcv_1.8-17        pkgconfig_2.0.1    rlang_0.2.0       
[17] cli_1.0.0          rstudioapi_0.7     yaml_2.1.18        parallel_3.4.1    
[21] SparseM_1.77       hms_0.4.1          MatrixModels_0.4-1 nnet_7.3-12       
[25] glue_1.2.0         R6_2.2.2           minqa_1.2.4        purrr_0.2.4       
[29] magrittr_1.5       scales_0.5.0       MASS_7.3-47        splines_3.4.1     
[33] assertthat_0.2.0   pbkrtest_0.4-7     colorspace_1.3-2   quantreg_5.33     
[37] utf8_1.1.3         lazyeval_0.2.0     munsell_0.4.3      crayon_1.3.4

I should add, I have looked at some of the issues on GitHub and SO such as this, this, this, but have not found my answer.

spindoctor
  • Is the problem you're having that Excel mangles the special characters when you try to read the file? Have you tried other text editors like Notepad++ or Google Sheets? If that's indeed your problem, try looking at [this post](https://stackoverflow.com/a/6488070/9374673). – Mihai Chelaru Apr 26 '18 at 16:10
  • Thanks. Importing to Google Sheets, exporting as Excel spreadsheet worked, then importing to Excel worked super. – spindoctor Apr 26 '18 at 16:24
  • Could it be because your file was missing the UTF-8 BOM ? – jeanpic Jul 01 '21 at 11:02
  • What is the UTF-8 BOM? – spindoctor Jul 02 '21 at 13:25
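
On the BOM question in the comments above: the UTF-8 byte order mark is the three-byte sequence EF BB BF at the very start of a file, which some programs use as a hint that the content is UTF-8. A rough sketch of how one might check for it in R, assuming a temporary file in place of the Desktop path; `has_utf8_bom`, `plain_path` and `bom_path` are hypothetical names introduced here, and the second file is built by hand purely for comparison:

```r
# Hypothetical helper: TRUE if the file starts with the UTF-8 BOM (EF BB BF).
has_utf8_bom <- function(path) {
  first_bytes <- readBin(path, what = "raw", n = 3)
  identical(first_bytes, as.raw(c(0xEF, 0xBB, 0xBF)))
}

vec <- c("s\u00e9curit\u00e9", "service \u00e0 la client\u00e8le",
         "assembleur", "labour")

# File written the same way as in the question, but to a temporary location.
plain_path <- tempfile(fileext = ".csv")
write.csv(data.frame(vec), file = plain_path, fileEncoding = "UTF-8")

# Comparison file: the same bytes with a BOM prepended by hand.
bom_path <- tempfile(fileext = ".csv")
con <- file(bom_path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(readBin(plain_path, what = "raw", n = file.size(plain_path)), con)
close(con)

print(has_utf8_bom(plain_path))
print(has_utf8_bom(bom_path))
```

Whether a missing BOM explains the behaviour seen in Excel here is not confirmed anywhere in this thread; the check only shows whether a given file carries one.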

0 Answers