I'm trying to read this file of a list of countries into R. R doesn't seem to be able to read it, since the imported dataset appears empty.

This is my code:

universe =  read.csv("country-keyword-list.csv")

No error message appeared. Stata read the file just fine.

This is the link to the CSV file in question:

https://www.searchify.ca/list-of-countries/

M--
Rainroad
    Have a look at [this](https://stackoverflow.com/questions/23209464/get-embedded-nuls-found-in-input-when-reading-a-csv-using-read-csv) question. It suggests `read.csv("country-keyword-list.csv", fileEncoding="UTF-16LE")`. Maybe this solves the problem. – maydin Sep 12 '19 at 14:48
    Also use `read.table` rather than `read.csv`. The file has no header and no commas. – G. Grothendieck Sep 12 '19 at 14:53
  • `txt <- readLines(fl, skipNul = TRUE);txt <- txt[txt != ""]` works. – Rui Barradas Sep 12 '19 at 15:50
  • Forgot to mention that `fl <- "country-keyword-list.csv"`. – Rui Barradas Sep 12 '19 at 19:52
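The suggestions in the comments can be combined into a short sketch (this assumes the file sits in the working directory; `header = FALSE` follows the observation that the file has no header row):

```r
fl <- "country-keyword-list.csv"

# readLines() with skipNul = TRUE drops the embedded NULs that a
# UTF-16 file produces when read as if it were 8-bit text
txt <- readLines(fl, skipNul = TRUE)
txt <- txt[txt != ""]   # discard the blank lines left behind

# Alternatively, tell R the real encoding up front
universe <- read.csv(fl, fileEncoding = "UTF-16LE", header = FALSE)
```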

4 Answers


@maydin already gave the working solution in the comments, but I thought it would still be useful to show how you could have discovered it yourself. Note below that the UTF-16LE encoding has the highest confidence.

library(stringi)

u <- "https://www.searchify.ca/wp-content/uploads/2016/09/country-keyword-list.csv"
L <- readLines(u, skipNul = TRUE)
stri_enc_detect(L)[[1]]
##      Encoding Language Confidence
## 1    UTF-16LE                1.00
## 2  ISO-8859-2       cs       0.42
## 3  ISO-8859-1       en       0.21
## 4  ISO-8859-9       tr       0.21
## 5    UTF-16BE                0.10
## 6   Shift_JIS       ja       0.10
## 7     GB18030       zh       0.10
## 8      EUC-JP       ja       0.10
## 9      EUC-KR       ko       0.10
## 10       Big5       zh       0.10

countries <- read.table(u, fileEncoding = "UTF-16LE")
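A quick sanity check on the result (the `V1` column name is `read.table`'s default for an unnamed column; the actual row count depends on the file's current contents):

```r
# Inspect the parsed result: one character column of country keywords
str(countries)
head(countries$V1)
nrow(countries)
```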
G. Grothendieck

This is not a direct answer to your question, since it has already been answered very well by maydin and G. Grothendieck, but if you ever struggle with a file's encoding again, I suggest you try the guess_encoding() function from the readr package; it works pretty well.

install.packages("readr")
readr::guess_encoding("country-keyword-list.csv", n_max = 1000)

It will give output like this:

  # A tibble: 3 x 2
  encoding   confidence
  <chr>           <dbl>
1 UTF-16LE         1.00   
2 ISO-8859-1       0.51
3 ISO-8859-2       0.38

Most of the time it works very well, so you can be fairly sure which encoding to choose.
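The guessed encoding can then be fed straight back into readr's own reader; a sketch, where `locale(encoding = ...)` is readr's documented way to declare the input encoding, and `col_names = FALSE` reflects the header-less file:

```r
library(readr)

universe <- read_csv("country-keyword-list.csv",
                     locale = locale(encoding = "UTF-16LE"),
                     col_names = FALSE)
```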

Gainz
    ohhh that's a life-saver! I was playing around with the encoding, but none of the encodings I tried worked. This would have saved so much time. Thanks a lot! – Rainroad Sep 12 '19 at 17:57
    Glad I could help; just note that most of the time it will work great, but sometimes you'll have to double-check! It's definitely an excellent function, though. – Gainz Sep 12 '19 at 18:30
    `readr::guess_encoding` calls `stringi::stri_enc_detect`, which does the real work. – G. Grothendieck Sep 12 '19 at 20:28
    Indeed, that's in the ``guess_encoding()`` documentation. But doesn't ``stri_enc_detect`` only work on .txt files, raw vectors, or character vectors? – Gainz Sep 13 '19 at 14:58
universe <- read.csv("country-keyword-list.csv", fileEncoding = "UTF-16LE", header = FALSE)
Orlando Sabogal

Try this:

universe =  read.csv("https://www.searchify.ca/wp-content/uploads/2016/09/country-keyword-list.csv")
Jim G.