I'm trying to read this file of a list of countries into R. R doesn't seem to be able to read it, since the imported dataset appears empty.

This is my code:

universe =  read.csv("country-keyword-list.csv")

No error message appeared. Stata read the file just fine.

This is the link to the CSV file in question:

https://www.searchify.ca/list-of-countries/

M--
Rainroad
    Have a look at [this](https://stackoverflow.com/questions/23209464/get-embedded-nuls-found-in-input-when-reading-a-csv-using-read-csv) question. It suggests `read.csv("country-keyword-list.csv", fileEncoding="UTF-16LE")`. Maybe this solves the problem. – maydin Sep 12 '19 at 14:48
    Also use `read.table` rather than `read.csv`. The file has no header and no commas. – G. Grothendieck Sep 12 '19 at 14:53
  • `txt <- readLines(fl, skipNul = TRUE);txt <- txt[txt != ""]` works. – Rui Barradas Sep 12 '19 at 15:50
  • Forgot to mention that `fl <- "country-keyword-list.csv"`. – Rui Barradas Sep 12 '19 at 19:52
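The suggestions in the comments can be combined into a short sketch (this assumes the file sits in the working directory; `header = FALSE` follows the observation that the file has no header row):

```r
fl <- "country-keyword-list.csv"

# readLines() with skipNul = TRUE drops the embedded NULs that a
# UTF-16 file produces when read as if it were 8-bit text
txt <- readLines(fl, skipNul = TRUE)
txt <- txt[txt != ""]   # discard the blank lines left behind

# Alternatively, tell R the real encoding up front
universe <- read.csv(fl, fileEncoding = "UTF-16LE", header = FALSE)
```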

4 Answers


@maydin already gave the working solution in the comments, but I thought it would still be useful to show how you could have discovered it yourself. Note below that the UTF-16LE encoding has the highest confidence.

library(stringi)

u <- "https://www.searchify.ca/wp-content/uploads/2016/09/country-keyword-list.csv"
L <- readLines(u, skipNul = TRUE)
stri_enc_detect(L)[[1]]
##      Encoding Language Confidence
## 1    UTF-16LE                1.00
## 2  ISO-8859-2       cs       0.42
## 3  ISO-8859-1       en       0.21
## 4  ISO-8859-9       tr       0.21
## 5    UTF-16BE                0.10
## 6   Shift_JIS       ja       0.10
## 7     GB18030       zh       0.10
## 8      EUC-JP       ja       0.10
## 9      EUC-KR       ko       0.10
## 10       Big5       zh       0.10

countries <- read.table(u, fileEncoding = "UTF-16LE")
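A quick sanity check on the result (the `V1` column name is `read.table`'s default for an unnamed column; the actual row count depends on the file's current contents):

```r
# Inspect the parsed result: one character column of country keywords
str(countries)
head(countries$V1)
nrow(countries)
```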
G. Grothendieck

This is not a direct answer to your question, since it has already been answered very well by maydin and G. Grothendieck, but if you ever struggle with a file's encoding again, I suggest you try the guess_encoding() function from the readr package; it works pretty well.

install.packages("readr")
readr::guess_encoding("country-keyword-list.csv", n_max = 1000)

It will give output like this:

  # A tibble: 3 x 2
  encoding   confidence
  <chr>           <dbl>
1 UTF-16LE         1.00   
2 ISO-8859-1       0.51
3 ISO-8859-2       0.38

Most of the time it works very well, so you can be fairly sure which encoding to choose.
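The guessed encoding can then be fed straight back into readr's own reader; a sketch, where `locale(encoding = ...)` is readr's documented way to declare the input encoding, and `col_names = FALSE` reflects the header-less file:

```r
library(readr)

universe <- read_csv("country-keyword-list.csv",
                     locale = locale(encoding = "UTF-16LE"),
                     col_names = FALSE)
```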

Gainz
    ohhh that's a life-saver! I was playing around with the encoding, but none of the encodings I tried worked. This would have saved so much time. Thanks a lot! – Rainroad Sep 12 '19 at 17:57
    Glad I could help; just note that most of the time it will work great, but sometimes you'll have to double-check! It's definitely an excellent function, though. – Gainz Sep 12 '19 at 18:30
    `readr::guess_encoding` calls `stringi::stri_enc_detect`, which does the real work. – G. Grothendieck Sep 12 '19 at 20:28
    Indeed, that's in the ``guess_encoding()`` documentation. But doesn't ``stri_enc_detect`` only work on .txt files, raw vectors, or character vectors? – Gainz Sep 13 '19 at 14:58
universe <- read.csv("country-keyword-list.csv", fileEncoding = "UTF-16LE", header = FALSE)
Orlando Sabogal

Try this:

universe =  read.csv("https://www.searchify.ca/wp-content/uploads/2016/09/country-keyword-list.csv")
Jim G.