Import online excel file with two headers

Question

I am trying to download directly one supplementary material to R and I am having trouble. First, I tried:

datatable = fread("https://doi.org/10.1371/journal.pone.0242866.s001")

The first problem was:

Error in fread("https://doi.org/10.1371/journal.pone.0242866.s001") : 
  embedded nul in string: '\xf7\xc5B1\xfcL\xf8xu~\xad\016\001\xddL5(\xb2I`\x8bҶ\005tL\xeePd\x97"{\024\xd9'p@C\031R\x9b#\x8a\034S\x9b\023\002\xa7\004Ψ\xcd9E.(rI\x91+\001\xddM~Mi7\x94vK\xe0\x8e\xd2&\t\x85&)\xc6\xd6\021U\xd6\a\023\x96X\177\033F\xde'J|\x9fY\x99\037L\030ǫ\xdc\017\xefe\xe1x\x91\xfd\x89\xd2?l\x84q6T\0\xc1\x84qD*\001\037\xab\032\xf0(\x8a Q\025\xf8X\x95A0a\xec\xb3'
In addition: Warning message:
In fread("https://doi.org/10.1371/journal.pone.0242866.s001") :
  Detected 1 column names but the data has 3 columns (i.e. invalid file). Added 2 extra default column names at the end.

I tried

datatable = data.table(read.csv("https://doi.org/10.1371/journal.pone.0242866.s001"))

However, the result brought me only one variable.

So I tried

datatable = data.table(read.csv2("https://doi.org/10.1371/journal.pone.0242866.s001"))

Again, the problem persisted but the number of observations

When I tried to use read_excel and add a skip to see if I can exclude the first line, it appeared an error in the path.

datatable = data.table(read_excel("https://doi.org/10.1371/journal.pone.0242866.s001"), skip = 1)

Error: `path` does not exist: ‘https://doi.org/10.1371/journal.pone.0242866.s001’

Can some help me?

Matt · Accepted Answer · 2022-10-11T20:56:16.423

1

You can use openxlsx and tail():

library(openxlsx)

datatable <- openxlsx::read.xlsx(
  'https://doi.org/10.1371/journal.pone.0242866.s001', sheet = 1
) %>%
tail(-1)

edited Oct 11 '22 at 20:56

answered Oct 11 '22 at 20:49

Matt

7,255
2
12
34

Thank you very much, man. You helped me a lot! One question, why should we use sheet = 1? The file has only one sheet. Other thing. The file has two headers. I could delete one of them with other command (I think). But I cannot use skip = 1? I used in other file (not online), and it worked. I tried to use it by adding the "sheet = 1, skip = 1" but it did not work. – Valério Souza Neto Oct 11 '22 at 20:59
no problem! you might not need `sheet = 1`, i just threw it in there to be specific. I don't see an option in the `openxlsx::read.xlsx()` function for skipping rows, so that's why we use the `tail()` function instead. I'm sure there are many other ways to do it – Matt Oct 11 '22 at 21:06

Tim Jones · Answer 2 · 2022-10-11T21:26:47.600

0

Try this:

library(openxlsx)
openxlsx::datatable = read.xlsx("https://doi.org/10.1371/journal.pone.0242866.s001", sheet = 1)

I used the answer from Abdinardo Oliveira on this question for reference.

edited Oct 11 '22 at 21:26

answered Oct 11 '22 at 21:25

Tim Jones

1
1

Import online excel file with two headers

2 Answers2