I am trying to read the data from this link in R using the following code, but I keep getting warning messages and the data frame is not populated correctly.

url <- 'https://onlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission.txt'
df <- read.table(url, sep = '\t',header = F, skip = 2,quote='', comment='')

Can you tell me what I need to change to read the data?

EDIT

Adding data snippet

REMISS  CELL    SMEAR   INFIL   LI  BLAST   TEMP
1   0.8 0.83    0.66    1.9 1.1 1
1   0.9 0.36    0.32    1.4 0.74    0.99
0   0.8 0.88    0.7 0.8 0.18    0.98
0   1   0.87    0.87    0.7 1.05    0.99
1   0.9 0.75    0.68    1.3 0.52    0.98
0   1   0.65    0.65    0.6 0.52    0.98
1   0.95    0.97    0.92    1   1.23    0.99
0   0.95    0.87    0.83    1.9 1.35    1.02
Clock Slave
  • The file appears to have many invalid chars. If you copy its contents, you can probably safely read it in as `paste(readClipboard(), collapse="\n")`, though. For example, with the data.table package `data.table::fread(paste(readClipboard(), collapse="\n"))`. Btw, this is not a good question for SO, relying on external links for example data. – Frank Mar 15 '17 at 19:44
  • I can't say there are invalid chars. I downloaded the file and tried reading it. I still got the same errors. I then replaced the tabs with commas and read it again by setting `sep = ","` and it worked. Also, I added the url to make it easier for people to directly copy paste and run the code. But yeah, point noted. Thanks – Clock Slave Mar 15 '17 at 19:51
  • Re invalid chars, I think ycw's answer works and is the right way; I was suggesting manually selecting the contents, right-click -> copy to get it to the clipboard. This crude workaround worked for me. Btw, the reason for not liking external links is (as you probably know) that we want the Q&A to still be valuable years from now (when most links like that will be broken). – Frank Mar 15 '17 at 19:54
  • True that. Made amends. – Clock Slave Mar 15 '17 at 20:28

2 Answers

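This is an encoding issue: the file appears to be encoded as UTF-16LE, so `read.table` does not parse it correctly with its default settings. See this thread for more information: Get "embedded nul(s) found in input" when reading a csv using read.csv().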

url <- 'https://onlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission.txt'
df <- read.table(url, sep = '\t',header = TRUE, fileEncoding = "UTF-16LE")
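
Since the external link may not stay available, here is a minimal offline sketch of the same idea. It assumes a small stand-in file (the name `tmp` and its three-column contents are hypothetical, loosely modeled on the snippet in the question) written as UTF-16LE with a byte order mark, checks the first two bytes, and only then passes `fileEncoding = "UTF-16LE"`. It is an illustration of the approach, not a check of the actual remote file.

```r
# Hypothetical stand-in for the remote file: a tiny tab-separated table
# stored as UTF-16LE, preceded by the byte order mark FF FE.
tmp <- tempfile(fileext = ".txt")
txt <- "REMISS\tCELL\tSMEAR\n1\t0.8\t0.83\n0\t1\t0.87\n"
utf16_bytes <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xFF, 0xFE)), utf16_bytes), tmp)

# Peek at the first two bytes; FF FE is the UTF-16LE byte order mark.
bom <- readBin(tmp, what = "raw", n = 2)
is_utf16le <- identical(bom, as.raw(c(0xFF, 0xFE)))

# Pass the encoding only when the mark was found; "" is the default.
df <- read.table(tmp, sep = "\t", header = TRUE,
                 fileEncoding = if (is_utf16le) "UTF-16LE" else "")
str(df)
```

A file without a byte order mark would not be detected by this check, so the encoding would then have to be supplied by hand.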
www

Also consider,

url <- 'https://onlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission.txt'
df <- read.csv(url, sep = "\t", header = TRUE)
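
For context, `read.csv` is a wrapper around `read.table` with different defaults, so calling it with `sep = "\t"` should behave like `read.delim`. The sketch below compares the two on an inline string (the variable `snippet` is a hypothetical sample, not the real file). It does not address encoding: if the file really is UTF-16LE, as the other answer suggests, a `fileEncoding` argument would probably still be needed here.

```r
# Hypothetical inline sample in plain tab-separated form.
snippet <- "REMISS\tCELL\tSMEAR\n1\t0.8\t0.83\n0\t1\t0.87\n"

# read.csv with a tab separator versus read.delim with its defaults.
via_csv <- read.csv(text = snippet, sep = "\t", header = TRUE)
via_delim <- read.delim(text = snippet)

# Compare the two resulting data frames.
print(identical(via_csv, via_delim))
```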
bmc