I'm trying to read this data file, http://www.biostat.umn.edu/~brad/data/smoking.dat, into R. I used the answer from http://stackoverflow.com/questions/11664075/import-dat-file-into-r:

x <- read.table("http://www.biostat.umn.edu/~brad/data/smoking.dat", 
                header=TRUE, sep="\n", skip=2)

It runs, but the result is not what I expect:

head(x)
                  list.regions.81..num...c.8..5..3..8..5..1..6.
1                 7, 3, 5, 7, 7, 2, 2, 5, 6, 6, 7, 4, 8, 7, 6, 
2 6, 2, 8, 4, 4, 10, 4, 3, 7, 6, 5, 7, 7, 7, 5, 6, 4, 9, 4, 7, 
3  4, 5, 9, 3, 7, 5, 5, 4, 5, 6, 6, 5, 2, 6, 2, 8, 7, 6, 5, 6, 
4       3, 6, 6, 6, 6, 4, 10, 8, 3, 4, 2, 6, 5, 7, 7, 4, 7, 6, 
5                                                2),sumnum=441,
6                            adj=c(2, 5, 6, 8, 11, 45, 75, 80, 

The file actually seems to contain several lists rather than a table.

Brian Tompsett - 汤莱恩
Math
  • `x <- dget("http://www.biostat.umn.edu/~brad/data/smoking.dat")` maybe? – David Arenburg Sep 02 '15 at 09:16
  • It seems to import only the second list. Is that what you want? Either way, you can convert it to a data frame by doing `res <- do.call(cbind.data.frame, x[-1L])` – David Arenburg Sep 02 '15 at 09:25
  • @DavidArenburg you're right it only gets the second list. Let's see if one can split the text before `dget`-ing. I can't think of a cleverer way; I have never tried to read in multiple objects like that. – C8H10N4O2 Sep 02 '15 at 12:32
  • @DavidArenburg updated, let me know if there's a simpler way – C8H10N4O2 Sep 02 '15 at 13:04
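
The conversion suggested in the comments can be tried without downloading anything. The sketch below uses a small made-up list with the same shape as the second list in the file (a scalar count followed by equal-length vectors); the values are illustrative only, not the real data.

```r
# Toy stand-in for the second list: a scalar N followed by equal-length vectors
# (made-up values, assumed only for illustration)
x <- list(N = 3, Age = c(49, 47, 50), SexF = c(0, 0, 1))

# Drop the scalar first element, then bind the remaining vectors as columns
res <- do.call(cbind.data.frame, x[-1L])
str(res)
```

Dropping the first element with `x[-1L]` matters because the scalar would otherwise be recycled into a constant column.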

1 Answer

You cannot read this file using read.table() because it is not a table. Rather, it is the text representation of R objects (in this case, two lists), such as that produced by dput(). As David Arenburg suggests above, you should use dget(). I am a big fan of the httr package, so I use it to download the file.
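
To see why a single dget() call returns only the second list, as noted in the comments, here is a minimal offline sketch. It writes two made-up lists to a temporary file in dput() format and reads the file back; the file contents and the names `tmp` and `last_obj` are assumptions for illustration, not the real data.

```r
# Write two list objects to a temporary file in dput() format,
# mimicking the layout of the file in the question (contents are made up)
tmp <- tempfile(fileext = ".dat")
con <- file(tmp, "w")
dput(list(regions = 2, num = c(1, 1)), con)
dput(list(N = 3, Age = c(49, 47, 50)), con)
close(con)

# dget() parses and evaluates the whole file, but returns only the value
# of the last expression, i.e. the second list
last_obj <- dget(tmp)
str(last_obj)
unlink(tmp)
```

Getting every object back therefore needs either one dget() call per object or a way to evaluate the expressions one at a time.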

Edit: for an arbitrary number of list objects on a single page:

put_multiple_objs_from_url <- function(url) {
  # download the page and split the body into lines
  request <- httr::GET(url)
  httr::stop_for_status(request)
  con <- textConnection(httr::content(request, as = "text"))
  text_lines <- readLines(con)
  close(con)

  # each object starts on a line beginning with "list(" and runs up to the
  # line before the next such start (or to the end of the file)
  start_lines <- grep("^list\\(", text_lines)
  end_lines <- c(start_lines[-1] - 1L, length(text_lines))

  # dget each of these file parts as an element of obj_list
  obj_list <- vector("list", length(start_lines))
  for (i in seq_along(start_lines)) {
    obj_txt <- paste0(text_lines[start_lines[i]:end_lines[i]], collapse = " ")
    part <- textConnection(obj_txt)
    obj_list[[i]] <- dget(part)
    close(part)
  }
  obj_list
}

x <- put_multiple_objs_from_url("http://www.biostat.umn.edu/~brad/data/smoking.dat")

str(x)
# List of 2
# $ :List of 4
# ..$ regions: num 81
# ..$ num    : num [1:81] 8 5 3 8 5 1 6 7 3 5 ...
# ..$ sumnum : num 441
# ..$ adj    : num [1:441] 2 5 6 8 11 45 75 80 1 8 ...
# $ :List of 9
# ..$ N             : num 223
# ..$ Age           : num [1:223] 49 47 50 55 59 41 55 42 51 49 ...
# ..$ SexF          : num [1:223] 0 0 1 0 0 0 1 1 1 0 ...
# ..$ AgeStart      : num [1:223] 18 14 19 15 18 16 15 18 18 18 ...
# ..$ SIUC          : num [1:223] 1 0 1 1 1 1 1 1 1 1 ...
# ..$ F10Cigs       : num [1:223] 30 20 12 40 20 40 18 40 20 18 ...
# ..$ censored.time1: num [1:223] 1.01 5 4.99 5.04 5 ...
# ..$ censored.time2: num [1:223] 1.97 100 100 100 100 ...
# ..$ County        : num [1:223] 17 21 77 30 25 58 13 16 13 77 ...
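
A possibly simpler alternative, offered only as a sketch and not tested against the original URL: parse() returns one expression per top-level object, so each can be evaluated separately without searching for "list(" lines. The text in `txt` is a made-up stand-in for the downloaded file.

```r
# Text standing in for a downloaded file that holds two dput()-style lists
# (made-up contents; each list deliberately spans two lines)
txt <- c("list(regions = 2,",
         "     num = c(1, 1))",
         "list(N = 3,",
         "     Age = c(49, 47, 50))")

# parse() yields one expression per top-level object in the text
exprs <- parse(text = txt)

# evaluating each expression gives one R object per list
objs <- lapply(exprs, eval)
str(objs)
```

Like dget(), this evaluates whatever code the text contains, so it should only be used on sources you trust.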
C8H10N4O2