I am trying to use R to accept any number of user-selected input files and to draw one histogram per file from the values stored in the 14th column. I have gotten this far:

library("tcltk")
library("grid")
File.names<-(tk_choose.files(default="", caption="Choose your files", multi=TRUE, filters=NULL, index=1))
Num.Files<-NROW(File.names)
test<-sapply(1:Num.Files,function(x){readLines(File.names[x])})
data<-read.table(header=TRUE,text=test[1])
names(data)[14]<-'column14'
dat <- list(file1 = data.frame("column14"),
            file2 = data.frame("column14"),
            file3 = data.frame("column14"),
            file4 = data.frame("column14"))
#Where the error comes up
tmp <- lapply(dat, `[[`, 2)
lapply(tmp, function(x) {hist(x, probability=TRUE, main=paste("Histogram of Coverage")); invisible()})
layout(1)

My code fails, though, on the line tmp <- lapply(dat, `[[`, 2). One of two errors comes up. If the line reads as above, the error is:

Error in .subset2(x, i, exact = exact) : subscript out of bounds
Calls: lapply -> FUN -> [[.data.frame -> <Anonymous>

I did some research and found that it could be caused by the double `[[`, so I changed the line to tmp <- lapply(dat, `[`, 2) to see if it would help (as many tutorials said it might), but that just resulted in this error:

Error in `[.data.frame`(X[[1L]], ...) : undefined columns selected
Calls: lapply -> FUN -> [.data.frame

The input files all will follow this pattern:

Targ  cov  av_cov  87A_cvg  87Ag  87Agr  87Agr  87A_gra  87A%_1   87A%_3   87A%_5   87A%_10  87A%_20  87A%_30 87A%_40   87A%_50 87A%_75 87A%_100
1:028 400   0.42    400 0.42    1   1   2   41.8    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1:296 400   0.42    400 0.42    1   1   2   41.8    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

Is this a common problem? Can anyone explain it to me? I am not too familiar with R but I hope to continue learning. Thanks

EDIT: For reproducibility, if I run:

head(test)
head(data)
x <- list(mtcars, mtcars, mtcars);lapply(x, head)
head(dat)

This is the result:

> head(test)
     [,1]                                                                                                                                               
[1,] "Targ  cov  av_cov  87A_cvg  87Ag  87Agr  87Agr  87A_gra  87A%_1   87A%_3   87A%_5   87A%_10  87A%_20  87A%_30 87A%_40\t87A%_50\t87A%_75\t87A%_100"
[2,] "1:028 400\t0.42\t400\t0.42\t1\t1\t2\t41.8\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0"                                                           
[3,] "1:296 400\t0.42\t400\t0.42\t1\t1\t2\t41.8\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0"                                                           
[4,] "1:453 1646\t8.11\t1646\t8.11\t7\t8\t13\t100.0\t100.0\t87.2\t32.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0"                                                   
[5,] "1:427 1646\t8.11\t1646\t8.11\t7\t8\t13\t100.0\t100.0\t87.2\t32.0\t0.0\t0.0\t0.0\t0.0\t0.0\t0.0"                                                   
[6,] "1:736 5105\t29.68\t5105\t29.68\t14\t29\t48\t100.0\t100.0\t100.0\t86.0\t65.7\t49.4\t35.5\t16.9\t0.0\t0.0"                                          
> head(data)
 [1] Targ      cov       av_cov    X87A_cvg  X87Ag     X87Agr    X87Agr.1 
 [8] X87A_gra  X87A._1   X87A._3   X87A._5   X87A._10  X87A._20  X87A._30 
[15] X87A._40  X87A._50  X87A._75  X87A._100
<0 rows> (or 0-length row.names)
> x <- list(mtcars, mtcars, mtcars);lapply(x, head)
[[1]]
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

[[2]]
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

[[3]]
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

> head(dat)
$file1
  X.column14.
1    column14

$file2
  X.column14.
1    column14

$file3
  X.column14.
1    column14

$file4
  X.column14.
1    column14

> tmp <- lapply(dat, `[`, 2)
Error in `[.data.frame`(X[[1L]], ...) : undefined columns selected
Calls: lapply -> FUN -> [.data.frame
Execution halted
Stephopolis
  • It's probably a very simple fix, however, without reproducible data we can't really help easily. You can take care of this by using `lapply(dat, head)`. Using `mtcars` it would work like this: `x <- list(mtcars, mtcars, mtcars);lapply(x, head)`. That will allow us to know what `dat` looks like and how to best provide assistance. For more direction on making a reproducible example please see: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Tyler Rinker Aug 13 '12 at 19:20
  • Try to inspect the value of `dat` immediately after you define it. You'll notice it's not what you expect it to be. Use `str(dat)` or `head(dat)` to do this. – Andrie Aug 13 '12 at 19:21
  • @TylerRinker I added information to help make it reproducible – Stephopolis Aug 13 '12 at 19:38

1 Answer

What are you trying to do here?

tmp <- lapply(dat, `[[`, 2)

That lapply call is equivalent to

list(file1=dat[[1]][[2]],
     file2=dat[[2]][[2]],
     file3=dat[[3]][[2]],
     file4=dat[[4]][[2]])

This doesn't work: you're trying to extract column 2 from data frames that each have only 1 column.
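To see why, it may help to look at a single element of dat on its own. The snippet below is only a small illustration; the names df, res and first are made up for it. It shows that data.frame("column14") stores the literal string in a 1 x 1 data frame, so asking for a second column raises an error while asking for the first one succeeds.

```r
# data.frame("column14") builds a 1 x 1 data frame that holds the string itself,
# not the 14th column of any file
df <- data.frame("column14")
print(dim(df))

# Asking for a second column fails; capture the error message instead of stopping
res <- tryCatch(df[[2]], error = function(e) conditionMessage(e))
print(res)

# Column 1 does exist, so this succeeds and returns the stored string
first <- as.character(df[[1]])
print(first)
```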

Redefine dat like this, and the call will work:

dat <- list(file1 = data.frame("column14","iforgotcolumn2"),
            file2 = data.frame("column14","iforgotcolumn2"),
            file3 = data.frame("column14","iforgotcolumn2"),
            file4 = data.frame("column14","iforgotcolumn2"))
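Note that redefining dat this way only makes the error go away; the data frames still hold literal strings rather than the contents of your files. The head(data) output in the question also shows 0 rows, most likely because text = test[1] hands read.table only the header line. Below is a rough sketch of what the question seems to be after, assuming whitespace-delimited files with a header row and at least 14 columns. The helper make_demo_file and the names col14 and out_pdf are hypothetical; the temporary files stand in for whatever tk_choose.files() returns.

```r
# Hypothetical helper: writes a small whitespace-delimited file with 18 numeric columns
make_demo_file <- function(seed) {
  set.seed(seed)
  demo <- as.data.frame(matrix(round(runif(18 * 20, 0, 100), 1), ncol = 18))
  path <- tempfile(fileext = ".txt")
  write.table(demo, path, quote = FALSE, row.names = FALSE)
  path
}

# Stand-in for the character vector returned by tk_choose.files()
File.names <- c(make_demo_file(1), make_demo_file(2))

# Read every file into its own data frame, one list element per file
dat <- lapply(File.names, read.table, header = TRUE)
names(dat) <- basename(File.names)

# Pull the 14th column out of each data frame
col14 <- lapply(dat, `[[`, 14)

# Write one histogram per file into a single PDF, one page per file
out_pdf <- file.path(tempdir(), "histograms.pdf")
pdf(out_pdf)
for (nm in names(col14)) {
  hist(col14[[nm]], probability = TRUE,
       main = paste("Histogram of Coverage:", nm))
}
dev.off()
```

With real input, File.names would come straight from tk_choose.files(), and the pdf()/dev.off() pair can be dropped to draw on the screen device instead.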
ppham27