The following simple loop seems to skip elements in a data frame. I would appreciate any tips on figuring out where the problem with the data or code may lie.

foo <- apply(data, 1, function(x) {

    vec <- x
    mylist <- list()

    for (i in vec){
        #print(i)
        mylist[[i]]<-i
    }
    print(length(vec))
    print(length(mylist))
})

My data frame has 25 columns. For some rows, length(vec) returns 25, while length(mylist) returns 24.

[1] 25
[1] 24

If I uncomment the print(i) line, I can see 25 elements printed for every row.

The above is a simplification of the actual code I want to use, but the problem already occurs in this simple format.

Thanks in advance!

PS. I have tried storing the data both as character and as factor; neither seems to affect the problem.

PPS. Here are two rows of the data frame that give different results (although they contain the same number of elements):

 structure(list(data1.LOC = c("LL_A1_00000003068_686", "LL_A1_00000003538_274"), REF = c("G", "T"), ALT = c("C", "C"), L47.variant = c("0/1:28,34:62:99:1154,0,926", "0/0:9,0:9:21:0,21,276"), L51.variant = c("0/0:61,0:61:99:0,184,2417", "0/0:6,0:6:15:0,15,192"), LCro11.variant = c("0/0:24,0:24:72:0,72,951", "0/0:2,0:2:6:0,6,80"), LCro5.variant = c("0/0:48,0:48:99:0,141,1869", "0/0:5,0:5:15:0,15,173"), N01.variant = c("0/1:22,16:38:99:526,0,758", "1/1:0,2:2:6:63,6,0"), N09.variant = c("1/1:1,50:51:99:1885,110,0", "0/0:12,0:12:36:0,36,460"), Nor28.variant = c("1/1:0,23:23:66:874,66,0", "0/0:5,0:5:12:0,12,159"), P161.variant = c("1/1:0,54:55:99:2118,163,0", "0/0:2,0:2:6:0,6,80"), Rom155.variant = c("0/0:69,0:69:99:0,208,2749", "0/1:5,3:8:99:102,0,102"), Rom161.variant = c("0/0:75,0:75:99:0,226,2957", "0/0:5,0:5:15:0,15,196"), Rom303.variant = c("0/0:44,0:44:99:0,132,1739", "0/0:5,0:5:15:0,15,195"), Rus291.variant = c("0/1:43,30:73:99:972,0,1443", "0/1:1,3:4:28:108,0,28"), Rus292.variant = c("0/0:56,0:56:99:0,163,2139", "0/0:11,0:11:33:0,33,429"), Sl5t.variant = c("0/1:55,34:89:99:1003,0,1911", "0/0:10,0:10:30:0,30,379"), Sl6t.variant = c("0/0:89,0:89:99:0,268,3513", "0/0:10,0:10:30:0,30,383"), s037y.variant = c("0/0:63,0:63:99:0,190,2484", "0/0:8,0:8:18:0,18,236"), s087y.variant = c("0/0:72,0:72:99:0,211,2770", "0/0:6,0:6:15:0,15,179"), s2E03.variant = c("0/1:34,27:61:99:810,0,1175", "0/0:4,0:4:12:0,12,143"), s2L05.variant = c("0/0:56,0:56:99:0,169,2220", "0/1:4,4:8:95:139,0,95"), s2P01.variant = c("0/1:44,27:71:99:859,0,1519", "0/0:6,0:6:18:0,18,240"), s2R01.variant = c("1/1:0,68:68:99:2642,202,0", "0/1:5,6:11:99:202,0,130"), s2R05.variant = c("0/1:41,33:74:99:1012,0,1393", "0/0:8,0:8:24:0,24,312")), .Names = c("data1.LOC", "REF", "ALT", "L47.variant", "L51.variant", "LCro11.variant", "LCro5.variant", "N01.variant", "N09.variant", "Nor28.variant", "P161.variant", "Rom155.variant", "Rom161.variant", "Rom303.variant", "Rus291.variant", "Rus292.variant", "Sl5t.variant", 
"Sl6t.variant", "s037y.variant", "s087y.variant", "s2E03.variant", "s2L05.variant", "s2P01.variant", "s2R01.variant", "s2R05.variant"), row.names = 19:20, class = "data.frame")
dwf
  • Generally, you want to assign to a list like `mylist[[i]] <- i`. Not sure if that'll resolve this, though. Also, while it's good that you simplified your problem, it is also important in cases like this to provide a reproducible example (with data as well): http://stackoverflow.com/a/28481250/1191259 – Frank Oct 05 '15 at 15:43
  • Thank you for the fast response! I have added two rows of the data frame. Sadly, your suggestion did not help (but thanks!) – dwf Oct 05 '15 at 15:50
  • Please share your data with `dput()`; getting the proper underlying structure is probably important here. – Gregor Thomas Oct 05 '15 at 15:55
  • I am a bit confused about the `mylist[[i]] <- i` bit. According to your sample columns, not all entries in a row would be integers, so it won't be possible to use them as indices in `mylist`. – stas g Oct 05 '15 at 16:00
  • @stasg You can use character indices too. – Frank Oct 05 '15 at 16:03
  • You may have duplicates in `vec`. If, for instance, `vec <- c(1,1,2)`, then `length(vec) == 3` but `length(mylist) == 2`. – nicola Oct 05 '15 at 16:04
  • I have added the data using `dput()`. Sorry, I didn't know about this ability. – dwf Oct 05 '15 at 16:07
  • Confirmed my hypothesis. If `x` is the object you provided, try `y <- as.matrix(x); length(unique(y[1, ])); length(unique(y[2, ]))`. – nicola Oct 05 '15 at 16:07
  • @nicola is right. Try `duplicated(as.character(x[1, ]))` and `duplicated(as.character(x[2, ]))`. You will see that the second row has two identical columns (there is one `TRUE` value). – stas g Oct 05 '15 at 16:25
  • @nicola Could you suggest a way for me to modify the script, if it is due to duplication? – dwf Oct 05 '15 at 16:25
  • I think the first step would be to convert your data into a usable format. The huge number of columns looks very unwieldy. I'd suggest you have a look at `library(data.table); res <- melt(setDT(x), measure.vars=patterns("variant"))` – Frank Oct 05 '15 at 16:26
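Following stas g's `duplicated()` suggestion, here is a small sketch for checking every row at once instead of one row at a time. The two-row data frame is a hypothetical stand-in for the `dput()` output (the column names `s1` and `s2` are made up); it counts, per row, how many values a loop that uses the values as list keys would lose.

```r
# Hypothetical two-row data frame standing in for the dput() output;
# the second row repeats one genotype string in two sample columns
x <- data.frame(
  REF = c("G", "T"),
  ALT = c("C", "C"),
  s1  = c("0/1:28,34:62:99:1154,0,926", "0/0:2,0:2:6:0,6,80"),
  s2  = c("0/0:61,0:61:99:0,184,2417", "0/0:2,0:2:6:0,6,80"),
  stringsAsFactors = FALSE
)

# Per row: total number of entries vs. number of distinct entries.
# A loop that uses the values as list keys keeps only the distinct ones.
n_total    <- apply(x, 1, length)
n_distinct <- apply(x, 1, function(r) length(unique(r)))
n_total - n_distinct   # values that would be overwritten in each row

# Which columns repeat an earlier value in row 2?
names(x)[duplicated(unlist(x[2, ]))]
```

Any row with a non-zero difference is one where `length(mylist)` will come out shorter than `length(vec)`.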

1 Answer

Instead of using the elements of vec as indices into mylist (which updates the same element whenever vec contains duplicates), iterate through vec one element at a time with a standard integer index i running along its length, like so:

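foo <- apply(data, 1, function(x) {

    vec <- x
    mylist <- list()

    # i takes the positions 1, 2, ..., length(vec), so every element
    # gets its own slot, even when two values are identical
    for (i in seq_along(vec)) {
        #print(i)
        mylist[[i]] <- vec[i]
    }
    print(length(vec))
    print(length(mylist))
})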
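If no explicit loop is needed, `as.list()` should produce the same one-element-per-entry list. Inside `apply()` each row arrives as a vector named after the data frame's columns, so the list elements stay addressable by column name. If the real code genuinely needs the values themselves as keys (an assumption about the larger script), one option is to deduplicate them with `make.unique()` first. The example row below is hypothetical.

```r
# Hypothetical row of genotype strings containing one duplicated value
vec <- c(REF = "T", ALT = "C", a = "0/0:2,0:2:6:0,6,80", b = "0/0:2,0:2:6:0,6,80")

# as.list() creates one list element per vector element, duplicates included,
# and carries the vector's names (here: column names) over as list names
mylist <- as.list(vec)
length(mylist)  # 4

# If the values must be the keys, make them unique first;
# make.unique() appends ".1", ".2", ... to repeated strings
keys  <- make.unique(unname(vec))
keyed <- setNames(as.list(vec), keys)
names(keyed)    # "T" "C" "0/0:2,0:2:6:0,6,80" "0/0:2,0:2:6:0,6,80.1"
```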

To summarise, your code did not work because, as nicola explained in the comments, you may have duplicates in vec. If, for instance, vec <- c(1,1,2), then length(vec) == 3 but length(mylist) == 2, because the loop writes to mylist[[1]] twice.
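nicola's example can be reproduced directly. The sketch below contrasts the value-keyed loop with the position-keyed one, and adds a character case (the strings are hypothetical) because the real data are genotype strings rather than numbers.

```r
# nicola's example: numeric values used directly as list indices
vec <- c(1, 1, 2)

by_value <- list()
for (i in vec) {
  by_value[[i]] <- i       # the second 1 overwrites element 1
}

by_index <- list()
for (i in seq_along(vec)) {
  by_index[[i]] <- vec[i]  # one slot per position, duplicates kept
}

length(vec)       # 3
length(by_value)  # 2
length(by_index)  # 3

# Same effect with character values: a repeated string reuses the same name
chars <- c("0/0", "0/1", "0/0")
by_name <- list()
for (s in chars) by_name[[s]] <- s
length(by_name)   # 2
```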

stas g
  • Thanks a bunch! This has been driving me crazy. I implemented your suggestion on the large input file, and it worked like a charm. – dwf Oct 05 '15 at 16:38
  • Words explaining the change and why the OP's approach didn't work as expected would be good... – Frank Oct 05 '15 at 17:08
  • @nicola's comment above is the explanation. – IRTFM Oct 05 '15 at 17:52