
I already asked a similar question, but this time the input data has different dimensions and I cannot get the larger array filled from the smaller matrices. Here is some basic example data showing my structure:

dfList <- list(data.frame(CNTRY = c("B", "C", "D"), Value=c(3,1,4)),
               data.frame(CNTRY = c("A", "B", "E"),Value=c(3,5,15)))
names(dfList) <- c("111.2000", "112.2000")

The input data is a list of more than 1000 data frames, which I turned into a list of matrices with the first column used as row names:

dfMATRIX <- lapply(dfList, function(x) {
  m <- as.matrix(x[,-1])
  rownames(m) <- x[,1]
  colnames(m) <- "Value"
  m
})

I then tried to fill this list of matrices into an array, as shown in my former question:

library(abind)  # afill() comes from the abind package
CNTRY <- c("A", "B", "C", "D", "E")
full_dflist <- array(dim=c(length(CNTRY),1,length(dfMATRIX)))
dimnames(full_dflist) <- list(CNTRY, "Value", names(dfMATRIX))

for(i in seq_along(dfMATRIX)){
  afill(full_dflist[, , i], local= TRUE ) <- dfMATRIX[[i]]   
}

which gives the error message:

Error in `afill<-.default`(`*tmp*`, local = TRUE, value = c(3, 1, 4)) : 
  does not make sense to have more dims in value than x

Any ideas? As in my former question, I also tried acast, and array() instead of the dfMATRIX <- lapply(...) step. I assume the second dimension of my full_dflist array (sorry for the naming :)) is wrong, but I don't know how to specify it. I appreciate your ideas very much.

Edit2: Sorry, I posted the wrong output :) Here is my new expected output:

$`111.2000`
  Value
A    NA
B     3
C     1
D     4
E    NA

$`112.2000`
  Value
A     3
B     5
C    NA
D    NA
E    15
N.Varela

1 Answer


This could be one solution using data.table:

library(data.table)
#create a big data.table with all the elements
biglist <- rbindlist(dfList)
#use lapply to operate on individual dfs
lapply(dfList, function(x) {
  #use the big data table to merge to each one of the element dfs
  temp <- merge(biglist[, list(CNTRY)], x, by='CNTRY', all.x=TRUE)
  #remove the duplicate values
  temp <- temp[!duplicated(temp), ] 
  #convert CNTRY to character and set the order on it
  temp[, CNTRY := as.character(CNTRY)]
  setorder(temp, 'CNTRY')
  temp
  })

Output:

$`111.2000`
   CNTRY Value
1:     A    NA
2:     B     3
3:     C     1
4:     D     4
5:     E    NA

$`112.2000`
   CNTRY Value
1:     A     3
2:     B     5
3:     C    NA
4:     D    NA
5:     E    15

EDIT

For your updated output you could do:

lapply(dfList, function(x) {
  temp <- merge(biglist[, list(CNTRY)], x, by='CNTRY', all.x=TRUE)
  temp <- temp[!duplicated(temp), ] 
  temp[, CNTRY := as.character(CNTRY)]
  setorder(temp, 'CNTRY')
  data.frame(Value=temp$Value, row.names=temp$CNTRY)
  })

$`111.2000`
  Value
A    NA
B     3
C     1
D     4
E    NA

$`112.2000`
  Value
A     3
B     5
C    NA
D    NA
E    15

That said, I would recommend keeping the list elements as data.tables rather than converting them to data.frames just to get row names.
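
If the 3-D array from the question is still the goal, a base R sketch along the following lines might work without afill(). The error in the question most likely arises because `full_dflist[, , i]` drops the length-1 second dimension and becomes a plain vector, while the value being assigned is a 2-D matrix. The name `full_arr` is just an illustrative choice, and the sketch assumes every country code in the data also appears in `CNTRY`.

```r
# Example data from the question
dfList <- list(
  data.frame(CNTRY = c("B", "C", "D"), Value = c(3, 1, 4)),
  data.frame(CNTRY = c("A", "B", "E"), Value = c(3, 5, 15))
)
names(dfList) <- c("111.2000", "112.2000")

# Full set of row labels (assumed to cover every code in dfList)
CNTRY <- c("A", "B", "C", "D", "E")

# Pre-allocate an NA-filled 5 x 1 x 2 array with named dimensions
full_arr <- array(NA_real_,
                  dim = c(length(CNTRY), 1, length(dfList)),
                  dimnames = list(CNTRY, "Value", names(dfList)))

# Fill each slice by indexing rows with the country codes;
# as.character() guards against CNTRY being stored as a factor
for (nm in names(dfList)) {
  x <- dfList[[nm]]
  full_arr[as.character(x$CNTRY), "Value", nm] <- x$Value
}

full_arr[, , "111.2000", drop = FALSE]
```

Indexing by dimnames leaves every country that is missing from a data frame as NA, which matches the expected output in the question.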

LyzandeR
  • One more small question: how can I get a `list of vectors` instead of a `list of dfs`? If the elements are ordered as in the output, I don't really need the row names. Thanks a lot – N.Varela Oct 28 '15 at 12:30
  • You could replace the last line in the `lapply` function, `data.frame(Value=temp$Value, row.names=temp$CNTRY)`, with `temp[, Value]`; each element will then be a plain vector. – LyzandeR Oct 28 '15 at 12:32
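
For the list-of-vectors variant discussed in the comments, a base R alternative could look like the sketch below. It is not the data.table approach from the answer; it uses `match()` instead, and the names `all_cntry` and `vecList` are hypothetical.

```r
# Example data from the question
dfList <- list(
  data.frame(CNTRY = c("B", "C", "D"), Value = c(3, 1, 4)),
  data.frame(CNTRY = c("A", "B", "E"), Value = c(3, 5, 15))
)
names(dfList) <- c("111.2000", "112.2000")

# Sorted union of all country codes across the list
all_cntry <- sort(unique(unlist(
  lapply(dfList, function(x) as.character(x$CNTRY))
)))

# For each data frame, look up the row of every country;
# match() returns NA for absent countries, so indexing yields NA there
vecList <- lapply(dfList, function(x) {
  x$Value[match(all_cntry, as.character(x$CNTRY))]
})

vecList
```

Each vector follows the order of `all_cntry`, so the position of a value identifies its country even without names.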