
I have a list of about 561 elements, each of which is a small matrix. Below is an example from the dataset:

structure(list(`111110` = structure(c(205, 4, 1, 6, 23, 0, 1, 
0, 0), .Dim = c(3L, 3L), .Dimnames = list(c("1", "4", "5"), c("1", 
"4", "5"))), `111120` = structure(c(181, 3, 4, 4), .Dim = c(2L, 
2L), .Dimnames = list(c("1", "4"), c("1", "4"))), `111130` = structure(c(71, 8, 3, 15, 114, 7, 6, 8, 56), .Dim = c(3L, 3L), .Dimnames = list(
c("1", "4", "5"), c("1", "4", "5"))), `111140` = structure(c(87, 
8, 9, 14), .Dim = c(2L, 2L), .Dimnames = list(c("1", "4"), c("1", 
"4"))), `111150` = structure(24, .Dim = c(1L, 1L), .Dimnames = list(
"1", "1")), `111160` = structure(48, .Dim = c(1L, 1L), .Dimnames = list(
"1", "1"))), .Names = c("111110", "111120", "111130", "111140", 
"111150", "111160"))

Each element ranges in size from 1 x 1 to 6 x 6. I would like to do the following calculations for each element in the list:

  1. If the element has a column named "5", sum the entries in column "5", excluding the entry in the last row. If there is no column "5", the result should be blank.

  2. If the element has a column named "5", sum the entries in column "1", excluding the first entry. If there is no column "5", the result should be blank.

  3. Collect the results of steps 1 and 2 in a data frame containing the unique id and both calculations.
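For a single element, steps 1 and 2 can be written with name-based indexing (`m[, "5"]`) rather than positional indexing. A minimal sketch on the `111130` element from the sample data; the variable names `m`, `has5`, `calc1` and `calc2` are only for illustration, and using `NA` to stand in for "blank" is an assumption:

```r
# The `111130` element from the sample data (R fills matrices column by column)
m <- matrix(c(71, 8, 3, 15, 114, 7, 6, 8, 56), nrow = 3,
            dimnames = list(c("1", "4", "5"), c("1", "4", "5")))

has5 <- "5" %in% colnames(m)                       # precondition for steps 1 and 2
calc1 <- if (has5) sum(m[-nrow(m), "5"]) else NA   # column "5" minus its last row: 6 + 8
calc2 <- if (has5) sum(m[-1, "1"]) else NA         # column "1" minus its first row: 8 + 3
c(calc1, calc2)  # 14 11
```

This matches the `111130` row of the desired output.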

I have tried the following (based on the answer provided below):

output <- c()
for(x in names(trans.by.naics)) {
  id <- x
  count.entry.5 <- ifelse("5" %in% colnames(trans.by.naics[[x]]),
                            sum(trans.by.naics[[x]][1 :nrow(trans.by.naics[[x]]), 5]) - trans.by.naics[[x]][5,5], "") # sum down the first four rows of column "5" if it exists
  count.entry.1 <- ifelse("5" %in% colnames(trans.by.naics[[x]]),
                     sum(trans.by.naics[[x]][1 : nrow(trans.by.naics[[x]]), 1]) - trans.by.naics[[x]][1,1], "") 
  thing <- data.frame(id, count.entry.5, count.entry.1)
  output <- rbind(output, thing)

}

But I get the following when I run my code:

Error in trans.by.naics[[x]][1:nrow(trans.by.naics[[x]]), 5] : 
  subscript out of bounds
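A likely cause, judging from the message: `[ , 5]` and `[5, 5]` index by position, so they ask for a fifth column (and fifth row) that a 3 x 3 matrix does not have, even when one of its columns is named "5". A small illustration on the `111110` element; `m` and `msg` are illustrative names:

```r
# The `111110` element: 3 x 3, with a column named "5" in third position
m <- matrix(c(205, 4, 1, 6, 23, 0, 1, 0, 0), nrow = 3,
            dimnames = list(c("1", "4", "5"), c("1", "4", "5")))

# Positional index 5 does not exist in a 3-column matrix, so this errors
msg <- tryCatch(m[, 5], error = function(e) conditionMessage(e))
msg          # "subscript out of bounds"

# Indexing by name finds the column wherever it sits
m[, "5"]     # 1 0 0 (named by row)
```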

The desired output looks like this:

      id count.entry.5 count.entry.1
1 111110             1             5
2 111120                           3
3 111130            14            11
4 111140                            
5 111150                            
6 111160
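The blanks in this table cannot live in a numeric column: combining a number with `""` gives a character vector. One option, assuming `NA` is acceptable while computing, is to keep the results numeric and blank out `NA` only for display. `blank_na` is a hypothetical helper, and the three rows are taken from the desired output above:

```r
# Hypothetical helper: format numbers for display, showing NA as an empty string
blank_na <- function(x) ifelse(is.na(x), "", format(x))

# A number combined with "" becomes character
class(c(1, ""))   # "character"

# Results kept numeric, with NA where the question wants a blank
out <- data.frame(id = c("111110", "111130", "111140"),
                  count.entry.5 = c(1, 14, NA),
                  count.entry.1 = c(5, 11, NA),
                  stringsAsFactors = FALSE)

# Copy used only for printing: NA shown as blank
shown <- out
shown[-1] <- lapply(out[-1], blank_na)
print(shown, row.names = FALSE)
```

Keeping `out` numeric means later sums or comparisons still work; only `shown` holds strings.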

Is there an efficient way to do this? Perhaps a more vectorized or lapply-based approach? Any advice or help is appreciated. Thanks!
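One possible vectorized-style sketch (not from the original thread) applies a per-matrix function with `vapply` and builds the data frame once at the end. It assumes `NA` for "blank"; `calc_entry` is a hypothetical helper, and the sample data is rebuilt with `matrix()`:

```r
# Sample data from the question, rebuilt with matrix() instead of the dput() output
trans.by.naics <- list(
  `111110` = matrix(c(205, 4, 1, 6, 23, 0, 1, 0, 0), 3,
                    dimnames = list(c("1", "4", "5"), c("1", "4", "5"))),
  `111120` = matrix(c(181, 3, 4, 4), 2, dimnames = list(c("1", "4"), c("1", "4"))),
  `111130` = matrix(c(71, 8, 3, 15, 114, 7, 6, 8, 56), 3,
                    dimnames = list(c("1", "4", "5"), c("1", "4", "5"))),
  `111140` = matrix(c(87, 8, 9, 14), 2, dimnames = list(c("1", "4"), c("1", "4"))),
  `111150` = matrix(24, 1, dimnames = list("1", "1")),
  `111160` = matrix(48, 1, dimnames = list("1", "1"))
)

# Hypothetical helper: both calculations for one matrix; NA stands in for "blank"
calc_entry <- function(m) {
  if (!"5" %in% colnames(m)) return(c(NA_real_, NA_real_))
  c(sum(m[-nrow(m), "5"]),  # column "5" without its last row
    sum(m[-1, "1"]))        # column "1" without its first row
}

# vapply returns a 2 x n matrix, one column per list element
res <- vapply(trans.by.naics, calc_entry, numeric(2))
output <- data.frame(id = names(trans.by.naics),
                     count.entry.5 = res[1, ],
                     count.entry.1 = res[2, ],
                     row.names = NULL, stringsAsFactors = FALSE)
output
```

For the sample this gives 1 and 5 for `111110`, 14 and 11 for `111130`, and `NA` elsewhere (including `111120`, which has no column "5"). `vapply` checks that every result is a numeric vector of length 2, and building the data frame once avoids copying a growing `output` with `rbind` on every iteration.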

jvalenti

1 Answer

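output <- c()
# data is the list of matrices (trans.by.naics in the question)
for (x in names(data)) {
  id <- x
  # TRUE counts as 1, so the sum is 1 only when a column is named "5"
  if(sum(colnames(data[[x]]) %in% "5") == 1) {
    calc1 <- sum(data[[x]][-nrow(data[[x]]), "5"])  # column "5" without its last row
    calc2 <- sum(data[[x]][-1, "1"])                # column "1" without its first row
  } else {
    calc1 <- NA  # NA rather than "" avoids turning the column into character
    calc2 <- NA
  }
  thing <- data.frame(id, calc1, calc2)
  output <- rbind(output, thing)
}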
AidanGawronski
  • The second element of your data doesn't have a column "5", so the output is not quite identical, but it does what you ask. – AidanGawronski May 31 '17 at 21:40
  • I don't quite understand the `if(sum(colnames(data[[x]]) %in% "5") == 1)` line. Why sum the column names? – jvalenti Jun 01 '17 at 17:23
  • I ask because calc1 is incorrect for a 6 x 6 element, even when it does have a "5" in `colnames(data[[x]])`: it adds the final row of column "5". – jvalenti Jun 01 '17 at 17:32
  • The sum of several FALSE values is 0, but if one of the column names is "5", its TRUE sums to 1. Add the 6 x 6 example with your desired output. – AidanGawronski Jun 01 '17 at 17:37
  • Ah, I see. Clever and clean. I accounted for the 6 x 6 case: `calc1 <- sum(trans.by.naics[[x]][, "5"]) - trans.by.naics[[x]]["5", "5"]` – jvalenti Jun 01 '17 at 18:06
  • The 6 x 6 case should have worked without changing anything. – AidanGawronski Jun 01 '17 at 18:15
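The `sum(... %in% "5") == 1` test discussed in the comments works because `TRUE` counts as 1 and `FALSE` as 0; `"5" %in% colnames(m)` asks the same question directly. The two calc1 variants from the thread also differ once the last row is not row "5". A small check, using the `111130` element and a hypothetical 4 x 4 matrix with made-up values whose last state is "6" (standing in for the 6 x 6 case):

```r
# The `111130` element: column names "1", "4", "5"
m <- matrix(c(71, 8, 3, 15, 114, 7, 6, 8, 56), nrow = 3,
            dimnames = list(c("1", "4", "5"), c("1", "4", "5")))

colnames(m) %in% "5"            # FALSE FALSE TRUE
sum(colnames(m) %in% "5") == 1  # TRUE: the single TRUE counts as 1
"5" %in% colnames(m)            # TRUE: same question, asked directly

# Hypothetical 4 x 4 with an extra state "6" after "5" (made-up values)
m6 <- matrix(1:16, nrow = 4,
             dimnames = list(c("1", "4", "5", "6"), c("1", "4", "5", "6")))
sum(m6[-nrow(m6), "5"])          # 30: drops the last row, i.e. row "6"
sum(m6[, "5"]) - m6["5", "5"]    # 31: drops row "5", the diagonal entry
```

For the 3 x 3 sample both forms give 14, because there the last row is row "5"; which one is wanted for larger matrices depends on whether "last row" means the final row or row "5".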