1

I'm dealing with a data set of around 230k rows and 11 columns, and need to create a set of matrices (or a 3D array). The element of a given matrix is the sum of a certain column (let's say COST) of the data set when filtered on 3 criteria (let's say NAME, DATE, and NUMBER). Here's how I currently have this set up.

for(k in 1:length(names)) {
    x <- matrix(0, nrow = length(dates), ncol = 20)
    for(i in 1:length(dates)) {
        for(j in 1:20) {
            x[i, j] <- sum(claims[as.logical((data$NAME == names[k]) * (data$DATE == dates[i]) * (data$NUMBER == j)), 4])
        }
    }
assign(names[k], x)
}

Is there a way to accomplish this without using loops or if not, at least more efficiently?

Thanks!

Edit: Here's some test data I made up as requested. I only included columns that were necessary for this part of the code. I also decreased the number of columns for the matrices from 40 to 20 in the code above for testing purposes. dates and names are vectors needed for the for loops. test_input is the test data. test_output_market1 is the first matrix that would be created in the loops.

names:

c("Market1"  "Market2"   "Market3"   "Market4"   "Market5")

dates:

structure(c(18993   18994   18995   18996   18997   18998   18999       
19000   19001   19002   19003   19004   19005   19006   19007   19008
19009   19010   19011   19012   19013   19014   19015   19016   19017
19018   19019   19020   19021   19022    19023)  class = "Date")

    

test_input:

structure(list(DATE = structure(c(19021 19007   18994   19022                           
19010   19007   19007   18996   19012    19007)  class = "Date")     NUMBER = c(4           
2   8   11  5   5   7   12  15   5)  NAME = c("Market1"  "Market3"
Market1  "Market1"   "Market5"   "Market4"   "Market2"   "Market3"                  
Market4  "Market5")  COST = c(9384.3    93.23   6354.12 346.98                  
5743.48 845.73  3948.37 264.34  38790.24     8723.2))    class = "data.frame"    row.names = c(NA           
-10L))  

                                

test_output_market1:

structure(c(0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0               
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   9384.3
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   6354.12 0   0   0   0   0   0       
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   346.98  0   0   0   0   0   0   0   0   0       
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0    0)  .Dim = c(31L    
20L))   

                                                                        
ChrisM
  • 11
  • 3
  • 2
    Yes, this can be done much more quickly. Please share a few rows of sample input and desired output so we can test code before posting it. – Gregor Thomas Feb 19 '22 at 17:32
  • The easiest way to share sample data is with `dput()`, e.g., `dput(data[1:10, ])` for the first 10 rows. Choose a suitable small subset to illustrate the problem. The result will be a copy/pasteable version of your data, including all class and structure information. – Gregor Thomas Feb 19 '22 at 17:33
  • Thanks, I added some sample data. Let me know if you need anything else. – ChrisM Feb 19 '22 at 19:46

1 Answers1

0

I need to point out is that I am not sure what you were doing with the structure() elements you provided. They are not the output of dput(). For example, there would be commas between every element. Did you manually create these or modify them? In the future, check out this resource: making R reproducible questions.

One more thing...you used the object names names and dates. Both of these are keywords. I have changed the objects to nm and dts. Here are the objects again--as they would look from the output of dput with the names I've used. I recreated what would have been names, instead of fixing it. (It's not the output of dput.)

dts <- structure(c(18993, 18994, 18995, 18996, 18997, 18998, 18999, 19000, 19001, 
                   19002, 19003, 19004, 19005, 19006, 19007, 19008, 19009, 19010, 
                   19011, 19012, 19013, 19014, 19015, 19016, 19017, 19018, 19019, 
                   19020, 19021, 19022, 19023),  class = "Date")

test_input <- structure(list(
  DATE = structure(c(19021, 19007, 18994, 19022, 19010, 19007, 19007, 18996, 19012,  
                     19007),  class = "Date"),
  NUMBER = c(4, 2, 8, 11, 5, 5, 7, 12, 15, 5),
  NAME = c("Market1", "Market3", "Market1", "Market1", "Market5",
           "Market4", "Market2", "Market3", "Market4", "Market5"),
  COST = c(9384.3, 93.23, 6354.12, 346.98, 5743.48, 845.73, 
           3948.37, 264.34, 38790.24, 8723.2)), 
  row.names = c(NA, -10L),
  class = "data.frame")



# create name vector
nm <- sort(unique(test_input$NAME))

After creating usable objects, I created an empty list to store each of the matrices that will come from the loops.

library(tidyverse) # really just for %>%

# create a list to store each of the matrices
v <- vector(mode = "list",
            length = nm) # length unique NAME

for(k in 1:length(nm)){.  # for each unique NAME
  x <- matrix(0, 
              nrow = length(dts),  # creating the empty matrix
              ncol = 20)
  for(i in 1:length(dts)){         # for each DATE in vector
    for(j in 1:20){                # for each NUMBER in vector
      x[i, j] <- test_input[which(
        test_input$NAME == nm[k] &     # NAME == name
          test_input$DATE == dts[i] &  # DATE == date
          test_input$NUMBER == j), "COST"] %>% # NUMBER = number and select cost
        unlist(use.names = F) %>%      # make any values obtained a vector
        {sum(.)}                   # add the vector of values
    }
  }
  v[[k]] <- x                     # add the matrix to the list of matrices
}

If you execute v[[1]] you'll see the output for the first value in nm, which is "Market1."

#       [,1] [,2] [,3]   [,4] [,5] [,6] [,7]    [,8] [,9] [,10]  [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
#  [1,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
#  [2,]    0    0    0    0.0    0    0    0 6354.12    0     0   0.00     0     0     0     0     0     0     0     0     0
#  [3,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
#  [4,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
#  [5,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
#  [6,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
#  [7,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
#  [8,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
#  [9,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [10,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [11,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [12,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [13,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [14,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [15,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [16,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [17,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [18,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [19,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [20,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [21,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [22,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [23,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [24,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [25,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [26,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [27,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [28,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [29,]    0    0    0 9384.3    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0
# [30,]    0    0    0    0.0    0    0    0    0.00    0     0 346.98     0     0     0     0     0     0     0     0     0
# [31,]    0    0    0    0.0    0    0    0    0.00    0     0   0.00     0     0     0     0     0     0     0     0     0 

This matches your data in test_output_market1.

all.equal(v3[[1]], test_output_market1)
# [1] TRUE 
Kat
  • 15,669
  • 3
  • 18
  • 51