I am trying to use two nested for loops in R to produce temporary subsets of rows and columns of an overall data frame for generation of figures. The index variable for the first loop is passed to i in myDT[i, j, by], and I have had no issue with subsetting rows. However, I've tried many ways of passing the index variable of the inner loop to the j position and have been met with a variety of errors and unexpected results. Note that each row in full_dt represents a single data point (several numeric results of digital analysis of a single image), that >1 species (full_dt$sp) are included, that each section code (full_dt$sect) is unique, and that aoi codes (full_dt$aoi) are repeated.

> full_dt <- fread(".../full_dt.csv")
> head(full_dt)
   V1      sp                                              sect  aoi  aoi_area n_xyl       mhwd   num_den ageClass
1:  1 cel.pal seed.cel.pal_indiv2_stem1_picture2_100x_2048x1536 aoi2 1.3964749    14 0.01538392 18.050659 Seedling
2:  2 cel.pal seed.cel.pal_indiv2_stem1_picture2_100x_2048x1536 aoi3 1.5587317    56 0.01667791 47.994443 Seedling
3:  3 cel.pal seed.cel.pal_indiv2_stem1_picture2_100x_2048x1536 aoi4 1.2133989    31 0.01551492 34.804520 Seedling
4:  4 cel.pal seed.cel.pal_indiv3_stem1_picture4_100x_2048x1536 aoi2 0.7356047    17 0.01449645 31.732125 Seedling
5:  5 cel.pal seed.cel.pal_indiv3_stem1_picture4_100x_2048x1536 aoi3 0.9252753     9 0.01550191 17.089949 Seedling
6:  6 cel.pal seed.cel.pal_indiv3_stem1_picture4_100x_2048x1536 aoi4 0.7325242     4 0.01672792  8.225981 Seedling


> age_classes <- as.vector(unique(full_dt$ageClass))
> age_classes
[1] "Seedling" "Mature"  
> data_types <- as.vector(colnames(full_dt[, 6:8]))
> data_types
[1] "n_xyl"   "mhwd"    "num_den"


for (k in age_classes){
 for (l in data_types) {

  data_bp <- full_dt[ageClass == k, ..l,  by=.(sp,sect,aoi)]
  #ggplot() + geom_boxplot(data = data_bp, mapping = aes(x=data_bp$sp,y=data_bp$mhwd))
  #ggsave(...)

 }
}  

My goal for each iteration of the inner loop is to pass each element of the vector data_types to the j in full_dt[i, j, by] to produce a smaller data table containing columns sp, sect, aoi, and l, and only the rows where ageClass == k. I have been able to use l defined as data_types[1] to subset full_dt when i and by are left empty, but not when i and by are defined (as above).

Thank you all.

overcup
  • In the code you provided, you commented out the `ggplot()` calls, but within them you use `y = data_bp$mhwd`. Is that possibly your mistake or is it just an example for us? – kybazzi Dec 20 '21 at 05:10
  • Please, define a [minimal example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for us to work on. – Francesco Grossetti Dec 20 '21 at 10:55
  • The ggplot() calls are included as an example of the use to which the data table will be put, but are commented out because they're not directly relevant. – overcup Dec 20 '21 at 17:46
  • @FrancescoGrossetti, I'm unsure of how to make the problem more specific. I am trying to pass index variable l in index vector data_types to the j position in myDT[i, j, by], but have not been able to do so successfully. What else would you recommend that I add? – overcup Dec 20 '21 at 17:48

1 Answer


Code should read as follows:

data_bp <- full_dt[ageClass == k, .SD, .SDcols = l, by=.(sp,sect,aoi)]

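Adding .SD with .SDcols = l selects the column whose name is stored in l (for example, full_dt$mhwd when l is "mhwd"), so data_bp contains sp, sect, aoi, and that column for the rows where ageClass == k.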
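For anyone who wants to try this without the original CSV, here is a minimal sketch of the full nested loop. It assumes a toy table (toy_dt) with the same column names as full_dt; the values, the short section codes, and the second species code are invented for illustration, and the subsets list is just a hypothetical way to keep each result around instead of plotting it.

```r
library(data.table)

# Toy stand-in for full_dt; all values below are invented for illustration.
toy_dt <- data.table(
  sp       = c("cel.pal", "cel.pal", "que.lyr", "que.lyr"),
  sect     = c("s1", "s1", "s2", "s3"),
  aoi      = c("aoi2", "aoi3", "aoi2", "aoi2"),
  n_xyl    = c(14L, 56L, 31L, 17L),
  mhwd     = c(0.015, 0.017, 0.016, 0.014),
  num_den  = c(18.1, 48.0, 34.8, 31.7),
  ageClass = c("Seedling", "Seedling", "Mature", "Mature")
)

age_classes <- unique(toy_dt$ageClass)
data_types  <- c("n_xyl", "mhwd", "num_den")

# Collect one subset per (age class, data type) pair.
subsets <- list()
for (k in age_classes) {
  for (l in data_types) {
    # .SDcols = l picks the column whose name is stored in l;
    # the by columns are prepended to the result.
    data_bp <- toy_dt[ageClass == k, .SD, .SDcols = l, by = .(sp, sect, aoi)]
    subsets[[paste(k, l, sep = "_")]] <- data_bp
  }
}

# Inspect the column names of one of the six subsets.
names(subsets[["Seedling_mhwd"]])

# Alternative without by: select the columns by name with with = FALSE.
alt <- toy_dt[ageClass == "Seedling", c("sp", "sect", "aoi", "mhwd"), with = FALSE]
```

Because by is only used here to carry sp, sect, and aoi into the result rather than to aggregate, the with = FALSE form at the end should give the same columns in this toy case and may be simpler if no per-group computation is needed.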

Peter Csala
overcup