I have a list of data.tables, and they may have different columns. Some of the tables contain unwanted columns that I want to remove. Suppose they are called "zRemoveThis1", "zRemoveThis2", "zRemoveThis3", etc. Here's a sample of my data.

library(data.table)

dt_list <- list(
  item_1 = data.table(ID = paste(1:3, "item_1", sep = "_"),
                      Count_A = c(11:13),
                      Count_B = c(14:16),
                      zRemoveThis1 = c(14:16),
                      count_C = c(17:19),
                      zRemoveThis2 = c(24:26)),
  item_2 = data.table(ID = paste(1:3, "item_2", sep = "_"),
                      Count_A = c(1:3),
                      Count_B = c(4:6),
                      count_C = c(7:9))
)

I already followed this post, but then ran into a new problem: when I applied patterns() with lapply to my list, it didn't work.

lapply(dt_list, function(x) { x[, .SD, .SDcols = ! patterns("zRemoveThis*")] })
#> Error in do_patterns(colsub, names_x): Pattern not found: [zRemoveThis*]

When I applied it to each item individually, it worked on the first item of the list but not on the second.

# WORKS
dt_list$item_1[, .SD, .SDcols = ! patterns("zRemoveThis*")]
#>          ID Count_A Count_B count_C
#> 1: 1_item_1      11      14      17
#> 2: 2_item_1      12      15      18
#> 3: 3_item_1      13      16      19

# DOESN'T WORK
dt_list$item_2[, .SD, .SDcols = ! patterns("zRemoveThis*")]
#> Error in do_patterns(colsub, names_x): Pattern not found: [zRemoveThis*]

I found out that patterns() throws an error when no column matches the pattern. So I came up with this ugly if-else workaround, which does work.

lapply(dt_list, function(x) {
  if (any(grepl("zRemoveThis", colnames(x)))) {
    return(x[, .SD, .SDcols = ! patterns("zRemoveThis*")])
  } else return(x)
})
#> $item_1
#>          ID Count_A Count_B count_C
#> 1: 1_item_1      11      14      17
#> 2: 2_item_1      12      15      18
#> 3: 3_item_1      13      16      19
#> 
#> $item_2
#>          ID Count_A Count_B count_C
#> 1: 1_item_2       1       4       7
#> 2: 2_item_2       2       5       8
#> 3: 3_item_2       3       6       9

Is there a more elegant data.table solution to my problem? Any help would be appreciated. Thanks in advance!

rifset

2 Answers

You can use grepl to build a logical vector of the columns to keep:

lapply(dt_list, function(x) {
  cols <- !grepl('zRemoveThis', names(x))
  x[, ..cols]
})

#$item_1
#         ID Count_A Count_B count_C
#1: 1_item_1      11      14      17
#2: 2_item_1      12      15      18
#3: 3_item_1      13      16      19

#$item_2
#         ID Count_A Count_B count_C
#1: 1_item_2       1       4       7
#2: 2_item_2       2       5       8
#3: 3_item_2       3       6       9
Ronak Shah

Because the list elements are data.table objects, we can loop over the list with lapply, select columns through .SDcols, and return .SD (the Subset of Data). Here startsWith() marks the unwanted columns, and the leading - inverts the selection so that those columns are dropped.

library(data.table)
lapply(dt_list, function(x) x[, .SD,
       .SDcols = -startsWith(names(x), 'zRemoveThis')])
#> $item_1
#>          ID Count_A Count_B count_C
#> 1: 1_item_1      11      14      17
#> 2: 2_item_1      12      15      18
#> 3: 3_item_1      13      16      19
#>
#> $item_2
#>          ID Count_A Count_B count_C
#> 1: 1_item_2       1       4       7
#> 2: 2_item_2       2       5       8
#> 3: 3_item_2       3       6       9
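
As a rough illustration of why a logical vector in .SDcols avoids the "Pattern not found" error: startsWith() always returns one TRUE/FALSE per column, so a table without unwanted columns simply yields all FALSE and nothing is dropped. The two small tables below (with_junk, without_junk) are made up for this sketch and are not part of the original data.

```r
library(data.table)

# Hypothetical toy tables: one with an unwanted column, one without
with_junk    <- data.table(ID = 1:2, zRemoveThis1 = 3:4)
without_junk <- data.table(ID = 1:2, Count_A = 3:4)

# One logical value per column; no error when nothing matches
is_junk_a <- startsWith(names(with_junk), "zRemoveThis")
is_junk_b <- startsWith(names(without_junk), "zRemoveThis")

# A leading ! (or -) in .SDcols inverts the selection
res_a <- with_junk[, .SD, .SDcols = !is_junk_a]
res_b <- without_junk[, .SD, .SDcols = !is_junk_b]

names(res_a)  # only ID remains
names(res_b)  # both columns are kept
```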

Or use the tidyverse: loop over the list with map, and use select with starts_with to drop (-) the columns whose names start with 'zRemoveThis'.

library(dplyr)
library(purrr)
map(dt_list, ~ .x %>% 
          select(-starts_with('zRemoveThis')))
#> $item_1
#>          ID Count_A Count_B count_C
#> 1: 1_item_1      11      14      17
#> 2: 2_item_1      12      15      18
#> 3: 3_item_1      13      16      19
#>
#> $item_2
#>          ID Count_A Count_B count_C
#> 1: 1_item_2       1       4       7
#> 2: 2_item_2       2       5       8
#> 3: 3_item_2       3       6       9
akrun
    Thank you for the answer! I combined your idea with Ronak's idea and found the solution perfectly! Thank you! – rifset Aug 12 '21 at 01:25
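
The comment does not show the combined code. One plausible reading, offered only as a sketch, is to pair grepl (from the first answer) with .SDcols (from the second). The helper name drop_matching_cols and the anchored pattern "^zRemoveThis" are assumptions made for this example, not something taken from the thread.

```r
library(data.table)

# Hypothetical helper: drop every column whose name matches `pattern`.
# grepl() returns all FALSE when nothing matches, so no error is raised.
drop_matching_cols <- function(dt, pattern) {
  keep <- !grepl(pattern, names(dt))
  dt[, .SD, .SDcols = keep]
}

# Sample data from the question
dt_list <- list(
  item_1 = data.table(ID = paste(1:3, "item_1", sep = "_"),
                      Count_A = 11:13,
                      Count_B = 14:16,
                      zRemoveThis1 = 14:16,
                      count_C = 17:19,
                      zRemoveThis2 = 24:26),
  item_2 = data.table(ID = paste(1:3, "item_2", sep = "_"),
                      Count_A = 1:3,
                      Count_B = 4:6,
                      count_C = 7:9)
)

# "^" anchors the match to the start of the column name
cleaned <- lapply(dt_list, drop_matching_cols, pattern = "^zRemoveThis")
lapply(cleaned, names)
```

This returns new tables and leaves the originals in dt_list unchanged.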