I have multiple rasters in an R targets pipeline that I load with tar_files(). In the next target I iterate over them to add one column per file to a matrix. However, only the first column is created. Here is a reprex that does not use files:

library(targets)

tar_script(
  {
    add_column <- function(letter) {
      matrix(rep(letter, 10), ncol = 1)
    }
    list(
      tar_target(letters, letters),
      tar_target(
        added_columns,
        add_column(letters),
        pattern = map(letters)
      )
    )
  },
  ask = FALSE
)
tar_make()

How can I get a matrix with one column for each iteration?

When I load the result using tar_load(added_columns), it only has the first column. In the case with rasters, I used terra::extract to get one vector per iteration; when I load the result, all the columns are there, but every column except the first is filled with NA.

Radim
Josep Pueyo

2 Answers

I agree with @Radim, and you can produce their mat matrix by using nrow = 1 instead of ncol = 1 in matrix(), then transposing the result. Same solution, just a different way of thinking about it. iteration = "vector" row-binds data-frame-like objects, so the idea is to have each dynamic branch create one row instead of one column.

library(targets)
add_column <- function(letter) {
  matrix(rep(letter, 10), nrow = 1) # switch to nrow = 1
}
list(
  tar_target(letters, letters),
  tar_target(
    mat,
    add_column(letters),
    pattern = map(letters)
  ),
  tar_target(transposed, t(mat)) # transpose the output
)
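
To see why one row per branch works, here is a minimal sketch in plain R, outside of any pipeline. It assumes that stacking the branches behaves like row-binding for these small matrices; do.call(rbind, ...) is only a stand-in for the aggregation step, not what targets calls internally, and add_row is a hypothetical name.

```r
# Hypothetical stand-in for add_column() with nrow = 1: one 1 x 10 row per letter.
add_row <- function(letter) {
  matrix(rep(letter, 10), nrow = 1)
}

# One "branch" per letter, as pattern = map(letters) would create.
branches <- lapply(letters, add_row)

# Stand-in for the aggregation step: stack the rows on top of each other.
mat <- do.call(rbind, branches)

# Transpose so that each letter ends up in its own column.
transposed <- t(mat)

dim(mat)        # 26 rows, 10 columns
dim(transposed) # 10 rows, 26 columns
```

Each column of transposed then holds a single letter repeated 10 times, which is the shape the question asks for.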
Josep Pueyo
landau
  1. It strikes me - because you are looking for a single output in the end, not for branching of your pipeline - that this is something that would be better achieved inside a target instead of using targets to do a programming job.

  2. That being said, would this be what you are looking for?

library(targets)
tar_script(
  {
    add_column <- function(letter) {
      matrix(rep(letter, 10), ncol = 1)
    }
    list(
      tar_target(
        name = letters,
        command = letters
      ),
      tar_target(
        name = added_columns,
        command = add_column(letters),
        pattern = map(letters)
      ),
      tar_target(
        name = mat,
        command = matrix(added_columns, ncol = 10, byrow = TRUE)
      )
    )
  },
  ask = FALSE
)
tar_make()
tar_load(mat)

With the default settings, tar_target() uses iteration = "vector", so the target that uses pattern = map() will return a named vector.
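
For the first point, doing the whole job inside one target, a minimal sketch could look like the following. build_matrix is a hypothetical helper name, and the example uses the letters from the reprex instead of raster files.

```r
# Hypothetical helper: builds the whole matrix in one call, one column per value.
build_matrix <- function(values) {
  # vapply() returns a 10 x length(values) character matrix,
  # with the values themselves as column names.
  vapply(values, function(v) rep(v, 10), character(10))
}

mat <- build_matrix(letters)
dim(mat) # 10 rows, 26 columns
```

In the pipeline this would be a single target without a pattern, for example tar_target(mat, build_matrix(letters)), so no aggregation of branches is involved at all.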

Radim
  • Thanks! I see, branching is for branching the pipeline, not for performing multiple operations on the output of another target. I will do it using a function that receives the output of `tar_files()`. Just one question: targets already uses parallel processing, so if I use `furrr::future_map()` to iterate over the elements, do I need to call `plan("multisession")` to allow parallel processing? Will this work? – Josep Pueyo Aug 29 '23 at 19:43
  • Taking the output of `tar_files()` as an input to another target, without any iteration in the pipeline, is also how I would go about it. For the second part, frankly, I am not sure if the parallel processing setup of the `targets` pipeline gets picked up automatically by `furrr`. I always make a separate setup for `furrr` inside the target function. Not sure if this is the correct/best way to go about it. But it works. I reckon the `furrr` plan could also be defined as a targets global. – Radim Aug 29 '23 at 20:18
  • This is an older thread for within-target parallelism in `drake`. Still a good reference, I think: https://github.com/ropensci/drake/issues/675#issuecomment-458222414 – Radim Aug 29 '23 at 20:32