
I'd like to do the following to a dataset with this structure:

ID | a1 | a2 | … | a15 | b1 | … | b15 | &more

I'd like to extract a value contained in columns a1-a15,

IF at least one of the columns b1-b15 contains a certain value, e.g. 1000.

So, if 1000 appears in column b3, I want to extract the value contained in a3 and put it into my newly created column (x). If it appears in b7, I want the value of a7 in x.

Thank you for your help!!

I tried if conditions, `i in 1:15`, `grepl`, ... but honestly I don't really know where to start.

    Welcome to SO, EcoHelp! You say "paste" but each of your examples (`b3` and `b7`) are singletons. What if you have values in _both_ `b3` and `b7`? It's likely this would be obvious with sample data and your expected output. See https://stackoverflow.com/q/5963269 , [mcve], and https://stackoverflow.com/tags/r/info for guidance on filling out a question with sample data, code attempted, and expected output. Thanks! – r2evans Aug 22 '23 at 20:45

2 Answers


Some fake data:

data.frame(ID = 1:5,
           a1 = 0:4,
           a2 = 11:15,
           b1 = 1000:1004,
           b2 = 996:1000) -> df

Here's a tidyverse solution, where I reshape the data long to more easily relate each a to its respective b, then extract the b==1000 cases, and join those to the original data.

I'm not sure what you want to happen if there were ever more than one match in a row.

library(tidyverse)
df |>
  left_join(
    df |> 
      pivot_longer(-ID, names_to = c(".value", "type"),
                   names_pattern = "([ab])(\\d+)") |>
      filter(b == 1000) |>
      select(ID, x = a)
  )

Result

Joining with `by = join_by(ID)`
  ID a1 a2   b1   b2  x
1  1  0 11 1000  996  0
2  2  1 12 1001  997 NA
3  3  2 13 1002  998 NA
4  4  3 14 1003  999 NA
5  5  4 15 1004 1000 15
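
If a row could ever contain 1000 in more than one `b` column, the join above would duplicate that row. A hedged sketch of one way to guard against that (keeping only the first match per row; `slice_head(by = )` and `join_by()` assume dplyr >= 1.1.0):

```r
library(dplyr)
library(tidyr)

# Same fake data as above
df <- data.frame(ID = 1:5,
                 a1 = 0:4,
                 a2 = 11:15,
                 b1 = 1000:1004,
                 b2 = 996:1000)

df |>
  left_join(
    df |>
      # "([ab])(\\d+)" also copes with two-digit suffixes such as a10/b10
      pivot_longer(-ID, names_to = c(".value", "type"),
                   names_pattern = "([ab])(\\d+)") |>
      filter(b == 1000) |>
      slice_head(n = 1, by = ID) |>  # keep only the first match per row
      select(ID, x = a),
    by = join_by(ID)
  )
```

With the sample data each row has at most one match, so the output is the same either way; the guard only matters for rows with several hits.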
Jon Spring

Using Jon Spring's sample data, here's a method that does not rely on pivoting.

library(dplyr)
findB <- 1000
df %>%
  mutate(
    x = apply(
      mapply(function(a, b) if_else(b == findB, a, a[NA]), 
             pick(starts_with("a")), pick(starts_with("b"))),
      1, function(z) na.omit(z)[1])
  )
#   ID a1 a2   b1   b2  x
# 1  1  0 11 1000  996  0
# 2  2  1 12 1001  997 NA
# 3  3  2 13 1002  998 NA
# 4  4  3 14 1003  999 NA
# 5  5  4 15 1004 1000 15

The same can be done in base R without difficulty (swapping `if_else` for base `ifelse` so no package is needed):

findB <- 1000
df$x <- apply(
  mapply(function(a, b) ifelse(b == findB, a, a[NA]),
         subset(df, select = startsWith(names(df), "a")),
         subset(df, select = startsWith(names(df), "b"))),
  1, function(z) na.omit(z)[1]
)

The use of `a[NA]` is because there are several typed variants of `NA` (`NA`, `NA_integer_`, `NA_real_`, `NA_character_`, ...), and it is generally better to be type-safe. For instance, while `class(a)` here (inside the function) is `"integer"`, plain `NA` has class `"logical"`, and some type-safe functions tend to complain/fail with that mismatch. `a[NA]` instead produces a vector of `NA_integer_` the same length as the original `a`.
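
A quick sketch of that typing point, using a small integer vector as a stand-in for one of the `a` columns:

```r
a <- 1:3           # stand-in for the integer column inside the function

class(NA)          # "logical" -- the plain NA literal
class(NA_integer_) # "integer" -- the typed variant

# Indexing with a logical NA keeps a's type and recycles to length(a):
class(a[NA])       # "integer"
length(a[NA])      # 3
a[NA]              # NA NA NA (all NA_integer_)
```

This is why `a[NA]` is a convenient shorthand for "an all-`NA` vector of the same type and length as `a`".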

r2evans