0

Aim to extract a column along with the column to the right of it from one dataframe that == a condition in another dataframe

Approach: For every every row an anonymous function selects the appropriate columns of df23 using the match argument that == the value in dfa$result (acknowledgment@jay.sf)

Issue: When dfa$result is not a unique value e.g. Noradrenaline.stop.time.variation i.e. it appears more than once - but only once for each row the following error occurs

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 1, 0'' when df$name does NOT contain a unique value

### dictionary

a <- c(Noradrenaline.start.time.variation = 'true_noradrenaline_on',
       Noradrenaline.stop.time.variation = 'true_noradrenaline_off',
       Vasopressin.stop.time.variation = 'true_vasopressin_off')                             
 
### Code works when their is a unique value in dfa$result ###

subset(dfa, result %in% c('Noradrenaline.start.time.variation')) %>% apply(MARGIN=1, \(x) {
mt <- match(a[match(x[2], names(a))], names(df23))
cbind(t(x[1:2]), df23[df23$study_id == x[1], c(mt, mt + 1)]) %>%
setNames(c('study_id', 'Conflict', 'initial', 'verification'))
}) %>% do.call(what=rbind)

#### When dfa$result does NOT contain a unique value the following error occurs'''Error in data.frame(..., 
####check.names = FALSE) : arguments imply differing number of rows: 1, 0'' when df$name does NOT 
####contain a unique value###
  
subset(dfa, result %in% c('Noradrenaline.stop.time.variation')) %>% apply(MARGIN=1, \(x) {
mt <- match(a[match(x[2], names(a))], names(df23))
cbind(t(x[1:2]), df23[df23$study_id == x[1], c(mt, mt + 1)]) %>%
setNames(c('study_id', 'Conflict', 'initial', 'verification'))
}) %>% do.call(what=rbind)


dfa <- data.frame(
  study_id = c(93, 71, 75, 109, 109),
  result = c(
    "Noradrenaline.start.time.variation", "Noradrenaline.stop.time.variation",
    "Noradrenaline.stop.time.variation", "Noradrenaline.stop.time.variation",
    "Vasopressin.stop.time.variation"
  ),
  value = c(FALSE, FALSE, FALSE, FALSE, FALSE),
  expression = c(
    "(norad_start_diff > -1 & norad_start_diff < 1)", "(norad_stop_diff > -1 & norad_stop_diff < 1)",
    "(norad_stop_diff > -1 & norad_stop_diff < 1)", "(norad_stop_diff > -1 & norad_stop_diff < 1)",
    "(vaso_stop_diff > -1 & vaso_stop_diff < 1)"
  ))

df23 <- data.frame(
  study_id = c(
    6, 8, 12, 18, 25, 26, 38, 40, 42, 47, 50, 51, 65, 69, 71, 72, 74, 75, 80, 81,
    85, 86, 87, 88, 92, 93, 98, 99, 101, 106, 109, 114, 121, 122, 125
  ),
  true_adrenaline_on = as.POSIXct(
    c(
      "2020-09-17 20:58:00", NA, NA, "2020-10-12 16:18:00", NA, NA,
      "2020-11-30 18:34:00", NA, NA, NA, "2020-12-17 17:07:00", NA,
      NA, NA, NA, NA, NA, NA, NA, "2021-01-29 16:19:00", NA, NA, NA,
      NA, NA, NA, "2021-02-17 14:28:00", "2021-02-18 15:00:00", NA,
      NA, NA, NA, NA, NA, NA
    ),
    tz = "UTC"
  ),
  verification_true_adren_on = as.POSIXct(
    c(
      "2020-09-17 20:58:00", NA, NA, "2020-10-12 16:18:00", NA, NA,
      "2020-11-30 18:34:00", NA, NA, NA, "2020-12-17 17:07:00", NA,
      NA, NA, NA, NA, NA, NA, NA, "2021-01-29 16:19:00", NA, NA, NA,
      NA, NA, NA, "2021-02-17 14:28:00", "2021-02-18 14:52:00", NA,
      NA, NA, NA, NA, NA, NA
    ),
    tz = "UTC"
  ),
  true_noradrenaline_on = as.POSIXct(
    c(
      "2020-09-17 23:00:00", "2020-09-21 14:10:00", "2020-10-01 16:03:00",
      "2020-10-12 16:18:00", NA, "2020-11-05 14:15:00", "2020-11-30 18:34:00",
      "2020-12-01 22:21:00", "2020-12-07 14:00:00", "2020-12-15 01:00:00",
      "2020-12-17 17:07:00", "2020-12-17 23:08:00", "2021-01-11 22:27:00",
      NA, "2021-01-20 01:00:00", "2021-01-18 17:12:00", NA, "2021-01-21 19:00:00",
      "2021-01-28 16:50:00", "2021-01-29 16:19:00", "2021-02-03 21:00:00",
      "2021-02-04 18:00:00", "2021-02-05 00:48:00", "2021-02-05 14:03:00",
      "2021-02-11 13:35:00", "2021-02-12 04:00:00", "2021-02-17 14:28:00",
      "2021-02-18 14:52:00", NA, "2021-02-25 13:00:00", "2021-02-26 21:45:00",
      "2021-03-04 19:33:00", "2021-03-11 17:30:00", "2021-03-11 22:32:00",
      NA
    ),
    tz = "UTC"
  ),
  verification_true_norad_on = as.POSIXct(
    c(
      "2020-09-17 23:00:00", "2020-09-21 14:10:00", "2020-10-01 16:03:00",
      "2020-10-12 16:18:00", NA, "2020-11-05 14:15:00", "2020-11-30 18:34:00",
      "2020-12-01 22:21:00", "2020-12-07 14:00:00", "2020-12-15 01:00:00",
      "2020-12-17 17:07:00", "2020-12-17 23:08:00", "2021-01-11 22:27:00",
      NA, "2021-01-20 01:20:00", "2021-01-18 17:12:00", NA, "2021-01-21 19:20:00",
      "2021-01-28 16:50:00", "2021-01-29 16:19:00", "2021-02-03 21:00:00",
      "2021-02-04 18:00:00", "2021-02-05 00:48:00", "2021-02-05 14:03:00",
      "2021-02-11 13:35:00", "2021-02-12 05:05:00", "2021-02-17 14:28:00",
      "2021-02-18 14:52:00", NA, "2021-02-25 13:05:00", "2021-02-26 21:45:00",
      "2021-03-04 19:33:00", "2021-03-11 17:30:00", "2021-03-11 22:32:00",
      NA
    ),
    tz = "UTC"
  ),
  true_metaraminol_on = as.POSIXct(
    c(
      NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_
    ),
    tz = "UTC"
  ),
  verification_true_met_on = as.POSIXct(
    c(
      NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_
    ),
    tz = "UTC"
  ),
  true_phenylephrine_on = as.POSIXct(
    c(
      NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_
    ),
    tz = "UTC"
  ),
  verification_true_phen_on = as.POSIXct(
    c(
      NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_
    ),
    tz = "UTC"
  ),
  true_vasopressin_on = as.POSIXct(
    c(
      "2020-09-18 09:00:00", NA, NA, "2020-10-12 23:00:00", NA, NA,
      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
      NA, NA, NA, NA, NA, "2021-02-18 18:00:00", NA, NA, "2021-02-26 21:45:00",
      NA, NA, NA, NA
    ),
    tz = "UTC"
  ),
  verification_true_vaso_on = as.POSIXct(
    c(
      "2020-09-18 09:00:00", NA, NA, "2020-10-12 23:00:00", NA, NA,
      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
      NA, NA, NA, NA, NA, "2021-02-18 18:00:00", NA, NA, "2021-02-26 21:45:00",
      NA, NA, NA, NA
    ),
    tz = "UTC"
  ),
  true_adrenaline_off = as.POSIXct(
    c(
      "2020-09-18 09:00:00", NA, NA, "2020-10-13 13:00:00", NA, NA,
      "2020-12-03 08:00:00", NA, NA, NA, "2020-12-17 19:00:00", NA,
      NA, NA, NA, NA, NA, NA, NA, "2021-01-30 09:00:00", NA, NA, NA,
      NA, NA, NA, "2021-02-17 16:00:00", "2021-02-19 22:00:00", NA,
      NA, NA, NA, NA, NA, NA
    ),
    tz = "UTC"
  ),
  verification_true_adren_off = as.POSIXct(
    c(
      "2020-09-18 09:00:00", NA, NA, "2020-10-13 13:00:00", NA, NA,
      "2020-12-03 08:00:00", NA, NA, NA, "2020-12-17 19:00:00", NA,
      NA, NA, NA, NA, NA, NA, NA, "2021-01-30 09:00:00", NA, NA, NA,
      NA, NA, NA, "2021-02-17 16:00:00", "2021-02-19 22:00:00", NA,
      NA, NA, NA, NA, NA, NA
    ),
    tz = "UTC"
  ),
  true_noradrenaline_off = as.POSIXct(
    c(
      "2020-09-21 07:00:00", "2020-09-21 22:00:00", "2020-10-01 17:00:00",
      "2020-10-14 18:00:00", NA, "2020-11-06 18:00:00", "2020-12-02 23:00:00",
      "2020-12-02 11:00:00", "2020-12-07 18:00:00", "2020-12-15 17:00:00",
      "2020-12-19 09:00:00", "2020-12-20 00:00:00", "2021-01-12 01:00:00",
      NA, "2021-01-20 16:00:00", "2021-01-19 04:00:00", NA, "2021-01-22 05:00:00",
      "2021-01-29 05:00:00", "2021-01-30 05:00:00", "2021-02-04 05:00:00",
      "2021-02-05 09:00:00", "2021-02-05 22:00:00", "2021-02-08 07:00:00",
      "2021-02-12 08:00:00", "2021-02-12 09:00:00", "2021-02-18 04:00:00",
      "2021-02-25 06:52:00", NA, "2021-02-26 06:00:00", "2021-02-27 15:00:00",
      "2021-03-05 04:00:00", "2021-03-11 21:00:00", "2021-03-12 08:00:00",
      NA
    ),
    tz = "UTC"
  ),
  verification_true_norad_off = as.POSIXct(
    c(
      "2020-09-21 07:00:00", "2020-09-21 22:00:00", "2020-10-01 17:00:00",
      "2020-10-14 19:00:00", NA, "2020-11-06 18:00:00", "2020-12-02 23:00:00",
      "2020-12-02 11:00:00", "2020-12-07 18:00:00", "2020-12-15 17:00:00",
      "2020-12-19 09:00:00", "2020-12-20 00:00:00", "2021-01-12 01:12:00",
      NA, "2021-01-19 17:30:00", "2021-01-19 04:00:00", NA, "2021-01-22 12:50:00",
      "2021-01-29 05:00:00", "2021-01-30 05:00:00", "2021-02-04 05:00:00",
      "2021-02-05 09:00:00", "2021-02-05 22:10:00", "2021-02-08 07:02:00",
      "2021-02-12 08:00:00", "2021-02-12 09:05:00", "2021-02-18 04:00:00",
      "2100-01-01 00:00:00", NA, "2021-02-26 06:00:00", "2021-02-27 17:01:00",
      "2021-03-05 04:00:00", "2021-03-11 20:00:00", "2021-03-12 08:00:00",
      NA
    ),
    tz = "UTC"
  ),
  true_metaraminol_off = as.POSIXct(
    c(
      NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_
    ),
    tz = "UTC"
  ),
  verification_true_met_off = as.POSIXct(
    c(
      NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_
    ),
    tz = "UTC"
  ),
  true_phenylephrine_off = as.POSIXct(
    c(
      NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_
    ),
    tz = "UTC"
  ),
  verification_true_phen_off = as.POSIXct(
    c(
      NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
      NA_character_
    ),
    tz = "UTC"
  ),
  true_vasopressin_off = as.POSIXct(
    c(
      "2020-09-21 07:00:00", NA, NA, "2020-10-14 00:00:00", NA, NA,
      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
      NA, NA, NA, NA, NA, "2021-02-19 15:00:00", NA, NA, "2021-02-26 23:00:00",
      NA, NA, NA, NA
    ),
    tz = "UTC"
  ),
  verification_true_vaso_off = as.POSIXct(
    c(
      "2100-01-01 00:00:00", NA, NA, "2020-10-14 00:00:00", NA, NA,
      NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
      NA, NA, NA, NA, NA, "2021-02-19 15:00:00", NA, NA, "2021-02-27 00:10:00",
      NA, NA, NA, NA
    ),
    tz = "UTC")) 



##### WORKING SOLUTION :-) #########

df24 <- df23 %>% pivot_longer(cols=c(
  "true_adrenaline_on",
  "true_noradrenaline_on",
  "true_metaraminol_on",
  "true_phenylephrine_on",
  "true_vasopressin_on",
  "true_adrenaline_off",
  "true_noradrenaline_off",
  "true_metaraminol_off",
  "true_phenylephrine_off",
  "true_vasopressin_off",
  'verification_true_adren_on',
  'verification_true_norad_on',
  'verification_true_met_on',
  'verification_true_phen_on',
  'verification_true_vaso_on',
  'verification_true_adren_off',
  'verification_true_norad_off',
  'verification_true_met_off',
  'verification_true_phen_off',
  'verification_true_vaso_off'
  ),names_to='year', values_to='points') 


df24verificationsubset <- df24[-grep("verification", df24$year),]
dfhope <- df24verificationsubset %>% right_join(dfa, by=c("study_id"))
d <- dfhope %>% mutate(year = fct_recode(year,Adrenaline.start.time.variation ="true_adrenaline_on",
                                        Noradrenaline.start.time.variation = "true_noradrenaline_on",
                                        Metaraminol.start.time.variation = "true_metaraminol_on",
                                        Phenylephrine.start.time.variation = "true_phenylephrine_on",
                                        Vasopressin.start.time.variation = "true_vasopressin_on",
                                        Adrenaline.stop.time.variation = "true_adrenaline_off",
                                        Noradrenaline.stop.time.variation = "true_noradrenaline_off",
                                        Metaraminol.stop.time.variation = "true_metaraminol_off",
                                        Phenylephrine.stop.time.variation = "true_phenylephrine_off",
                                        Vasopressin.stop.time.variation = "true_vasopressin_off"))                                        

subset(d, year == result) |> gt()
d_df <-as.data.frame(d) %>% rename(primary = points)
d_df_subset <- subset(d_df, year == result)


df24verificationsubseta <- df24[grep("verification", df24$year),]
dfhopea <- df24verificationsubseta %>% right_join(dfa, by=c("study_id"))
da <- dfhopea %>% mutate(year = fct_recode(year,Adrenaline.start.time.variation ="verification_true_adren_on",
                                         Noradrenaline.start.time.variation = "verification_true_norad_on",
                                         Metaraminol.start.time.variation = "verification_true_met_on",
                                         Phenylephrine.start.time.variation = "verification_true_phen_on",
                                         Vasopressin.start.time.variation = "verification_true_vaso_on",
                                         Adrenaline.stop.time.variation = "verification_true_adren_off",
                                         Noradrenaline.stop.time.variation = "verification_true_norad_off",
                                         Metaraminol.stop.time.variation = "verification_true_met_off",
                                         Phenylephrine.stop.time.variation = "verification_true_phen_off",
                                         Vasopressin.stop.time.variation = "verification_true_vaso_off"))

da_df <-as.data.frame(da) %>% rename(verification = points)
da_df_subset <- subset(da_df, year == result)

merge(d_df_subset,da_df_subset) %>% 
  select(study_id, year, primary,verification) %>% 
  arrange(study_id) |> gt() 

GJW
  • 105
  • 6
  • Try sth like `vryn <- lapply(c('.*on', '.*off'), \(x) lapply(paste0('_', c('adren', 'norad', 'met', 'phen', 'vaso'), x), grep, names(df23), value=TRUE)) |> unlist(recursive=F)` in `reshape(..., varying=vryn)`, see the [dupe](https://stackoverflow.com/questions/75493596/copy-data-from-one-column-to-another-column-if-condition-in-another-column-is-me). – jay.sf Feb 23 '23 at 11:37
  • @jay.sf Thank you for your continued input. Sorry I don't quite understand the approach. Vryn creates a list which pairs columns of interest. `df25<- reshape(df23, dir='long', idvar='study_id', varying=vryn)` How would one go from dfa$result and for each row e.g. Adrenaline.stop.time.variation extract the columns that correspond to it i.e. vryn[[1]] from df25. – GJW Feb 24 '23 at 11:42
  • The target output would be Study ID | dfa$result | true_ | verification_ i.e. only 4 columns. As opposed to having new pairs of columns corresponding to 'adren', 'norad', 'met', 'phen', 'vaso' – GJW Feb 24 '23 at 11:50
  • 1
    What I meant is that you should familiarize yourself with the concept of reshaping data; the other approach gets way to complicated. I showed you a possible way of how that can be done with your data in your other question. Also see the second source I added as dupe here. – jay.sf Feb 24 '23 at 12:09
  • 1
    @jay.sf a `pivot_longer` and a few `subset` later a working solution has been written :-) Thank you for your help and guidance – GJW Feb 25 '23 at 05:50
  • The first SO rule is to make an [MRE](https://stackoverflow.com/help/minimal-reproducible-example). Yours is not. This is because you are using operators found in external libraries that you omit to include, such as for `%>%`. You first use variables, such as `dfa` in `subset(dfa, ...)` and then you define it. The data you provide add unnecessary complexity for people trying to help. You do not need to give us the real data, but a simple sample, which is enough to reproduce your issue. An MRE and the related solution are likely to be useful to a wide audience, not only the OP. – antonio Mar 10 '23 at 23:56

0 Answers0