Finding if a value is within the range of other columns

Question

I have a dataframe df which looks like this:

Input:

df <- read.table(text = 

"ID  Q1_PM Q1_TP Q1_overall  Q2_PM  Q2_LS  Q2_overall
 1   1     2     3           1       2     2       
 2   0     NA    NA          2       1     1 
 3   2     1     1           3       4     0  
 4   1     0     2           4       0     2 
 5   NA    1     NA          0       NA    0  
 6   2     0     1           1       NA    NA"   

, header = TRUE)

Desired Output:

To explain a little further, my desired output is as below:

 ID  Q1_PM Q1_TP Q1_overall  Q2_PM  Q2_LS  Q2_overall Q1_check  Q2_check
 1   1     2     3           1       2     2          "above"   "within"
 2   0     NA    NA          2       1     1           NA       "within"
 3   2     1     1           3       4     0          "within"  "below"
 4   1     0     2           4       0     2          "above"   "within"
 5   NA    1     NA          0       NA    0           NA       "within"
 6   2     0     1           1       NA    NA         "within"   NA

Explanation:

Example 1:

Based on the value in columns Q1_PM and Q1_TP, I want to see whether the value in column Q1_overall is within their range or not? If, not in range, is the value above or below the range? To track this, I want to add an additional column Q1_check.

Example 2:

Similarly, based on the values of Q2_PM and Q2_LS, I want to check if the value of Q2_overall is within their range or not? If not in range, is it above or below the range? Again, to track this, I want to add an additional column Q2_check

Requirements:

1- For this, I want to add additional columns Q1_check and Q2_check where the first column is for the comparisons that involve Q1 items and the second column is for the comparisons that involve Q2 items.

2- The columns could contain the following values: above, below and within.

3- The case when the columns named overall have NAs, then the extra columns could also have NAs.

Partial Solution:

The only solution, I can think of is, along these lines:

df$Q1_check <- ifelse(data$Q1_overall < data$Q1_PM, 'below',
                        ifelse(data$Q1_overall > data$Q1_TP, 'above', 
                               ifelse(is.na(data$Q1_overall), NA, 'within')))

But it results in following error: Error in data$Q1_overall : object of type 'closure' is not subsettable. I do not understand what the possible issue could be.

OR

df %>%
  mutate(Regulation = case_when(Q1_overall < Q1_PM ~ 'below',
                                Q1_overall > Q1_TP ~ 'above', 
                                Q1_PM < Q1_overall < Q1_TP, 'within'))

This also results in error Error: unexpected '<' in: "Q1_overall > Q1_TP ~ 'above', Q1_PM < Q1_overall <"

Edit 1:

How can the solution be extended if (let's say) the columns are these:

"Q1 Comm - 01 Scope Thesis"
"Q1 Comm - 02 Scope Project"
"Q1 Comm - 03 Learn Intern"
"Q1 Comm - 04 Biography"
"Q1 Comm - 05 Exhibit"
"Q1 Comm - 06 Social Act"
"Q1 Comm - 07 Post Project"
"Q1 Comm - 08 Learn Plant"
"Q1 Comm - 09 Study Narrate"
"Q1 Comm - 10 Learn Participate"
"Q1 Comm - 11 Write 1"
"Q1 Comm - 12 Read 2"
"Q1 Comm - Overall Study Plan"

How can we identify when the column Q1 Comm - Overall Study Plan is:

1 - Below the min() of all the other columns, or

2 - Above the max() of all the other columns, or

3 - Within the range of all the other columns

Edit 2:

For the updated fields, I am also including the dput(df)

dput(df)

structure(list(ï..ID = c(10L, 31L, 225L, 243L), Q1.Comm...01.Scope.Thesis = c(NA, 
2L, 0L, NA), Q1.Comm...02.Scope.Project = c(NA, NA, NA, 2L), 
    Q1.Comm...03.Learn.Intern = c(4L, NA, NA, NA), Q1.Comm...04.Biography = c(NA, 
    NA, NA, 1L), Q1.Comm...05.Exhibit = c(4L, 2L, NA, NA), Q1.Comm...06.Social.Act = c(NA, 
    NA, NA, 3L), Q1.Comm...07.Post.Project = c(NA, NA, 3L, NA
    ), Q1.Comm...08.Learn.Plant = c(NA, NA, NA, 4L), Q1.Comm...09.Study.Narrate = c(NA, 
    NA, 0L, NA), Q1.Comm...10.Learn.Participate = c(4L, NA, NA, 
    NA), Q1.Comm...11.Write.1 = c(NA, 2L, NA, NA), Q1.Comm...12.Read.2 = c(NA, 
    NA, 1L, NA), Q1.Comm...Overall.Study.Plan = c(4L, 1L, 2L, 
    NA), X = c(NA, NA, NA, NA), X.1 = c(NA, NA, NA, NA), X.2 = c(NA, 
    NA, NA, NA)), class = "data.frame", row.names = c(NA, -4L
))

Any advice on how to achieve this would be greatly appreciated. Thank you!

The error you get is because `data$Q1-overall` is parsed as `data$Q1 - overall`; you want `data$\`Q1-overall\`` — Bas, Jun 09 '21 at 06:53
Why are the column names inconsistent? I.e., why `Q1-TP` and not `Q1-LS`, for example? — Bas, Jun 09 '21 at 06:54
There are multiple different measures in my case, to mock that up, I made ```Q1-TS``` and ```Q1-LS```. If having both as ```TP``` or both as ```LS``` helps simplify the solution, please do suggest. I could do other manipulation. — Sandy, Jun 09 '21 at 06:57
@Sandy, from which columns (edited scenario) you want to compare `Q1 Comm - Overall Study Plan` say? I'm not sure what are you after? — AnilGoyal, Jun 09 '21 at 11:22
Thank you @AnilGoyal for your quick reply. I have updated my question to address your comment. Is it any clear now? — Sandy, Jun 09 '21 at 11:24

score 1 · Answer 1 · answered Jun 09 '21 at 06:53

1

Seems a very long winded approach -

library(dplyr)

comparison <- function(x, y, z) {
  case_when(is.na(z) ~ NA_character_,
            z >= x & z <= y | 
              z >= y & z <= x |
              is.na(x) & y == z |
              is.na(y) & x == z ~ 'within', 
            z > x & z > y ~ 'above', 
            TRUE ~ 'below')
}

df %>%
  mutate(Q1_check = comparison(Q1.PM, Q1.TP, Q1.overall), 
          Q2_check = comparison(Q2.PM, Q2.LS, Q2.overall))
  
         
#  ID Q1.PM Q1.TP Q1.overall Q2.PM Q2.LS Q2.overall Q1_check Q2_check
#1  1     1     2          3     1     2          2    above   within
#2  2     0    NA         NA     2     1          1     <NA>   within
#3  3     2     1          1     3     4          0   within    below
#4  4     1     0          2     4     0          2    above   within
#5  5    NA     1         NA     0    NA          0     <NA>   within
#6  6     2     0          1     1    NA         NA   within     <NA>

answered Jun 09 '21 at 06:53

Ronak Shah

377,200
20
156
213

Thank you @Ronak Shah, how can I apply it to several columns (let's say 8 different columns), it would make this solution quite complicated... – Sandy Jun 09 '21 at 06:59
1

What do you mean? So instead of `PM`, `TP` and `Overall` there are other columns in your data? What are the rules to check in them. – Ronak Shah Jun 09 '21 at 07:02
I just want to perform column comparisons to see if the ```overall``` column is within the range of the other columns. For example, having some maximum and minimum and then seeing if the overall column falls within that maximum and minimum. Does it answer your question @Ronak Shah? – Sandy Jun 09 '21 at 07:56
I have edited my question @Ronak Shah, and I have provided my actual column names, can you advise based on that, please? – Sandy Jun 09 '21 at 10:58

Yuriy Saraykin · Answer 2 · 2021-06-09T07:22:26.883

df <- read.table(text = 
                   
                   "ID  Q1-PM Q1-TP Q1-overall  Q2-PM  Q2-LS  Q2-overall
 1   1     2     3           1       2     2       
 2   0     NA    NA          2       1     1 
 3   2     1     1           3       4     0  
 4   1     0     2           4       0     2 
 5   NA    1     NA          0       NA    0  
 6   2     0     1           1       NA    NA"   
                 
                 , header = TRUE)

library(tidyverse)


f <- function(x, y, z){
  case_when(
    z < pmin(x, y, na.rm = TRUE) ~ "below",
    z > pmax(x, y, na.rm = TRUE) ~ "abowe",
    between(z, pmin(x, y, na.rm = TRUE), pmax(x, y, na.rm = TRUE)) ~ "within"
  )
}

df %>%
  rowwise() %>% 
  mutate(Q1_check = f(Q1.PM, Q1.TP, Q1.overall),
         Q2_check = f(Q2.PM, Q2.LS, Q2.overall))
#> # A tibble: 6 x 9
#> # Rowwise: 
#>      ID Q1.PM Q1.TP Q1.overall Q2.PM Q2.LS Q2.overall Q1_check Q2_check
#>   <int> <int> <int>      <int> <int> <int>      <int> <chr>    <chr>   
#> 1     1     1     2          3     1     2          2 abowe    within  
#> 2     2     0    NA         NA     2     1          1 <NA>     within  
#> 3     3     2     1          1     3     4          0 within   below   
#> 4     4     1     0          2     4     0          2 abowe    within  
#> 5     5    NA     1         NA     0    NA          0 <NA>     within  
#> 6     6     2     0          1     1    NA         NA within   <NA>

^{Created on 2021-06-09 by the reprex package (v2.0.0)}

Thank you @Yuriy Saraykin, any chance we could update the function to calculate ```pmin()``` or ```pmax()``` of more than two columns? — Sandy, Jun 09 '21 at 10:26

koolmees · Answer 3 · 2021-06-09T13:28:09.700

Largely based on Ronak's great solution:

df <- structure(list(ID = c(10L, 31L, 225L, 243L), 
                      `Q1 Comm - 01 Scope Thesis` = c(NA, 2L, 0L, NA), 
                      `Q1 Comm - 02 Scope Project` = c(NA, NA, NA, 2L), 
                      `Q1 Comm - 03 Learn Intern` = c(4L, NA, NA, NA), 
                      `Q1 Comm - 04 Biography` = c(NA, NA, NA, 1L), 
                      `Q1 Comm - 05 Exhibit` = c(4L, 2L, NA, NA), 
                      `Q1 Comm - 06 Social Act` = c(NA, NA, NA, 3L), 
                      `Q1 Comm - 07 Post Project` = c(NA, NA, 3L, NA), 
                      `Q1 Comm - 08 Learn Plant` = c(NA, NA, NA, 4L), 
                      `Q1 Comm - 09 Study Narrate` = c(NA, NA, 0L, NA), 
                      `Q1 Comm - 10 Learn Participate` = c(4L, NA, NA,NA), 
                      `Q1 Comm - 11 Write 1` = c(NA, 2L, NA, NA), 
                      `Q1 Comm - 12 Read 2` = c(NA, NA, 1L, NA), 
                      `Q1 Comm - Overall Study Plan` = c(4L, 1L, 2L, NA), 
                      X = c(NA, NA, NA, NA), 
                      `X 1` = c(NA, NA, NA, NA), 
                      `X 2` = c(NA, NA, NA, NA)), 
                class = "data.frame", row.names = c(NA, -4L))

library(dplyr)

comparison <- function(df, prefix) {
  df <- df[grep(prefix, colnames(df))]
  min <- apply(df[-grep("Overall", colnames(df))], 1, min, na.rm = T)
  max <- apply(df[-grep("Overall", colnames(df))], 1, max, na.rm = T)
  z <- df[grep("Overall", colnames(df))]
  case_when(is.na(z) ~ NA_character_,
            z >= min & z <= max ~ 'within', 
            z > max ~ 'above', 
            TRUE ~ 'below')
}

prefixes <- sub(" \\- Overall.*", "", colnames(df[grep("Overall", colnames(df))]))

for (i in prefixes) {
  df <- df %>%
    mutate("{i} - Check" := comparison(df, i))
}


> print(df)
   ID Q1 Comm - 01 Scope Thesis Q1 Comm - 02 Scope Project Q1 Comm - 03 Learn Intern Q1 Comm - 04 Biography
1  10                        NA                         NA                         4                     NA
2  31                         2                         NA                        NA                     NA
3 225                         0                         NA                        NA                     NA
4 243                        NA                          2                        NA                      1
  Q1 Comm - 05 Exhibit Q1 Comm - 06 Social Act Q1 Comm - 07 Post Project Q1 Comm - 08 Learn Plant
1                    4                      NA                        NA                       NA
2                    2                      NA                        NA                       NA
3                   NA                      NA                         3                       NA
4                   NA                       3                        NA                        4
  Q1 Comm - 09 Study Narrate Q1 Comm - 10 Learn Participate Q1 Comm - 11 Write 1 Q1 Comm - 12 Read 2
1                         NA                              4                   NA                  NA
2                         NA                             NA                    2                  NA
3                          0                             NA                   NA                   1
4                         NA                             NA                   NA                  NA
  Q1 Comm - Overall Study Plan  X X 1 X 2 Q1 Comm - Check
1                            4 NA  NA  NA          within
2                            1 NA  NA  NA           below
3                            2 NA  NA  NA          within
4                           NA NA  NA  NA            <NA>

For comparison or finding the range of more than 2 columns, it would be hard to apply this solution? — Sandy, Jun 09 '21 at 10:16
I tried to give a solution based on what you wrote under Ronak's solution. What it does is it grabs the prefix of all columns that have ".overall" in their name (so Q1 and Q2 in your example) and creates a check column, no matter if you apply it to 2 columns or 200. The only potential pitfalls are that there needs to be 3 columns per prefix and they need to be in the same order as you put in your example (min, max, overall) — koolmees, Jun 09 '21 at 10:25
I have edited my question and have provided my actual column names, can you advise based on that, please? — Sandy, Jun 09 '21 at 10:57
Edited my solution to meet your request. It now checks the min and max of every row within the columns that match the prefix (excluding Overall) and compares that with the Overall of said prefix — koolmees, Jun 09 '21 at 12:17
Thank you for the update, I have included a reference ```dput(df)``` a sample where I am trying to run the code. — Sandy, Jun 09 '21 at 12:27
I have tried this script, it is returning ```above``` for all rows. Some where the logic is not working... — Sandy, Jun 09 '21 at 12:35
I don't experience the same issue when using your dput and the column names you had provided in your first edit. Maybe you have a package loaded that overwrites one of the base functions that are used? — koolmees, Jun 09 '21 at 13:25

AnilGoyal · Answer 4 · 2021-06-09T10:38:05.987

If your columns are named similarly, you may do this for any number of Qs simultaneously.

changed - in column names to acceptable _
changed Q2_LS to Q2_TP for sake of similarity

What is does -

It picks up every column that ends with _overall (2 here but can be any number)
check this columns values as -
- If less than column having name _PM / _TP in lieu of _overall allocates value below
- If greater than column having name _PM/_TP in lieu of _overall allocates value above
  - To access these column values I used get alongwith cur_column and stringr string replacement function
- if current value is NA allocated a NA_character
- otherwise allocates value within
Now, for final mutated columns (all at once) it renames these by removing _overall from these columns and pasting _check instead (I used .names argument of across here)
- For this I used stringr::str_remove inside glue argument (.names follow glue style of formula)

df <- read.table(text = 
                   
                   "ID  Q1_PM Q1_TP Q1_overall  Q2_PM  Q2_TP  Q2_overall
 1   1     2     3           1       2     2       
 2   0     NA    NA          2       1     1 
 3   2     1     1           3       4     0  
 4   1     0     2           4       0     2 
 5   NA    1     NA          0       NA    0  
 6   2     0     1           1       NA    NA"   
 
 , header = TRUE)

df
#>   ID Q1_PM Q1_TP Q1_overall Q2_PM Q2_TP Q2_overall
#> 1  1     1     2          3     1     2          2
#> 2  2     0    NA         NA     2     1          1
#> 3  3     2     1          1     3     4          0
#> 4  4     1     0          2     4     0          2
#> 5  5    NA     1         NA     0    NA          0
#> 6  6     2     0          1     1    NA         NA
library(tidyverse)
df %>% mutate(across(ends_with('overall'), ~ case_when(. < pmin(get(str_replace(cur_column(), '_overall', '_PM')),
                                                                 get(str_replace(cur_column(), '_overall', '_TP'))) ~ 'below',
                                                       . > pmax(get(str_replace(cur_column(), '_overall', '_PM')),
                                                                 get(str_replace(cur_column(), '_overall', '_TP'))) ~ 'above',
                                                       is.na(.) ~ NA_character_,
                                                       TRUE ~ 'within'),
                     .names = '{str_remove(.col,"_overall")}_check'))
#>   ID Q1_PM Q1_TP Q1_overall Q2_PM Q2_TP Q2_overall Q1_check Q2_check
#> 1  1     1     2          3     1     2          2    above   within
#> 2  2     0    NA         NA     2     1          1     <NA>   within
#> 3  3     2     1          1     3     4          0   within    below
#> 4  4     1     0          2     4     0          2    above   within
#> 5  5    NA     1         NA     0    NA          0     <NA>   within
#> 6  6     2     0          1     1    NA         NA   within     <NA>