I have a data frame df that looks like this:
Input:
df <- read.table(text =
"ID Q1_PM Q1_TP Q1_overall Q2_PM Q2_LS Q2_overall
1 1 2 3 1 2 2
2 0 NA NA 2 1 1
3 2 1 1 3 4 0
4 1 0 2 4 0 2
5 NA 1 NA 0 NA 0
6 2 0 1 1 NA NA"
, header = TRUE)
Desired Output:
My desired output is as follows:
ID Q1_PM Q1_TP Q1_overall Q2_PM Q2_LS Q2_overall Q1_check Q2_check
1 1 2 3 1 2 2 "above" "within"
2 0 NA NA 2 1 1 NA "within"
3 2 1 1 3 4 0 "within" "below"
4 1 0 2 4 0 2 "above" "within"
5 NA 1 NA 0 NA 0 NA "within"
6 2 0 1 1 NA NA "within" NA
Explanation:
Example 1:
Based on the values in columns Q1_PM and Q1_TP, I want to check whether the value in column Q1_overall is within their range. If it is not in range, is it above or below the range? To track this, I want to add a column Q1_check.
Example 2:
Similarly, based on the values of Q2_PM and Q2_LS, I want to check whether the value of Q2_overall is within their range. If it is not in range, is it above or below the range? Again, to track this, I want to add a column Q2_check.
Requirements:
1- For this, I want to add the columns Q1_check and Q2_check, where the first holds the comparisons that involve the Q1 items and the second holds the comparisons that involve the Q2 items.
2- These columns can contain the values above, below and within.
3- When an overall column is NA, the corresponding check column can also be NA.
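A minimal base R sketch of requirements 1-3, assuming the range for each question is the row-wise minimum and maximum of its non-overall columns, with NA items ignored (this is what the desired "within" for Q2_check in row 5 implies, where Q2_LS is NA). Note that Q1_PM is not always the smaller value (row 3 has Q1_PM = 2 and Q1_TP = 1), so the bounds come from pmin()/pmax() rather than from a fixed column. Looping over c("Q1", "Q2") and matching item columns by prefix are illustrative choices, not something the question prescribes.

```r
# Input data from the question
df <- read.table(text =
"ID Q1_PM Q1_TP Q1_overall Q2_PM Q2_LS Q2_overall
1 1 2 3 1 2 2
2 0 NA NA 2 1 1
3 2 1 1 3 4 0
4 1 0 2 4 0 2
5 NA 1 NA 0 NA 0
6 2 0 1 1 NA NA", header = TRUE)

for (q in c("Q1", "Q2")) {
  overall_col <- paste0(q, "_overall")
  check_col   <- paste0(q, "_check")
  # Item columns: same prefix, excluding the overall and check columns
  items <- setdiff(grep(paste0("^", q, "_"), names(df), value = TRUE),
                   c(overall_col, check_col))
  # Row-wise bounds over any number of item columns; NA items are ignored
  lo <- do.call(pmin, c(as.list(df[items]), na.rm = TRUE))
  hi <- do.call(pmax, c(as.list(df[items]), na.rm = TRUE))
  x  <- df[[overall_col]]
  # ifelse() yields NA wherever x (or both bounds) is NA
  df[[check_col]] <- ifelse(x < lo, "below",
                     ifelse(x > hi, "above", "within"))
}

df
```

Because ifelse() returns NA whenever its condition is NA, requirement 3 is covered without a separate is.na() branch: a missing overall value gives a missing check value.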
Related posts:
I have looked at related posts such as Add column with values depending on another column to a dataframe and Create categories by comparing a numeric column with a fixed value, but I run into the errors discussed below.
Partial Solution:
The only solution I can think of is along these lines:
df$Q1_check <- ifelse(data$Q1_overall < data$Q1_PM, 'below',
ifelse(data$Q1_overall > data$Q1_TP, 'above',
ifelse(is.na(data$Q1_overall), NA, 'within')))
But it results in the following error: Error in data$Q1_overall : object of type 'closure' is not subsettable. I do not understand what the issue could be.
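The error most likely comes from the name data: the data frame is called df, and unless an object named data was created, R finds the function utils::data(). A function is a "closure" in R, so data$Q1_overall fails. Below is a sketch of the same nested ifelse() approach using df, assuming the intended bounds are the row-wise min and max of Q1_PM and Q1_TP (Q1_PM is not always the lower one, so comparing against Q1_PM and Q1_TP directly would misclassify rows such as row 3).

```r
df <- read.table(text =
"ID Q1_PM Q1_TP Q1_overall Q2_PM Q2_LS Q2_overall
1 1 2 3 1 2 2
2 0 NA NA 2 1 1
3 2 1 1 3 4 0
4 1 0 2 4 0 2
5 NA 1 NA 0 NA 0
6 2 0 1 1 NA NA", header = TRUE)

# Without a user-defined object called `data`, this name refers to the
# function utils::data(), so data$Q1_overall tries to subset a function.
is.function(data)

# Bounds per row: Q1_PM is not always the smaller of the two values
lo <- pmin(df$Q1_PM, df$Q1_TP, na.rm = TRUE)
hi <- pmax(df$Q1_PM, df$Q1_TP, na.rm = TRUE)

# Same nested ifelse() as in the question, but on df and with the NA check first
df$Q1_check <- ifelse(is.na(df$Q1_overall), NA,
               ifelse(df$Q1_overall < lo, "below",
               ifelse(df$Q1_overall > hi, "above", "within")))
df$Q1_check
```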
OR
df %>%
mutate(Regulation = case_when(Q1_overall < Q1_PM ~ 'below',
Q1_overall > Q1_TP ~ 'above',
Q1_PM < Q1_overall < Q1_TP, 'within'))
This also results in an error: Error: unexpected '<' in: "Q1_overall > Q1_TP ~ 'above', Q1_PM < Q1_overall <"
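Two things break the case_when() attempt: R has no chained comparisons, so Q1_PM < Q1_overall < Q1_TP is a syntax error and must be written as two comparisons joined with &, and the last branch uses a comma where case_when() expects ~. A sketch of a corrected version, assuming dplyr is installed. The helper columns Q1_lo and Q1_hi are hypothetical names used only for readability, and the result goes into Q1_check (matching the desired output) rather than Regulation.

```r
suppressPackageStartupMessages(library(dplyr))

df <- read.table(text =
"ID Q1_PM Q1_TP Q1_overall Q2_PM Q2_LS Q2_overall
1 1 2 3 1 2 2
2 0 NA NA 2 1 1
3 2 1 1 3 4 0
4 1 0 2 4 0 2
5 NA 1 NA 0 NA 0
6 2 0 1 1 NA NA", header = TRUE)

df <- df %>%
  mutate(
    # Hypothetical helper columns holding the row-wise range
    Q1_lo = pmin(Q1_PM, Q1_TP, na.rm = TRUE),
    Q1_hi = pmax(Q1_PM, Q1_TP, na.rm = TRUE),
    Q1_check = case_when(
      Q1_overall < Q1_lo ~ "below",
      Q1_overall > Q1_hi ~ "above",
      # No chained comparisons in R: two tests joined by &, followed by ~
      Q1_overall >= Q1_lo & Q1_overall <= Q1_hi ~ "within"
    )
  ) %>%
  select(-Q1_lo, -Q1_hi)

df
```

Rows where no condition is TRUE (here, those with a missing Q1_overall) get NA from case_when().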
Edit 1:
How can the solution be extended if, for example, the columns are these:
"Q1 Comm - 01 Scope Thesis"
"Q1 Comm - 02 Scope Project"
"Q1 Comm - 03 Learn Intern"
"Q1 Comm - 04 Biography"
"Q1 Comm - 05 Exhibit"
"Q1 Comm - 06 Social Act"
"Q1 Comm - 07 Post Project"
"Q1 Comm - 08 Learn Plant"
"Q1 Comm - 09 Study Narrate"
"Q1 Comm - 10 Learn Participate"
"Q1 Comm - 11 Write 1"
"Q1 Comm - 12 Read 2"
"Q1 Comm - Overall Study Plan"
How can we identify whether the column Q1 Comm - Overall Study Plan is:
1 - Below the min() of all the other columns,
2 - Above the max() of all the other columns, or
3 - Within the range of all the other columns?
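One possible generalization is to select the item columns by a name pattern and take row-wise bounds with do.call(pmin, ...) and do.call(pmax, ...), which accept any number of columns. The helper check_range() and the three-row data frame below are hypothetical, made up for illustration; check.names = FALSE keeps the names with spaces and dashes exactly as listed above.

```r
# Hypothetical example data using column names from the list above
df <- data.frame(
  ID = 1:3,
  `Q1 Comm - 01 Scope Thesis`    = c(1, NA, 3),
  `Q1 Comm - 02 Scope Project`   = c(2, 4, NA),
  `Q1 Comm - 03 Learn Intern`    = c(NA, 0, 1),
  `Q1 Comm - Overall Study Plan` = c(5, 2, 0),
  check.names = FALSE
)

# Hypothetical helper: classify column `overall` of `d` against the row-wise
# min/max of the columns named in `items`, ignoring NAs among the items.
check_range <- function(d, items, overall) {
  lo <- do.call(pmin, c(as.list(d[items]), na.rm = TRUE))
  hi <- do.call(pmax, c(as.list(d[items]), na.rm = TRUE))
  x  <- d[[overall]]
  ifelse(x < lo, "below", ifelse(x > hi, "above", "within"))
}

overall_col <- "Q1 Comm - Overall Study Plan"
# Every "Q1 Comm - ..." column except the overall one
item_cols <- setdiff(grep("^Q1 Comm - ", names(df), value = TRUE), overall_col)

df$Q1_check <- check_range(df, item_cols, overall_col)
df$Q1_check
```

A design note: for a row whose items are all NA, pmin(..., na.rm = TRUE) returns NA, whereas apply() with min(na.rm = TRUE) would return Inf with a warning.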
Edit 2:
For the updated columns, here is the output of dput(df):
dput(df)
structure(list(ï..ID = c(10L, 31L, 225L, 243L), Q1.Comm...01.Scope.Thesis = c(NA,
2L, 0L, NA), Q1.Comm...02.Scope.Project = c(NA, NA, NA, 2L),
Q1.Comm...03.Learn.Intern = c(4L, NA, NA, NA), Q1.Comm...04.Biography = c(NA,
NA, NA, 1L), Q1.Comm...05.Exhibit = c(4L, 2L, NA, NA), Q1.Comm...06.Social.Act = c(NA,
NA, NA, 3L), Q1.Comm...07.Post.Project = c(NA, NA, 3L, NA
), Q1.Comm...08.Learn.Plant = c(NA, NA, NA, 4L), Q1.Comm...09.Study.Narrate = c(NA,
NA, 0L, NA), Q1.Comm...10.Learn.Participate = c(4L, NA, NA,
NA), Q1.Comm...11.Write.1 = c(NA, 2L, NA, NA), Q1.Comm...12.Read.2 = c(NA,
NA, 1L, NA), Q1.Comm...Overall.Study.Plan = c(4L, 1L, 2L,
NA), X = c(NA, NA, NA, NA), X.1 = c(NA, NA, NA, NA), X.2 = c(NA,
NA, NA, NA)), class = "data.frame", row.names = c(NA, -4L
))
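Applied to the dput() data, the same pattern only needs a regular expression for the names as read.csv() rewrote them (spaces and dashes became dots). This sketch assumes every column starting with Q1.Comm. other than Q1.Comm...Overall.Study.Plan is an item column; the empty X, X.1 and X.2 columns are not matched. The first column is renamed to ID here: the ï.. prefix in the original dput most likely comes from a UTF-8 byte-order mark, which reading the file with read.csv(..., fileEncoding = "UTF-8-BOM") usually avoids (an assumption about how the file was read).

```r
# Data from the question's dput(), with the first column renamed to ID
df <- structure(list(ID = c(10L, 31L, 225L, 243L),
  Q1.Comm...01.Scope.Thesis = c(NA, 2L, 0L, NA),
  Q1.Comm...02.Scope.Project = c(NA, NA, NA, 2L),
  Q1.Comm...03.Learn.Intern = c(4L, NA, NA, NA),
  Q1.Comm...04.Biography = c(NA, NA, NA, 1L),
  Q1.Comm...05.Exhibit = c(4L, 2L, NA, NA),
  Q1.Comm...06.Social.Act = c(NA, NA, NA, 3L),
  Q1.Comm...07.Post.Project = c(NA, NA, 3L, NA),
  Q1.Comm...08.Learn.Plant = c(NA, NA, NA, 4L),
  Q1.Comm...09.Study.Narrate = c(NA, NA, 0L, NA),
  Q1.Comm...10.Learn.Participate = c(4L, NA, NA, NA),
  Q1.Comm...11.Write.1 = c(NA, 2L, NA, NA),
  Q1.Comm...12.Read.2 = c(NA, NA, 1L, NA),
  Q1.Comm...Overall.Study.Plan = c(4L, 1L, 2L, NA),
  X = c(NA, NA, NA, NA), X.1 = c(NA, NA, NA, NA), X.2 = c(NA, NA, NA, NA)),
  class = "data.frame", row.names = c(NA, -4L))

overall_col <- "Q1.Comm...Overall.Study.Plan"
# Item columns: every Q1.Comm... column except the overall one
item_cols <- setdiff(grep("^Q1\\.Comm\\.", names(df), value = TRUE), overall_col)

# Row-wise range of the items, ignoring NAs
lo <- do.call(pmin, c(as.list(df[item_cols]), na.rm = TRUE))
hi <- do.call(pmax, c(as.list(df[item_cols]), na.rm = TRUE))
x  <- df[[overall_col]]

df$Q1_check <- ifelse(x < lo, "below", ifelse(x > hi, "above", "within"))
df[c("ID", overall_col, "Q1_check")]
```

For example, ID 31 comes out as below because all of its non-missing items equal 2 while its overall value is 1, and ID 243 gets NA because its overall value is missing.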
Any advice on how to achieve this would be greatly appreciated. Thank you!