0

Sample data:

Group <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")    
value_1 <- c(1.10, 2.5, 1.7, 0.99, 1.50, 1.65, 2.5, 2.5, 1.5)
value_2 <- c(0.03, 1.3, 3.5, 0.02, 4.3, 1.2, 1.4, 1.4, 3.7)
new_variable_1 <- c(1,0,1, 1,1,0, 0,0,1)    
df <- data.frame(Group, value_1, value_2, new_variable_1)

The output is new_variable_1. I want to create a new_variable_1 based on following criteria; I am seeking 2 solutions.

Basic idea:

lookup the max value in value_2 by group and create dummy variable based on values in value_1.

solution 1 Logic:

  1. Find max(value_2) by group. E.g., the max value in value_2 for group a is 3.5

  2. Find the corresponding value_1 by group. E.g., value_1 is 1.7 in group a

  3. create new_variable_1 by group that is 1 if value_1 is less than the corresponding value in the above step. E.g., for group a, value_1 <= 1.7 should show 1 & value_1 > 1.7 should show 0.

solution 2 Logic:

Same as above, but increase the threshold value from step 2 by 10%.

  1. the max value in value_2 for group a is 3.5

  2. it then corresponds to value 1.7 value_1 in group a

  3. Increase the value by 10%. For group a 10% in increase would be 1.87.

  4. Create new_variable_1: for group a, value_1 <= 1.87 should show 1 & value_1 > 1.87 should show 0.

R, dplyr, data.table and most efficient R codes are welcome.

It's a large dataset so groups may have different length and Inf or NA may exist in value_2.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • Do you mean `data.table` or a `datatable`? – NelsonGon Mar 02 '19 at 12:08
  • Please read [this](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to better formulate the question. It is easier to copy and paste the desired output. It is more visually "appealing" or easier to follow(in my opinion). – NelsonGon Mar 02 '19 at 12:15

1 Answers1

1

We could try. I've used names starting with "New" to make it easier to follow.

Solution 1(Thanks to @Gregor):

library(dplyr)


 df %>% 
  group_by(Group) %>%  
  mutate(New_variable_1 = ifelse(value_1 <= value_1[which.max(value_2)], 1, 0))

Solution 2: Thanks to @Gregor

df %>% 
  group_by(Group) %>%  
  mutate(New_variable_1 = ifelse(value_1 <= value_1[which.max(value_2)], 1, 0),
         NewVar1=value_1[which.max(value_2)]*1.1)
NelsonGon
  • 13,015
  • 7
  • 27
  • 57
  • Thanks Nelson. I need to tackle all groups and not just group a. your code is only correct for group "a" but not for other groups. – Jon Dongen Mar 02 '19 at 12:36
  • Well you don't say what happens to other groups so I assumed they take maximum values of `value_1`. @JonDongen – NelsonGon Mar 02 '19 at 12:36
  • new_variable_1 is the correct output. Apologies for not being clear. – Jon Dongen Mar 02 '19 at 12:38
  • Please check the output and see if it matches your expected values. I see one difference at row 6. @JonDongen – NelsonGon Mar 02 '19 at 12:41
  • Thanks Jon for your comments. Yes, code should tackle operation by group. In each group we select the max(value_2) and its corresponding value in (value_1). We then use the (value_1) by group to create the indicator variable. – Jon Dongen Mar 02 '19 at 12:43
  • Lets look at group "b". – Jon Dongen Mar 02 '19 at 12:45
  • The max(value_2) in group "b" is: 4.3 The corresponding value for group "b" (value_1) is: 1.5 so for group "b" solution 1: anything <= 1.5 is 1 and > 1.5 is 0 – Jon Dongen Mar 02 '19 at 12:47
  • Please simply tweak these values as required. You can use `%in%` to define groups and ranges. – NelsonGon Mar 02 '19 at 12:48
  • 1
    @NelsonGon I understand now, `1.7` is used because it is the `value_1` value in the same row as `max(value_2)` for group 'a'`. (That's what OP means by "corresponding".) Similarly 1.5 for group 'b' because it is in row 5, which has the max `value_2` for group 'b', etc. – Gregor Thomas Mar 02 '19 at 12:52
  • Nelson, please can you tweak the code by group. if i knew how to then I wouldn't have posted the question. – Jon Dongen Mar 02 '19 at 12:54
  • Thanks @Gregor for the insight. Let's see how to tweak it. – NelsonGon Mar 02 '19 at 12:55
  • 1
    I edited the question---I think it's clearer. You don't need an `ifelse` for the `max`, it's always the max of `value_2`. I think something like `New_variable_1 = ifelse(value_1 <= value_1[which.max(value_2)], 1, 0)` is what is needed (for solution 1... for solution 2 multiply by 1.1). – Gregor Thomas Mar 02 '19 at 13:00
  • Edited. Thanks for the clarification. – NelsonGon Mar 02 '19 at 13:09
  • I think it's the threshold that needs multiplication, not the result. `1.1 * value_1[which.max(value_2)` – Gregor Thomas Mar 02 '19 at 13:24
  • Ah, OP means this value by "the value". Got it, thanks again! @Gregor – NelsonGon Mar 02 '19 at 13:39