I am newish to R and having trouble with a for loop over unique values.

With the df:

id = c(1,2,2,3,3,4) 
rank = c(1,2,1,3,3,4) 
df = data.frame(id, rank)     

I run:

df$dg <- logical(6)

for(i in unique(df$id)){
  ifelse(!unique(df$rank), df$dg ==T, df$dg == F)
}

I am trying to mark the $dg variable as TRUE provided that rank differs within a given id, and FALSE if rank is the same within that id.

I am not getting any errors, but I am only getting F for all values of $dg even though I should be getting a mix.

I have also used the following loop with the same results:

for(i in unique(df$id)){
  ifelse(length(unique(df$rank)), df$dg ==T, df$dg == F)
}

I have read other similar posts but the advice has not worked for my case.

From Comments:

I want to mark dg TRUE for all instances of an id if rank changed at all for a given id. I'm looking to say, for a given ID which has anywhere between 1 and 13 instances, mark dg TRUE if rank differs across instances.

David Arenburg
MattM
  • I haven't run this yet but I think you need ```for(i in 1:length(unique(df$id))``` – Matt W. Jun 27 '17 at 20:45
  • The first argument to ifelse should be a vector of length nrow for the dataframe. Using "unique" is going to defeat that requirement. You also need to do more than call the ifelse function: you need to assign its result to something. – IRTFM Jun 27 '17 at 22:11
  • Thank you for your help @Gregor. In my actual case, this ended up marking my $dg variable as TRUE whenever it had a unique response. What I am actually after is something that marks $dg TRUE on all rows with a given unique $id if any value of $rank is unique within that $id. Essentially, mark dg TRUE for an id if rank changed at all. – MattM Jun 27 '17 at 23:01
  • @Masoud : Thanks for your response. Similar to my comment above, it marked my $dg variable as TRUE whenever it had a unique response. I think perhaps I was unclear in my original aims. In my actual data, I want to mark dg TRUE for all instances of an id if rank changed at all for a given id. I'm looking to say, for a given ID which has anywhere between 1 and 13 instances, mark dg TRUE if rank differs across instances. – MattM Jun 28 '17 at 16:51
  • @MattM look at the updated answer. Just look at the `df2` that I provided. That will give an idea about the importance of providing a minimal example that reflects your needs clearly. Cheers. – M-- Jun 28 '17 at 18:13
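The points raised in the comments above (the loop variable `i` is never used, `==` compares rather than assigns, and the result of `ifelse` is discarded) can be illustrated with a corrected base R loop. This is only a sketch: it assumes the strict reading of the clarified requirement, so an id with a single row is marked FALSE, whereas the answer below chooses to mark such ids TRUE. The helper name `rows` is introduced here for illustration.

```r
# Example data from the question
df <- data.frame(id = c(1, 2, 2, 3, 3, 4), rank = c(1, 2, 1, 3, 3, 4))
df$dg <- logical(nrow(df))

for (i in unique(df$id)) {
  rows <- df$id == i   # logical index of the rows belonging to this id
  # Assumption: TRUE only when the id has more than one distinct rank
  df$dg[rows] <- length(unique(df$rank[rows])) > 1
}

df$dg
# [1] FALSE  TRUE  TRUE FALSE FALSE FALSE
```

Each pass subsets by the current id, computes one logical value, and assigns it back with `<-`, which is the step missing from the original loops.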

1 Answer

Update: How to identify groups (ids) that only have one rank?


After the clarification the OP provided, this would be a solution for this particular case:

library(dplyr)
df %>%
  group_by(id) %>%
  mutate(dg = ifelse(length(unique(rank)) > 1 | n() == 1, TRUE, FALSE))

For another data set (df2, presented below), which also contains an id whose ranks are partly duplicated and partly distinct, this would be the output:

df2 %>%
  group_by(id) %>%
  mutate(dg = ifelse(length(unique(rank)) > 1 | n() == 1, TRUE, FALSE))

#:OUTPUT:

# Source: local data frame [9 x 3] 
# Groups: id [5] 
#  
# # A tibble: 9 x 3 
#      id  rank    dg 
#   <dbl> <dbl> <lgl> 
# 1     1     1  TRUE 
# 2     2     2  TRUE 
# 3     2     1  TRUE 
# 4     3     3 FALSE 
# 5     3     3 FALSE 
# 6     4     4  TRUE 
# 7     5     1  TRUE 
# 8     5     1  TRUE 
# 9     5     3  TRUE

Data no. 2 (df2):

df2 <- structure(list(id = c(1, 2, 2, 3, 3, 4, 5, 5, 5), rank = c(1, 2, 1, 3, 3, 4, 1, 1, 3
                )), .Names = c("id", "rank"), row.names = c(NA, -9L), class = "data.frame")
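For readers who prefer not to load dplyr, the same per-id rule can be sketched in base R. This is an illustrative alternative rather than part of the original answer; the name `per_id` is hypothetical, and the rule (TRUE if an id has several distinct ranks or only a single row) is copied from the dplyr code above.

```r
# df2 with the same values as in the answer
df2 <- data.frame(id   = c(1, 2, 2, 3, 3, 4, 5, 5, 5),
                  rank = c(1, 2, 1, 3, 3, 4, 1, 1, 3))

# One logical per id: TRUE if the id has several distinct ranks or only one row
per_id <- tapply(df2$rank, df2$id,
                 function(r) length(unique(r)) > 1 || length(r) == 1)

# Look up each row's id in the per-id result (the names of per_id are the ids)
df2$dg <- as.vector(per_id[as.character(df2$id)])

df2$dg
# [1]  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
```

The result agrees with the dg column in the tibble output shown above.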



How to identify duplicated rows within each group (id)?


You can use the dplyr package:

library(dplyr)
df %>%
  group_by(id, rank) %>%
  mutate(dg = ifelse(n() > 1, FALSE, TRUE))

This will give you:

# Source: local data frame [6 x 3] 
# Groups: id, rank [5] 
#  
# # A tibble: 6 x 3 
#      id  rank    dg 
#   <dbl> <dbl> <lgl> 
# 1     1     1  TRUE 
# 2     2     2  TRUE 
# 3     2     1  TRUE 
# 4     3     3 FALSE 
# 5     3     3 FALSE 
# 6     4     4  TRUE

Note: You can convert the result back to a plain data frame with as.data.frame().

A data.table solution would be:

library(data.table)
dt <- as.data.table(df)
# Flag (id, rank) pairs that occur only once; .N is the row count per group
dt[, dg := ifelse(.N > 1, FALSE, TRUE), by = .(id, rank)]

Data:

df <- structure(list(id = c(1, 2, 2, 3, 3, 4), rank = c(1, 2, 1, 3, 
      3, 4)), .Names = c("id", "rank"), row.names = c(NA, -6L), class = "data.frame")

# > df

#   id rank 
# 1  1    1 
# 2  2    2 
# 3  2    1 
# 4  3    3 
# 5  3    3 
# 6  4    4

N.B. Unless you want an identifier other than TRUE/FALSE, using ifelse() is redundant and adds computational cost. @DavidArenburg
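The note above can be checked with a small base R sketch: the comparison already yields a logical vector, so wrapping it in ifelse() changes nothing, and a 0/1 coding needs only as.integer(). The names `n_pair`, `dg_cmp`, and `dg_ifelse` are introduced here for illustration, and ave() stands in for the grouped count computed by dplyr or data.table above.

```r
df <- data.frame(id = c(1, 2, 2, 3, 3, 4), rank = c(1, 2, 1, 3, 3, 4))

# Number of rows sharing each (id, rank) pair, aligned with the rows of df
n_pair <- ave(seq_along(df$id), df$id, df$rank, FUN = length)

dg_cmp    <- n_pair == 1                      # the comparison is already logical
dg_ifelse <- ifelse(n_pair > 1, FALSE, TRUE)  # same values, extra work

identical(dg_cmp, dg_ifelse)
# [1] TRUE

as.integer(dg_cmp)  # 0/1 coding without ifelse()
# [1] 1 1 1 0 0 1
```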

Community
M--
  • You do realize that `ifelse` has absolutely no use here, right? Just the `>` expression returns a `TRUE/FALSE` vector. – David Arenburg Jun 28 '17 at 21:35
  • @DavidArenburg I do realize that, but this question wants TRUE/FALSE. What if someone wants 0 and 1? It's just a preference to make my answer more general. Can you benchmark with and without ifelse and tell me what the computational cost is? If it's considerably higher, then I definitely won't do that. – M-- Jun 28 '17 at 21:42
  • No, I'm not going to benchmark your answer for you, sorry. But you might want to read [this](https://stackoverflow.com/questions/16275149/does-ifelse-really-calculate-both-of-its-vectors-every-time-is-it-slow). – David Arenburg Jun 29 '17 at 06:10
  • @DavidArenburg I asked you to benchmark your suggestion, not my answer. Thanks for the link. I still think that most of the time you don't want `TRUE/FALSE` in the column but another identifier like `0-1`. At least for what I need. Cheers. – M-- Jun 29 '17 at 13:17
  • You don't need `ifelse` for `0-1` either. Just `as.integer(a > b)` will give you that. In fact, you won't need `ifelse` in 99% of the cases, and you can just operate on subsets. – David Arenburg Jun 29 '17 at 13:25