I have a table called myTable (input):

  user_name  session_num
1     "Joe"            1
2     "Tom"            2
3    "Fred"            1
4     "Tom"            1
5     "Joe"            2
6    "John"            1

I want to know how many of my users (user_name) have only session_num = 1 (output):

  user_name  session_num
1    "Fred"            1
2    "John"            1
David Arenburg
Smasell

4 Answers

Here's a possible solution using data.table

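library(data.table)
# Convert myTable to a data.table by reference, then keep a user's rows
# only if all of that user's sessions equal 1
setDT(myTable)[, if (all(session_num == 1)) .SD, by = user_name]
#    user_name session_num
# 1:      Fred           1
# 2:      John           1

Another option is to try an anti join

# Rows with session_num == 1, minus users that also have any other session
myTable[session_num == 1][!myTable[session_num != 1], on = "user_name"]
#    user_name session_num
# 1:      Fred           1
# 2:      John           1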
David Arenburg
  • You may also want to read [this](https://github.com/Rdatatable/data.table/wiki/Getting-started) in order to get more comfortable with `data.table` – David Arenburg Mar 17 '16 at 13:25

A comparable solution with dplyr:

library(dplyr)
myTable %>%
  group_by(user_name) %>%
  filter(all(session_num == 1))

which gives:

  user_name session_num
     (fctr)       (int)
1      Fred           1
2      John           1
Jaap

Alternatively, we could simply exclude all users that have a session number other than 1, using base R.

# Users with a session number other than 1
two <- myTable$user_name[myTable$session_num != 1]

# Exclude them
myTable[!myTable$user_name %in% two,]
#  user_name session_num
#3      Fred           1
#6      John           1
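
Since the question asks how many users qualify, a small base R sketch of the count may help. It is not part of the original answer: the table is rebuilt by hand from the question, and the names only_one, users_only_one and n_only_one are hypothetical, chosen for illustration.

```r
# Rebuild the example table from the question
# (assumption: user_name is stored as character, not factor).
myTable <- data.frame(
  user_name   = c("Joe", "Tom", "Fred", "Tom", "Joe", "John"),
  session_num = c(1, 2, 1, 1, 2, 1),
  stringsAsFactors = FALSE
)

# For each user, TRUE when every one of their sessions equals 1.
only_one <- tapply(myTable$session_num == 1, myTable$user_name, all)

# Names of the qualifying users, and how many there are.
users_only_one <- names(only_one)[only_one]
n_only_one <- sum(only_one)

print(users_only_one)
print(n_only_one)
```

On the example data this should report Fred and John, so a count of 2.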
mtoto
  • Interestingly we thought about this pretty much at the same time. Though I think using `!=1` instead of `==2` is safer in order to cover all possibilities. – David Arenburg Mar 17 '16 at 12:57

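This is a two-line answer:

library(data.table)
data1 <- fread("test.csv")
# %in% rather than ==, because more than one user name can match
data1[user_name %in% names(which(table(data1$user_name) == 1)), ][session_num == 1, ]

It first finds the users who appear in the dataset only once, then keeps those whose session_num is 1. This relies on each user having at most one row per session number.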
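
To illustrate how counting appearances can differ from checking every session of a user, here is a rough base R sketch. The data frame extra and the users Ann, Bob and Cal are made up for illustration and do not come from the question; the comparison only matters if a user can have the same session number on more than one row.

```r
# Hypothetical data: Ann has two rows, both with session_num 1.
extra <- data.frame(
  user_name   = c("Ann", "Ann", "Bob", "Bob", "Cal"),
  session_num = c(1, 1, 1, 2, 1),
  stringsAsFactors = FALSE
)

# Counting approach: users that appear exactly once, then session_num == 1.
once <- names(which(table(extra$user_name) == 1))
by_count <- extra[extra$user_name %in% once & extra$session_num == 1, "user_name"]

# Group-wise approach: every session of the user must equal 1.
all_one <- tapply(extra$session_num == 1, extra$user_name, all)
by_group <- names(all_one)[all_one]

print(by_count)  # users found by counting appearances
print(by_group)  # users found by the group-wise check
```

Here the counting approach should return only Cal, while the group-wise check should also return Ann.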

Hanjo Odendaal
  • If you replace the `==` with `%in%` after `user_name` you get the expected result. With the latest official release of data.table (v1.9.6) the code throws an error. – RHertel Mar 17 '16 at 12:39
  • Not sure which of the `data.table` specific features this answer uses. – David Arenburg Mar 17 '16 at 12:42