3

I have two columns. one has a list of variables e.g. "cat", "dog", "rat", "chicken" and the other is whether the pet shop was visited on the first or second trip.

visit_number    pet
      1         dog
      2         dog
      1         cat
      2         cat
      1         rat
      2         chicken

I am looking to get compare the differences between the two visits in R e.g. intersect() and setdiff(). Basically exactly the same as this question:

Compare two lists in R

However, I don't have two lists but have two variables in a single column and I cant seem to get the code to work.

what I am trying to achieve is a function like this but that uses the single column instead rather than the two lists (code taken from the other question):

xtab_set <- function(A,B){
    both    <-  union(A,B)
    inA     <-  both %in% A
    inB     <-  both %in% B
    return(table(inA,inB))
}
DFinch
  • 139
  • 1
  • 10
  • 1
    Do you always have 2 visits: 1 and 2, or more? – zx8754 Sep 25 '18 at 17:05
  • 4
    Not clear what the output should be here...? There's `with(DF, xtab_set(pet[visit_number == 1], pet[visit_number == 2]))` – Frank Sep 25 '18 at 17:06
  • "I am looking to get compare the differences between the two visits in R" - the question is: difference in what? You could have a time-varying value, such as health from 1 to 5, for each pet. You could then calculate the change in pet score between the visits. However I'm not sure that's what you're after. – JonMinton Sep 25 '18 at 17:44
  • @zx8754 yes in this case I only ever have two visits but I would need to know the same information for multiple visits at some stage so that would be very useful to know? – DFinch Sep 26 '18 at 09:54
  • 1
    @Frank thank you so much. that piece of code works a treat and how I would like the output. – DFinch Sep 26 '18 at 09:55
  • @DFinch if Frank's solution works, then it is a duplicate? – zx8754 Sep 26 '18 at 09:56
  • @JonMinton I am only looking for the number (count) of unique individual animals per visit that occurred only in visit one, only in visit two and occurred i both visits. – DFinch Sep 26 '18 at 09:57
  • @zx8754 so add the other visits to Frank's code e.g. with(DF, xtab_set(pet[visit_number == 1], pet[visit_number == 2], pet[visit_number == 3], pet[visit_number == n])) but how would that work in the function then? – DFinch Sep 26 '18 at 10:01
  • 1
    Please provide example data with 3 sets, and expected output. – zx8754 Sep 26 '18 at 10:08
  • @zx8754 having had a play around with the data I realized I couldn't get a matrix with "pets" occurring in both visits with more than two visits but it would still be good to know unique records in that place. I have added a document [https://drive.google.com/file/d/1tMYDYnibyoNP3m_PZyOO3X30UPI_Ixh0/view?usp=sharing] that has data inputs, matrix layout and expected outputs to try and explain what I was thinking of. – DFinch Sep 26 '18 at 11:16

2 Answers2

1

Frankly speaking, the output matrix is not very clear. However, you mentioned at the comment that you are “looking for the number (count) of unique individual animals per visit that occurred only in visit one, only in visit two and occurred i both visits.” Also in the document you provided there are three visits. I am considering three visits.

The following code will show the number of unique individual animals by visits as well as number of unique individual animals that appeared in all visits.

Step 1. Build a raw dataset

library(data.table)
df = data.table(visit_number = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3), 
                pet = c("Dog", "Rat", "Cat", "Dog", "Chicken", "Cat", "Dog", "Cat", "Fish", "Horse"))

Step 2. Create a vector of understandable column names for future reference

cols = c(paste0(rep("Visit", length(unique(df$visit_number))), unique(df$visit_number)))

Step 3. Create a matrix of pets appearance

df = dcast.data.table(df, pet ~ visit_number, value.var = "pet", fun.aggregate = length)
names(df)[-1] = cols # assign understandable column names

Step 4. Define pets that appeared in all visits

df[, AllVisits := Reduce(`*`, .SD), .SDcols = cols]

It gives:

df
       pet Visit1 Visit2 Visit3 AllVisits
1:     Cat      1      1      1         1
2: Chicken      0      1      0         0
3:     Dog      1      1      1         1
4:    Fish      0      0      1         0
5:   Horse      0      0      1         0
6:     Rat      1      0      0         0

Rat was unique for Visit 1, Chicken was unique for Visit 2, Fish and Horse were unique for Visit 3. Cat and Dog appeared in all visits.

Step 5. Get the number of unique number of animals by visits and unique number of animals that appeared in all visits

idx = df[, Reduce(`+`, .SD) == 1, .SDcols = cols]
unlist(c(df[idx, lapply(.SD, function(x) sum(x)), .SDcols = cols], AllVisits = df[, sum(AllVisits)]))

The result is:

Visit1    Visit2    Visit3 AllVisits 
     1         1         2         2 

Let me know if that is what you are looking for.

P.S. The code will require modification if pets may appear several times during the visit.

Serhii
  • 362
  • 4
  • 15
0

If I understood what you are asking correctly, here is a solution using functions from the dplyr package:

full_join(filter(df, visit_number == 1), filter(df, visit_number == 2), by = 'pet') %>%
    mutate(visit1 = !is.na(visit_number.x),
           visit2 = !is.na(visit_number.y),
           both = visit1 & visit2) %>% 
    select(-starts_with('visit_number'))

Giving:

      pet visit1 visit2  both
1     dog   TRUE   TRUE  TRUE
2     cat   TRUE   TRUE  TRUE
3     rat   TRUE  FALSE FALSE
4 chicken  FALSE   TRUE FALSE
C. Braun
  • 5,061
  • 19
  • 47