
EDIT: I want to specify which values NOT to include in my calculation by providing a list of records to skip. I do NOT want to provide a list of values to include, because my dataset is too large.
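A skip list is enough on its own: negating `%in%` keeps every element that is not listed, so an include list is never needed. A minimal base-R sketch, assuming a hypothetical toy vector `names_all` and skip list `skip`:

```r
# Hypothetical toy data: a vector of names and a list of names to skip.
names_all <- c("Luke Skywalker", "C-3PO", "R2-D2", "Darth Vader", "Leia Organa")
skip <- c("R2-D2", "Darth Vader")

# %in% is TRUE for elements that appear in `skip`; ! flips it, so `keep`
# is TRUE only for elements NOT in the skip list.
keep <- !(names_all %in% skip)

names_all[keep]  # returns c("Luke Skywalker", "C-3PO", "Leia Organa")
```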

I want to group records by one variable and then compute summaries of other variables; however, I want to exclude certain records from one of those summaries. Here is what the transformation looks like without any exclusions:

```r
library(dplyr)

grouped <- starwars %>%
  group_by(species) %>%                       # group the data by species
  summarise(Total_Mass = sum(mass, na.rm = TRUE),       # one calculation (missing masses dropped)
            Average_Height = mean(height, na.rm = TRUE)) # another calculation (missing heights dropped)
```

And here is what I am attempting to do:

```r
exclude <- c("R2-D2", "Luke", "Darth") # names of the records I would like to exclude

grouped2 <- starwars %>%
  group_by(species) %>%
  summarise(Total_Mass = sum(mass) where name !%in% exclude, # pseudocode, not valid R: sum mass only for records whose name is NOT in `exclude`
            Average_Height = mean(height)) # another calculation, with no exclusions
```
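A runnable sketch of the intended calculation, using logical-index subsetting inside `summarise()` so only `Total_Mass` is affected. It assumes the `starwars` data bundled with dplyr, whose `name` column holds full names, so the exclude list below uses "Luke Skywalker" and "Darth Vader" rather than "Luke" and "Darth". The `na.rm = TRUE` arguments are an added assumption, since many rows have missing `mass` or `height`.

```r
library(dplyr)

# Names must match the `name` column exactly; "Luke" alone matches nothing.
exclude <- c("R2-D2", "Luke Skywalker", "Darth Vader")

grouped2 <- starwars %>%
  group_by(species) %>%
  summarise(
    # Keep only this group's masses whose name is NOT in `exclude`, then sum.
    # With na.rm = TRUE, a group with nothing left sums to 0, not NA.
    Total_Mass = sum(mass[!(name %in% exclude)], na.rm = TRUE),
    # No exclusions here: every row in the group contributes.
    Average_Height = mean(height, na.rm = TRUE)
  )
```

Using `filter()` before `group_by()` would instead drop the excluded rows from both summaries, which is why the subsetting happens inside the one summary that needs it.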




Brenda Thompson
  • Applying the method in the linked duplicate, you would use `Total_mass = sum(mass[!(name %in% exclude)])`. – zephryl Nov 11 '22 at 00:24
  • @zephryl Thank you, I was looking for that syntax because I could not find it anywhere. I eventually figured it out after trying many variations of the syntax. I understand my question was very similar to the linked duplicate; however, for someone who isn't familiar with dplyr syntax, another question is needed. – Brenda Thompson Nov 11 '22 at 00:29
  • It's not specifically dplyr syntax; the same principles should apply in other contexts where you're subsetting a vector by negation – camille Nov 13 '22 at 18:03
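As the last comment notes, this is ordinary vector subsetting by negation. R has no built-in `!%in%` operator, but one can be defined with base R's `Negate()`. A minimal sketch; `%notin%` is a hypothetical helper name, not part of base R or dplyr:

```r
# Hypothetical helper: Negate() wraps `%in%` and returns its logical negation.
`%notin%` <- Negate(`%in%`)

x <- c(10, 20, 30, 40)
skip <- c(20, 40)

x[x %notin% skip]  # returns c(10, 30)
x[!(x %in% skip)]  # same result with plain negation, no helper needed
```

The helper is purely cosmetic; `!(x %in% skip)` needs no extra definition and is the form used in the comment above.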
