Use R to perform function on subsets of data AND full data

Question

I am looking for a simple way to apply a function to subsets of my data as well as to the data as a whole. Currently, the code runs the function only on subsets of the data (subsetting is done by unit, e.g. US Dollars, Euros, etc.). The goal is to run the function over the data subsetted by unit, and also over the entire data set. An example of a function I am trying to modify is:

Country_freq_unit<-function(ind_unit){
  rm(H1,H2,H3,H4)
  
  H1<-data1 %>%
    filter(unit==ind_unit)   %>% 
    drop_na(data) %>% 
    group_by(country_name, time_date) %>% 
    tally()
  
  H2<-data1 %>% 
    select(country_name, time_date) %>% 
    group_by(country_name, time_date) %>% 
    tally() %>% 
    select(country_name, time_date)
  
  H3<-full_join(H2, H1) %>% 
    replace_na(list(n = 0))
  
  H4<-H3 %>% 
    spread(key='time_date', value='n')
  
  return(H4)
  
}

I then execute the function in a loop that goes through all of the units that are available in my data.

unitlist<-unique(data1$unit)
> unitlist
[1] "USD"  "OZT"  "XDR"  "RATE" "WK"  


for (s in unitlist) {  
  J1 = Country_freq_unit(s)
}

This part all works well. My question is whether there is an easy way to add an option to my loop to also run the function over the entire data frame, rather than only over subsets of it. I am wondering if there is something like all_of() or everything() for this context.

Please note that I did not actually write the code within the function (I have just been helping to add functions and loops to existing code).

Thank you for the help!

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, May 04 '21 at 18:07

score 0 · Answer 1 · answered May 04 '21 at 20:19

Here's one way:

Country_freq_unit <- function(ind_unit = NULL){
  # you don't need the rm statement, and it's dangerous, so I've
  # deleted it.
  
  H1 <- data1
  if (! is.null(ind_unit)) {
    H1 <- H1 %>% filter(unit==ind_unit) 
  }

  H1 <- H1 %>% 
    drop_na(data) %>% 
    group_by(country_name, time_date) %>% 
    tally()
  
  H2 <- data1 %>% 
    select(country_name, time_date) %>% 
    group_by(country_name, time_date) %>% 
    tally() %>% 
    select(country_name, time_date)
  
  H3 <- full_join(H2, H1) %>% 
    replace_na(list(n = 0))
  
  H4 <- H3 %>% 
    spread(key = 'time_date', value = 'n')
  
  return(H4)
}

Now, to run the function on all of your data, just call it without the ind_unit argument, i.e. Country_freq_unit(). There are some other weird things in your function, but I guess that's an issue for whoever wrote it.
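For what it's worth, here is a rough sketch of how the per-unit calls and the full-data call could be collected together. The toy data1 is made up (only the column names come from your question), "ALL" is just a placeholder name I picked for the full-data result, and I added an explicit by = to the join only to silence the join message. Treat it as one possible arrangement rather than the way to do it.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: column names follow the question, values are invented.
data1 <- tibble(
  country_name = c("A", "A", "B", "B", "C"),
  time_date    = c("2020", "2021", "2020", "2021", "2020"),
  unit         = c("USD", "USD", "OZT", "USD", "OZT"),
  data         = c(1.5, NA, 2.0, 3.1, 4.2)
)

Country_freq_unit <- function(ind_unit = NULL) {
  H1 <- data1
  # Filter only when a unit is supplied; NULL means "use everything".
  if (!is.null(ind_unit)) {
    H1 <- H1 %>% filter(unit == ind_unit)
  }

  # Count non-missing observations per country and date.
  H1 <- H1 %>%
    drop_na(data) %>%
    group_by(country_name, time_date) %>%
    tally()

  # Every country/date combination present in the full data.
  H2 <- data1 %>%
    select(country_name, time_date) %>%
    group_by(country_name, time_date) %>%
    tally() %>%
    select(country_name, time_date)

  # Combinations with no observations get a count of 0.
  H3 <- full_join(H2, H1, by = c("country_name", "time_date")) %>%
    replace_na(list(n = 0))

  # One row per country, one column per date.
  H3 %>% spread(key = "time_date", value = "n")
}

unitlist <- unique(data1$unit)

# One result per unit, stored in a list named by unit ...
results <- lapply(setNames(unitlist, unitlist), Country_freq_unit)

# ... plus one result for the whole data set under the placeholder name "ALL".
results[["ALL"]] <- Country_freq_unit()

names(results)
```

Storing the results in a named list also avoids overwriting J1 on every pass through the loop, so each table stays available afterwards as results[["USD"]], results[["ALL"]], and so on. If "ALL" could ever be a real unit code in your data, pick a different name.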
