
I have the following data frame:

Year ID V1 
2000 1  4 
2000 2  1 
2000 3  2  
2001 1  3  
2001 2  1  
2001 3  5  
.....

I have a function that takes the above data frame and a year value, performs a regression (V1 against ID), and returns a data frame containing the fitted coefficients for each ID for that year:

ID Coeff
 1   4  
 2   1  
 3   2  
 .....
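Here is a minimal sketch of the kind of per-year function described above. It assumes a no-intercept linear model with ID treated as a factor, so that each ID gets its own coefficient. `fit_year` is a hypothetical name, not the asker's actual function. With one observation per ID per year, each coefficient equals that ID's V1 value, which is consistent with the sample output above.

```r
# Toy data from the question (only the two years shown)
dat <- data.frame(
  Year = c(2000, 2000, 2000, 2001, 2001, 2001),
  ID   = c(1, 2, 3, 1, 2, 3),
  V1   = c(4, 1, 2, 3, 1, 5)
)

# Hypothetical per-year fitting function: keep only the rows for `year`,
# regress V1 on ID as a factor with no intercept, and return one
# coefficient per ID as a data frame.
fit_year <- function(df, year) {
  sub <- df[df$Year == year, ]
  fit <- lm(V1 ~ 0 + factor(ID), data = sub)
  data.frame(
    ID    = sort(unique(sub$ID)),   # same order as the factor levels
    Coeff = unname(coef(fit))
  )
}

fit_year(dat, 2000)
```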

I would like to run the above function for a set of year values, extract the ID and its corresponding fitted coefficients for that year, and bind them into a data frame:

Year ID Coeff 
2000 1  4 
2000 2  1 
2000 3  2  
2001 1  3  
2001 2  1  
2001 3  5  
.....

I can do the above with a for loop, but I'm wondering if there's a better alternative (using dplyr or something else).
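One common base-R alternative to the loop is sketched below. It assumes the per-year function returns a data frame with ID and Coeff columns; `fit_year` is a hypothetical stand-in for it. The idea is to build a list of per-year results with lapply() and bind them once with do.call(rbind, ...).

```r
# Toy data from the question (only the two years shown)
dat <- data.frame(
  Year = c(2000, 2000, 2000, 2001, 2001, 2001),
  ID   = c(1, 2, 3, 1, 2, 3),
  V1   = c(4, 1, 2, 3, 1, 5)
)

# Hypothetical stand-in for the asker's per-year function
fit_year <- function(df, year) {
  sub <- df[df$Year == year, ]
  fit <- lm(V1 ~ 0 + factor(ID), data = sub)
  data.frame(ID = sort(unique(sub$ID)), Coeff = unname(coef(fit)))
}

# Run the function for every year and prepend a Year column to each result
years  <- unique(dat$Year)
pieces <- lapply(years, function(y) data.frame(Year = y, fit_year(dat, y)))

# Bind all pieces in a single call instead of growing a data frame in a loop
result <- do.call(rbind, pieces)
result
```

Binding once at the end avoids the repeated copying that calling rbind() on every iteration causes. In tidyverse code, purrr::map_df() or dplyr::bind_rows() on the list would perform the same final step.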

Edit:

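data(iris)
set.seed(2)
# Treat the sepal measurements as categorical predictors
iris$Sepal.Length <- as.factor(iris$Sepal.Length)
iris$Sepal.Width <- as.factor(iris$Sepal.Width)
# Random 0/1 response for the logistic regression
iris$Random <- sample(0:1, size = nrow(iris), replace = TRUE)

# Fit a logistic regression on the rows for one species and return
# the coefficient names and values as a data frame
fit_function <- function(df, Species) {
    fit <- glm(Random ~ -1 + Sepal.Length + Sepal.Width,
               data = df[df$Species == Species, ],
               family = "binomial")
    final_df <- data.frame(Species = Species, Name = names(coef(fit)), Coef = unname(coef(fit)))
    return(final_df)
}

# Accumulate the results by calling rbind() on every iteration
all <- c()

for (i in unique(as.character(iris$Species))) {
    all <- rbind(all, fit_function(iris, i))
}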
Yandle
  • I don't have a list of data frames. I'm repeatedly calling a function that returns a data frame inside a for loop, and right now I call rbind at each iteration to bind the accumulated data frame with the new data frame returned by that call, which is very inefficient. – Yandle Mar 14 '19 at 22:18
  • I've only used lapply to apply a function over the columns of a data frame. How do I use lapply to apply it over multiple subsets of a data frame (grouped, in my example, by the Species of iris)? – Yandle Mar 14 '19 at 23:57
  • No, you're right. I totally misunderstood your problem. Take a look at this question: [Use dplyr's group_by to perform split-apply-combine](https://stackoverflow.com/questions/26664644/use-dplyrs-group-by-to-perform-split-apply-combine). I know I keep throwing duplicates at you, but `iris %>% group_by(Species) %>% do(fit_function(.))` replicates your for loop results (just remove mentions of `Species` from `fit_function`, since the `group_by` takes care of that). – divibisan Mar 15 '19 at 00:26
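For the lapply question, base R's split() turns a data frame into a named list of subsets, one per group, and lapply() can then run a fitting function on each subset. The sketch below uses a plain lm() on the unmodified iris data as a hypothetical stand-in for the asker's glm(). The same pattern should work with the question's own function, e.g. `lapply(unique(as.character(iris$Species)), function(s) fit_function(iris, s))`.

```r
# Split iris into one data frame per species; split() returns a named list
by_species <- split(iris, iris$Species)

# Hypothetical per-group fit: a simple linear model on each subset,
# returning the coefficient names and values as a data frame
fits <- lapply(names(by_species), function(sp) {
  fit <- lm(Sepal.Length ~ Sepal.Width, data = by_species[[sp]])
  data.frame(Species = sp, Name = names(coef(fit)),
             Coef = unname(coef(fit)), stringsAsFactors = FALSE)
})

# Combine the per-species results into one data frame in a single call
all_fits <- do.call(rbind, fits)
all_fits
```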

2 Answers


You could try using SQL within R through the sqldf package. Let's say your first data frame is df1 and your second data frame is df2. Then you could try:

# Load sqldf (install it first with install.packages("sqldf") if needed)
library(sqldf)

sqldf('SELECT Year, df1.ID, Coeff
       FROM df1 JOIN df2
       ON df1.ID = df2.ID')

Because ID appears in both data frames, you always need to specify which table's ID you mean (e.g. df1.ID).
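For comparison, the same inner join can be written in base R with merge(), without going through SQL. This is a minimal sketch; the contents of df1 and df2 are made up for illustration and are not from the question.

```r
# Hypothetical inputs: df1 holds the raw rows, df2 one coefficient per ID
df1 <- data.frame(Year = c(2000, 2000, 2001), ID = c(1, 2, 1), V1 = c(4, 1, 3))
df2 <- data.frame(ID = c(1, 2), Coeff = c(10, 20))

# Inner join on ID, like SELECT ... FROM df1 JOIN df2 ON df1.ID = df2.ID;
# merge() keeps a single ID column, so no table prefix is needed
joined <- merge(df1, df2, by = "ID")
joined[, c("Year", "ID", "Coeff")]
```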

Sandy

I don't really understand the logic of your question, and without workable data or your code so far, it's impossible to know exactly what you're asking. In the future, it's polite to include a sample of your data using dput() and to show the code you have so far. Given the information you have posted, this is how I would go about solving your problem:

library(tidyverse)

dat <- tribble(~"Year", ~"ID", ~"V1", 
        2000, 1,  4, 
        2000, 2,  1, 
        2000, 3,  2,  
        2001, 1,  3,  
        2001, 2,  1,  
        2001, 3,  5)

dat %>% 
  group_split(Year) %>% 
  map_df(~lm(V1 ~ as.factor(ID), data = .x) %>% 
        broom::tidy() %>% 
        select(term, estimate) %>% 
        mutate(YEAR = unique(.x$Year)))
#> # A tibble: 6 x 3
#>   term           estimate  YEAR
#>   <chr>             <dbl> <dbl>
#> 1 (Intercept)        4.    2000
#> 2 as.factor(ID)2    -3.    2000
#> 3 as.factor(ID)3    -2.    2000
#> 4 (Intercept)        3.    2001
#> 5 as.factor(ID)2    -2.    2001
#> 6 as.factor(ID)3     2.00  2001

Created on 2019-03-13 by the reprex package (v0.2.1)

dylanjm
  • My apologies, I have added sample code to my question above. My main goal is to get rid of the for loop at the end. – Yandle Mar 14 '19 at 22:20