
Apologies in advance, I've made a bit of a hash of this one. I have a relatively big data set which looks like this:

Herein lies the problem. I've been fitting GLMs, taking the estimates for the confounding variables from them, and using those to adjust the abline (if you don't know what I mean here, basically I need to calculate my own line of best fit, not just shove one through the average points). This is all fine and dandy, as I wrote a line of code which works this out for me. Sadly though, I have 19 of these graphs to produce, 1 for each row, and need to do this for six data sets.

My attempts to automate this process have been painful and depressing thus far. If anyone thinks being a biologist means cuddling pandas, they are sadly wrong. I've got the code to take in variables and produce a graph one at a time, but haven't had any luck producing them all in one frame.

[Image: imagine roughly this, but with 19 graphs on it. That's the dream right now.]

2 Answers

Unfortunately, your data is not reproducible, but I think the following can be adapted.

Working with several separate objects like that can get very messy. This is where using a list can be very helpful. You only need your x, y and intercept for each plot, stored as one element of the my_list object. You can then draw all your charts on one device using layout and a loop.

my_list <- list()
for (i in 1:19) {
    x <- runif(10)
    y <- rnorm(10)
    # intercept of a simple linear fit (stand-in for your own estimate)
    intercept <- coef(lm(y ~ x))[1]
    name <- paste0('plot_', i)
    my_list[[name]] <- list(x = x, y = y, intercept = intercept)
}

# 4 x 5 grid of panels, filled row by row (19 plots leave the last panel empty)
layout(matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE))
for (j in seq_along(my_list)) {
    plot(x = my_list[[j]]$x, y = my_list[[j]]$y, main = names(my_list)[j],
         xlab = "x-label", ylab = "y-label")
    # h = draws a horizontal line at the stored intercept
    abline(h = my_list[[j]]$intercept)
}
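If the line you want has a slope as well as an intercept, the same list-and-loop idea can carry both. The following is only a rough sketch under assumptions: the confounder `z`, the simulated data, and the choice to evaluate the line at the mean of `z` are all made up for illustration and are not taken from the question.

```r
set.seed(42)
n_plots <- 19
fits <- vector("list", n_plots)
names(fits) <- paste0("plot_", seq_len(n_plots))

for (i in seq_len(n_plots)) {
    x <- runif(30)
    z <- rnorm(30)                                   # hypothetical confounder
    y <- 1 + 2 * x + 0.5 * z + rnorm(30, sd = 0.3)   # simulated response
    b <- coef(glm(y ~ x + z))                        # gaussian GLM estimates
    fits[[i]] <- list(
        x = x, y = y,
        # intercept adjusted to the mean of the confounder (an assumption)
        intercept = unname(b["(Intercept)"] + b["z"] * mean(z)),
        slope = unname(b["x"])
    )
}

# 4 x 5 grid of panels with slightly tighter margins
op <- par(mfrow = c(4, 5), mar = c(4, 4, 2, 1))
for (nm in names(fits)) {
    f <- fits[[nm]]
    plot(f$x, f$y, main = nm, xlab = "x-label", ylab = "y-label")
    abline(a = f$intercept, b = f$slope)   # a = intercept, b = slope
}
par(op)
```

Wrapping the two loops in a function that takes one data set would let you call it once for each of the six data sets.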


Pierre Lapointe

Just wanted to post the ggplot2 version of what you're trying to do to see if that might work for you as well.

I also show an example of fitting lines for multiple classes within each facet, which may or may not be useful depending on how complicated your analysis is.

First install ggplot2 if you don't have it already:

# install.packages('ggplot2')
library(ggplot2)

Here I am just setting up some dummy data using the built-in iris dataset. I'm essentially trying to simulate having 19 distinct datasets.

set.seed(1776)

num_datasets <- 19
# preallocate one empty slot per dataset
samples <- vector("list", num_datasets)
datasets <- vector("list", num_datasets)

# draw 20 random row indices from iris for each simulated dataset
for (i in seq_len(num_datasets)) {
    samples[[i]] <- sample(1:nrow(iris), 20)
}

# build each dataset: two numeric columns, Species, and an id to facet on
for (i in seq_len(num_datasets)) {
    datasets[[i]] <- cbind(iris[samples[[i]], c('Petal.Length', 'Petal.Width', 'Species')], dataset_id = i)
}

do.call is a bit tricky: it takes two arguments, a function and a list, and calls the function with the elements of the list as its arguments. So here I'm calling rbind() on all of the distinct datasets within my datasets object (which is a list of data frames), stacking them into one data frame.

combined_data <- do.call(rbind, datasets)
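To make the do.call step a little more concrete, here is a minimal sketch with two tiny made-up data frames (`parts`, `stacked_a`, and `stacked_b` are illustrative names only). It shows that handing the list to do.call gives the same rows as passing each element to rbind() by hand.

```r
# two small data frames standing in for the 19 datasets
parts <- list(
    data.frame(id = 1, value = c(0.1, 0.2)),
    data.frame(id = 2, value = c(0.3, 0.4))
)

# do.call passes each list element as a separate argument to rbind()
stacked_a <- do.call(rbind, parts)

# the same call written out by hand
stacked_b <- rbind(parts[[1]], parts[[2]])

print(stacked_a)
print(all.equal(stacked_a, stacked_b))
```

The do.call form is the one that scales, since it does not care how many data frames are in the list.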

The first plot is one big scatter plot to show all of the data.

# all data
ggplot(data=combined_data, aes(x=Petal.Length, y=Petal.Width)) +
    geom_point(alpha = 0.2) +
    ggtitle("All data")

Next are 19 individual "facets", one per dataset, all on the same scale and in the same graphing window, each with its own fitted line.

# all data faceted by dataset_id
ggplot(data=combined_data, aes(x=Petal.Length, y=Petal.Width)) +
    geom_point(alpha = 0.5) +
    ggtitle("All data faceted by dataset") +
    facet_wrap(~ dataset_id) +
    geom_smooth(method = 'lm', se = FALSE)

[Image: plot of facets with best fit lines]

Finally, the data are plotted in facets again, but colored by the species of the iris flower, and each species has its own line of best fit.

# all data faceted by dataset_id, colored by Species
ggplot(data=combined_data, aes(x=Petal.Length, y=Petal.Width, color = Species)) +
    geom_point(alpha = 0.5) +
    ggtitle("All data faceted by dataset with best fit lines per species") +
    facet_wrap(~ dataset_id) +
    geom_smooth(method = 'lm', se = FALSE)

[Image: plots of facets with best fit within categories]

I see you mentioned you had your own precalculated best fit line, but I think this conceptually might get you closer to where you need to be?
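If you do want to draw your own precalculated lines rather than let geom_smooth fit them, one possible approach is to keep one row of coefficients per dataset and hand that to geom_abline. This is only a sketch under assumptions: the `line_coefs` data frame and its `intercept` and `slope` columns are names I made up, and lm() is just standing in for whatever model produces your estimates.

```r
library(ggplot2)

set.seed(1776)
# simulate 19 datasets of 20 iris rows each, tagged with a dataset_id
combined_data <- do.call(rbind, lapply(1:19, function(i) {
    cbind(iris[sample(nrow(iris), 20), c("Petal.Length", "Petal.Width")],
          dataset_id = i)
}))

# one row of coefficients per dataset; lm() stands in for your own model
line_coefs <- do.call(rbind, lapply(
    split(combined_data, combined_data$dataset_id),
    function(d) {
        b <- coef(lm(Petal.Width ~ Petal.Length, data = d))
        data.frame(dataset_id = d$dataset_id[1],
                   intercept = unname(b[1]),
                   slope = unname(b[2]))
    }
))

# because line_coefs has a dataset_id column, each line lands in its own facet
p <- ggplot(combined_data, aes(x = Petal.Length, y = Petal.Width)) +
    geom_point(alpha = 0.5) +
    geom_abline(data = line_coefs, aes(intercept = intercept, slope = slope)) +
    facet_wrap(~ dataset_id)
print(p)
```

The facet variable in the coefficient table has to match the one in the plotted data, otherwise every line would be drawn in every panel.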

Cheers!

Richard Telford
TaylorV