I have a data frame (hh02) of temperature recordings (logT) from multiple recording periods. Each period lasts only a few hours, but some span two dates, running from late at night on one day to early in the morning of the next. I want to fit a linear model to each recording period and then store the parameters of each fitted line (slope and R²) in a new data frame. I created a "plot" column to group by, because I did not think I could group by date when some periods include two different dates. Below is sample data; my real data covers 450 days, so the range of plot values is really 1:450, not 1:3 as in the sample.

# sample data

hh02 <- data.frame(row = 1:16,
                   plot = c(1,1,1,1,1,2,2,2,3,3,3,3,3,3,3,3),
                   logT = c(1.092,1.091,1.0915,1.09,1.08,1.319,1.316,1.301,1.2134,1.213,1.21,1.22,1.23,1.20,1.19,1.19),
                   # timestamps must be quoted strings, parsed to POSIXct so scale_x_datetime() can use them
                   utc_datetime = as.POSIXct(c("2020-03-05T00:00:00Z","2020-03-05T00:30:00Z","2020-03-05T01:00:00Z","2020-03-05T01:30:00Z","2020-03-05T02:00:00Z",
                                               "2020-03-06T01:00:00Z","2020-03-06T01:30:00Z","2020-03-06T02:00:00Z",
                                               "2020-03-10T02:00:00Z","2020-03-10T02:30:00Z","2020-03-10T03:00:00Z","2020-03-10T03:30:00Z","2020-03-10T04:00:00Z","2020-03-10T04:30:00Z","2020-03-10T05:00:00Z","2020-03-10T05:30:00Z"),
                                             format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"))
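With 450 days of data, the period IDs in `plot` could also be derived from the timestamps instead of being assigned by hand. A minimal sketch, assuming the rows are sorted by time and that any gap longer than 2 hours between consecutive readings (a hypothetical threshold) starts a new period; because it looks at gaps rather than calendar dates, a period that crosses midnight stays in one group:

```r
# Timestamps from the sample data, parsed as UTC POSIXct
utc_datetime <- as.POSIXct(c(
  "2020-03-05T00:00:00Z", "2020-03-05T00:30:00Z", "2020-03-05T01:00:00Z",
  "2020-03-05T01:30:00Z", "2020-03-05T02:00:00Z", "2020-03-06T01:00:00Z",
  "2020-03-06T01:30:00Z", "2020-03-06T02:00:00Z", "2020-03-10T02:00:00Z",
  "2020-03-10T02:30:00Z", "2020-03-10T03:00:00Z", "2020-03-10T03:30:00Z",
  "2020-03-10T04:00:00Z", "2020-03-10T04:30:00Z", "2020-03-10T05:00:00Z",
  "2020-03-10T05:30:00Z"), format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Gap between consecutive readings, in hours (assumes time-ordered rows)
gap_hours <- as.numeric(diff(utc_datetime), units = "hours")

# A new period starts at the first reading and after every gap > 2 hours
# (the 2-hour threshold is an assumption; adjust to the real logging schedule)
period <- cumsum(c(TRUE, gap_hours > 2))
print(period)
```

On the full data frame the same idea would be applied to `hh02$utc_datetime` after ordering the rows by time.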

I was planning to combine the following code with the guidance on adding linear model lines from the Stack Overflow question "How to add linear lines to a plot with multiple data sets of a data frame?" to create the plots, and then somehow pull the slope and R² of each line into a data frame.

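library(ggplot2)

for (var in unique(hh02$plot)) {
  # Build the plot for one recording period and keep it in a variable
  p <- ggplot(subset(hh02, plot == var),
              aes(utc_datetime, logT)) +
    geom_point() +
    # Add the least-squares line for this period
    geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
    labs(x = "Time", y = "Log Indoor Room Temperature (degrees C)",
         title = var) +
    scale_x_datetime(date_breaks = "1 hour", date_labels = "%H:%M") +
    theme(axis.text.x = element_text(angle = -90, vjust = 1, hjust = 1))

  # Save this specific plot explicitly
  ggsave(paste0(var, ".png"), plot = p, width = 20, height = 20, units = "cm")
}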

I realize the code I've written creates a separate plot for each group, whereas the code in the referenced Stack Overflow answer creates one plot with the data and lines from all groups overlapping. However, not only is my ggplot code not working, I've also realized I may not need any visualizations to build the data frame I need. If there is a way to fit a linear model to each group and export the parameters of each line (slope and R²) to a data frame without plotting anything, I would be fine with that. My real data frame has 450 groups, and I don't need 450 plots, so I'd prefer a way to extract the information from each best-fit line directly. Based on the sample data, my desired output would look something like:
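No plotting is needed to build that table. A minimal base-R sketch: `split()` the data by `plot`, fit `lm()` to each piece, and read the slope from `coef()` and R² from `summary()$r.squared`. Two things here are assumptions rather than part of the question: the helper name `fit_one` is illustrative, and the predictor is a hypothetical `hours` column (hours elapsed since each period's first reading), chosen so the slope reads as logT units per hour.

```r
# Sample data from the question, with utc_datetime parsed as POSIXct (UTC)
hh02 <- data.frame(
  plot = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3),
  logT = c(1.092, 1.091, 1.0915, 1.09, 1.08, 1.319, 1.316, 1.301,
           1.2134, 1.213, 1.21, 1.22, 1.23, 1.20, 1.19, 1.19),
  utc_datetime = as.POSIXct(c(
    "2020-03-05T00:00:00Z", "2020-03-05T00:30:00Z", "2020-03-05T01:00:00Z",
    "2020-03-05T01:30:00Z", "2020-03-05T02:00:00Z", "2020-03-06T01:00:00Z",
    "2020-03-06T01:30:00Z", "2020-03-06T02:00:00Z", "2020-03-10T02:00:00Z",
    "2020-03-10T02:30:00Z", "2020-03-10T03:00:00Z", "2020-03-10T03:30:00Z",
    "2020-03-10T04:00:00Z", "2020-03-10T04:30:00Z", "2020-03-10T05:00:00Z",
    "2020-03-10T05:30:00Z"), format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
)

# Fit one model per recording period and return a one-row summary
fit_one <- function(d) {
  # Predictor (assumption): hours elapsed since the period's first reading
  d$hours <- as.numeric(difftime(d$utc_datetime, min(d$utc_datetime),
                                 units = "hours"))
  fit <- lm(logT ~ hours, data = d)
  data.frame(plot  = d$plot[1],
             slope = unname(coef(fit)["hours"]),
             r2    = summary(fit)$r.squared)
}

# split() yields one data frame per plot; rbind the one-row summaries
results <- do.call(rbind, lapply(split(hh02, hh02$plot), fit_one))
rownames(results) <- NULL
print(results)
```

`results` has one row per plot with columns `plot`, `slope`, and `r2`, and the same code scales to 450 groups without producing any figures.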

plot  slope  r2
1     2.1    0.96
2     1.3    0.85
3     0.8    0.99
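For a simple regression with one predictor and an intercept, both columns of this table also have closed forms: the slope is cov(x, y) / var(x) and R² is cor(x, y)². The sketch below uses those identities instead of `lm()` objects, again assuming x is a hypothetical `hours` column (hours since each period's first reading); its output should match `lm()` up to floating-point error.

```r
# Sample data from the question, timestamps as POSIXct in UTC
hh02 <- data.frame(
  plot = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3),
  logT = c(1.092, 1.091, 1.0915, 1.09, 1.08, 1.319, 1.316, 1.301,
           1.2134, 1.213, 1.21, 1.22, 1.23, 1.20, 1.19, 1.19),
  utc_datetime = as.POSIXct(c(
    "2020-03-05T00:00:00Z", "2020-03-05T00:30:00Z", "2020-03-05T01:00:00Z",
    "2020-03-05T01:30:00Z", "2020-03-05T02:00:00Z", "2020-03-06T01:00:00Z",
    "2020-03-06T01:30:00Z", "2020-03-06T02:00:00Z", "2020-03-10T02:00:00Z",
    "2020-03-10T02:30:00Z", "2020-03-10T03:00:00Z", "2020-03-10T03:30:00Z",
    "2020-03-10T04:00:00Z", "2020-03-10T04:30:00Z", "2020-03-10T05:00:00Z",
    "2020-03-10T05:30:00Z"), format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
)

# Hours since the first reading of each plot, computed for all rows at once
secs <- as.numeric(hh02$utc_datetime)
hh02$hours <- (secs - ave(secs, hh02$plot, FUN = min)) / 3600

# Simple regression with intercept: slope = cov/var, R^2 = squared correlation
stats <- sapply(split(hh02, hh02$plot), function(d)
  c(slope = cov(d$hours, d$logT) / var(d$hours),
    r2    = cor(d$hours, d$logT)^2))

# sapply() returns a 2 x n matrix; transpose to one row per plot
results <- data.frame(plot = as.numeric(colnames(stats)), t(stats))
rownames(results) <- NULL
print(results)
```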
achtee