I have a data set which I've re-sampled. Is there a command or function in R that I can use to smooth the data first, and only then create the graph from the resulting data frame?
My data has a lot of noise, and after re-sampling it I want to smooth it out. I used geom_smooth
to produce a plot of the data, but that command only draws the smoothed curve; it does not give me the values of the points the curve represents.
I use ggplot2:
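library(ggplot2)
library(dplyr)
library(plotly)

# Read the raw data and inspect its structure
df <- read.csv("data.csv", header = TRUE)
str(df)

# Draw a random sample of 715 rows
rs <- sample_n(df, 715)

# x and y are placeholders for the two columns being plotted
q <- ggplot(df, aes(x, y)) +
  geom_line() +
  geom_smooth(method = "loess", formula = y ~ log(x), span = 0.05)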
This is what I used to smooth my data. I chose loess with formula = y~log(x) and span = 0.05 because, of all the smoothing methods I've tried, it gives the result closest to what I want: a smoothed curve that deviates as little as possible from the original data.
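For reference, here is a minimal sketch of how the same kind of loess fit could be computed outside ggplot2, so that the smoothed values end up in a data frame. It assumes columns literally named x and y, uses simulated data in place of data.csv, and uses span = 0.5 rather than 0.05; the names demo, fit and grid are made up for this illustration. The grid of 80 points mirrors the default number of points geom_smooth evaluates.

```r
# Simulated stand-in for the real data (assumption: x is positive, y is noisy)
set.seed(1)
demo <- data.frame(x = seq(1, 100, length.out = 200))
demo$y <- 3 * log(demo$x) + rnorm(nrow(demo), sd = 0.5)

# Fit the loess model directly and keep the fit object
fit <- loess(y ~ log(x), data = demo, span = 0.5)

# Evaluate the smoothed curve on a regular grid of x values
grid <- data.frame(x = seq(min(demo$x), max(demo$x), length.out = 80))
grid$y_smooth <- predict(fit, newdata = grid)

# grid now holds the points of the smoothed curve
head(grid)
```

A data frame like grid could then be passed to ggplot with geom_line instead of relying on geom_smooth to do the fitting.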
Here is the output of head(rs) and glimpse(rs):
> head(rs)
Date DLTime Time24 RH Temp PM2.5 CO2 MCO2 MPM25 t
1 21/05/2019 8:33:21 15:21:36 73.5 25.9 34 1096.88 1096.88 34 2019-05-21 15:21:36
2 21/05/2019 8:56:33 15:44:48 75.4 25.6 32 975.00 975.00 32 2019-05-21 15:44:48
3 21/05/2019 8:22:43 15:10:58 75.9 26.1 59 1068.75 1068.75 59 2019-05-21 15:10:58
4 21/05/2019 8:51:53 15:40:08 74.7 25.6 45 975.00 975.00 45 2019-05-21 15:40:08
5 21/05/2019 8:47:30 15:35:45 75.0 25.7 40 1006.25 1006.25 40 2019-05-21 15:35:45
6 21/05/2019 8:35:59 15:24:14 73.7 25.8 32 1984.38 1068.75 32 2019-05-21 15:24:14
> glimpse(rs)
Observations: 715
Variables: 10
$ Date <fct> 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019, 21/05/2019,...
$ DLTime <fct> 8:33:21, 8:56:33, 8:22:43, 8:51:53, 8:47:30, 8:35:59, 8:17:13, 8:57:42, 8:20:34, 8:48:21, 8:34:...
$ Time24 <fct> 15:21:36, 15:44:48, 15:10:58, 15:40:08, 15:35:45, 15:24:14, 15:05:28, 15:45:57, 15:08:49, 15:36...
$ RH <dbl> 73.5, 75.4, 75.9, 74.7, 75.0, 73.7, 76.6, 75.1, 75.6, 75.1, 74.4, 75.6, 73.8, 76.6, 73.9, 76.3,...
$ Temp <dbl> 25.9, 25.6, 26.1, 25.6, 25.7, 25.8, 26.2, 25.6, 26.1, 25.7, 25.9, 25.8, 25.4, 26.2, 25.5, 26.2,...
$ PM2.5 <int> 34, 32, 59, 45, 40, 32, 42, 34, 35, 45, 36, 33, 29, 42, 46, 36, 42, 33, 35, 33, 39, 32, 39, 35,...
$ CO2 <dbl> 1096.88, 975.00, 1068.75, 975.00, 1006.25, 1984.38, 1328.13, 946.88, 1068.75, 1328.13, 1434.38,...
$ MCO2 <dbl> 1096.88, 975.00, 1068.75, 975.00, 1006.25, 1068.75, 1037.50, 946.88, 1068.75, 1021.88, 1112.50,...
$ MPM25 <dbl> 34, 32, 59, 45, 40, 32, 42, 34, 35, 45, 36, 33, 29, 42, 46, 36, 42, 33, 35, 33, 39, 32, 39, 35,...
$ t <dttm> 2019-05-21 15:21:36, 2019-05-21 15:44:48, 2019-05-21 15:10:58, 2019-05-21 15:40:08, 2019-05-21...
I have also tried:
ml <- with(rs, loess(formula = y~log(x), span = 0.5))
mp <- predict(ml)
but it resulted in this error message:
ml <- loess(formula = y~log(x), with(rs), span = 0.5)
Error in eval(substitute(expr), data, enclos = parent.frame()) :
argument is missing, with no default
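The error text appears to come from with() rather than from loess(): in the failing call, with(rs) is given the data but no expression to evaluate. The sketch below reproduces that on simulated data and shows one way to avoid it by passing the data frame through the data argument of loess(). It assumes columns named x and y; rs_demo, err and y_smooth are hypothetical names used only here.

```r
# Simulated stand-in for rs (assumption: columns named x and y, x positive)
set.seed(42)
rs_demo <- data.frame(x = runif(150, min = 1, max = 50))
rs_demo$y <- 2 * log(rs_demo$x) + rnorm(150, sd = 0.3)

# with() needs a second argument (the expression); with only the data it fails
err <- tryCatch(with(rs_demo), error = function(e) conditionMessage(e))
print(err)

# Passing the data frame via loess()'s own data argument avoids with() entirely
ml <- loess(y ~ log(x), data = rs_demo, span = 0.5)

# predict() without newdata returns one fitted value per row used in the fit
rs_demo$y_smooth <- predict(ml)
head(rs_demo)
```

After this, rs_demo contains the original values and the smoothed values side by side, so both can be plotted or exported.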
I don't really understand where I went wrong, and the troubleshooting I've done online hasn't given me a definitive answer. If there are other methods, please tell me.
I apologize for not giving a reproducible example; I am not yet far enough into learning R to create random data. Any help is appreciated, thanks in advance.
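For anyone in the same position, a small reproducible data set can usually be built in a few lines. The sketch below is only an illustration: the values are simulated, not taken from data.csv, and the name toy is made up. The column names t, Temp and PM2.5 follow the glimpse(rs) output above.

```r
set.seed(123)  # makes the random numbers repeatable
n <- 20

# Simulated readings, one per minute (values are invented for illustration)
toy <- data.frame(
  t     = as.POSIXct("2019-05-21 15:00:00", tz = "UTC") + 60 * seq_len(n),
  Temp  = round(rnorm(n, mean = 25.8, sd = 0.3), 1),
  PM2.5 = round(runif(n, min = 29, max = 59))
)
str(toy)

# dput() prints R code that recreates an object, which can be pasted into a post
dput(head(toy, 3))
```

Calling dput(head(rs, 20)) on the real data would work the same way and lets others reproduce the problem with actual values.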