I have a temperature dataset named "dlr_rms" with 1208783 entries, each with a date-time, and I need to find the absolute error for that data. I have another table called 'absolute values' with 21072 temperature values, and I want to subtract those values from my first dataset, matching rows on their shared (grouped) date-time.

For Example (df1):

temp  date-time                     
2     2015-07-14 16:44:01      
3     2015-07-14 16:44:01  
4     2016-08-14 16:44:02
8     2016-08-14 16:44:02
5     2017-09-14 16:44:03    
6     2017-09-14 16:44:03  

df2:

absolute table    date-time
2                 2015-07-14 16:44:01
5                 2016-08-14 16:44:02 
9                 2017-09-14 16:44:03

For each shared date-time, I want the df1 values, like (2,3), (4,8), (5,6), to have the value assigned to that date-time in the absolute value table subtracted from them. I also need to join the tables in order to do the error calculations.

Desired result table

2-2=0
3-2=1

4-5= -1
8-5= 3

5-9 =-4
6-9 = -3

output of dput commands: df1:

1515434400, 1515438000, 1515452400, 1515456000, 1515459600, 1515463200, 1515466800, 1515470400, 1515474000, 1515477600, 1515481200), class = c("POSIXct", "POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA, -21072L))

Rahul Tyagi
  • When you say "shared date-time", why are the first four values not grouped together, since their times are all the same? You talk about two tables but only give one, suggesting either (a) you do not need to talk about the second table, since you've already figured out how to do the table joins, or (b) there is more inconsistency here that we aren't aware of. Please provide a [reproducible question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), to include re-usable formats for your data (e.g., `dput(head(x))` with sufficient data in each). – r2evans Jul 05 '18 at 15:02
  • Also choose a more descriptive title that tells us what you are trying to do. – MrFlick Jul 05 '18 at 15:17
  • The error you are calculating in your example is not `absolute`, it is just the error – roschach Jul 05 '18 at 15:38
  • {df2: {`[ 'data.frame': 21072 obs. of 2 variables: $ absolute_value: num 29.8 28.6 27.7 26.5 24.6 23.9 22.2 21.2 19.7 18.9 ... $ absolte_time : POSIXct, format: "2015-07-14 17:00:00" "2015-07-14 18:00:00" "2015-07-14 19:00:00" ]`} – Rahul Tyagi Jul 05 '18 at 18:20
  • df1 :{'data.frame': 1208783 obs. of 16 variables: $ latitude : chr "39.4284" "39.4284" "39.4284" "39.4284" ... $ modelRunDateTime : POSIXct, format: "2015-07-14 16:44:01" "2015-07-14 16:44:01" "2015-07-14 17:44:01" ... $ validDateTime : num 1.44e+09 1.44e+09 1.44e+09 1.44e+09 1.44e+09 ... $ temperature : num 29.8 28.6 28.6 27.4 27.2 27.7 26.2 26.5 25.8 26.1 ... $ relativeHumidity : num 58 61 61 65 66 63 69 68 71 70 ... $ wspd : num 5.3 5 4.5 3.7 4.5 4.4 3.7 2.8 3.8 3.4 ... – Rahul Tyagi Jul 05 '18 at 18:22
  • These are not what you get from `dput`... They do not help the user in importing the data. Please provide **exactly** what you get from typing `dput(df1)` and `dput(df2)` into your question. Not as a comment. `dput` _not_ `str` – acylam Jul 05 '18 at 18:34
  • My data has 1208783 entries; when I type `dput` it gives me a huge chunk of numbers that does not fit on my screen – Rahul Tyagi Jul 05 '18 at 18:41
  • If that's the case, subset your dataset**s** so they are much smaller, and apply `dput` to them and copy the output to your question. Just applying `dput` to your current sample data in your question should be sufficient. – acylam Jul 05 '18 at 18:52

1 Answer

We can do this with dplyr, joining the two tables on the shared date_time column and then subtracting (tidyr is not needed here):

library(dplyr)

df1 %>%
  left_join(df2, by = "date_time") %>%
  mutate(absolute_error = temp-absolute)

Result:

  temp           date_time absolute absolute_error
1    2 2015-07-14 16:44:01        2              0
2    3 2015-07-14 16:44:01        2              1
3    4 2016-08-14 16:44:02        5             -1
4    8 2016-08-14 16:44:02        5              3
5    5 2017-09-14 16:44:03        9             -4
6    6 2017-09-14 16:44:03        9             -3
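
As noted in the comments, `temp - absolute` is a signed error rather than an absolute one. A minimal sketch of computing both, assuming the same toy data as in the question (rebuilt here with `POSIXct` date-times so the block is self-contained; the column names `error` and `abs_error` are illustrative, not from the original data):

```r
library(dplyr)

# Toy data mirroring the example in the question
times <- c("2015-07-14 16:44:01", "2016-08-14 16:44:02", "2017-09-14 16:44:03")

df1 <- data.frame(
  temp = c(2, 3, 4, 8, 5, 6),
  date_time = as.POSIXct(rep(times, each = 2), tz = "UTC")
)

df2 <- data.frame(
  absolute = c(2, 5, 9),
  date_time = as.POSIXct(times, tz = "UTC")
)

result <- df1 %>%
  left_join(df2, by = "date_time") %>%   # attach the reference value to each row
  mutate(
    error = temp - absolute,             # signed error
    abs_error = abs(error)               # magnitude of the error
  )

print(result)
```

Whether the signed or the absolute column is the right one depends on how the error is going to be summarised afterwards.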

Data:

df1 = structure(list(temp = c(2L, 3L, 4L, 8L, 5L, 6L), date_time = structure(c(1L, 
1L, 2L, 2L, 3L, 3L), .Label = c("2015-07-14 16:44:01", "2016-08-14 16:44:02", 
"2017-09-14 16:44:03"), class = "factor")), .Names = c("temp", 
"date_time"), class = "data.frame", row.names = c(NA, -6L))

df2 = structure(list(absolute = c(2L, 5L, 9L), date_time = structure(1:3, .Label = c("2015-07-14 16:44:01", 
"2016-08-14 16:44:02", "2017-09-14 16:44:03"), class = "factor")), .Names = c("absolute", 
"date_time"), class = "data.frame", row.names = c(NA, -3L))
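
The `str()` output posted in the comments suggests the real tables use different names for the time column (`modelRunDateTime` in df1, `absolte_time` in df2) and that not every timestamp in df1 necessarily has a match in df2. A rough sketch of how the join might look in that case, using those column names but small made-up values (the real data is not reproduced here, so treat the rows below as assumptions):

```r
library(dplyr)

# Made-up rows; only the column names come from the str() output in the comments
df1 <- data.frame(
  modelRunDateTime = as.POSIXct(c("2015-07-14 17:00:00",
                                  "2015-07-14 17:00:00",
                                  "2015-07-14 17:44:01"), tz = "UTC"),
  temperature = c(29.8, 28.6, 28.6)
)

df2 <- data.frame(
  absolute_value = c(29.8, 28.6),
  absolte_time = as.POSIXct(c("2015-07-14 17:00:00",
                              "2015-07-14 18:00:00"), tz = "UTC")
)

result <- df1 %>%
  # a named `by` vector matches columns that have different names in each table
  left_join(df2, by = c("modelRunDateTime" = "absolte_time")) %>%
  mutate(error = temperature - absolute_value)

print(result)
```

With `left_join`, rows of df1 whose timestamp has no exact match in df2 are kept, with `NA` in `absolute_value` and therefore in `error`, so it is worth checking how many `NA`s appear before relying on the result.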
acylam