Filter and join data.frame based on other data.frame

Question

I have a temperature data set named `dlr_rms` with 1208783 entries, each with a date-time, and I need to find the error for that data. I have another table called 'absolute values' which has 21072 temperature values, and I want to subtract those values from my data according to the shared (grouped) date-time.

For Example (df1):

temp  date-time                     
2     2015-07-14 16:44:01      
3     2015-07-14 16:44:01  
4     2016-08-14 16:44:02
8     2016-08-14 16:44:02
5     2017-09-14 16:44:03    
6     2017-09-14 16:44:03

df2:

absolute table    date-time
2                 2015-07-14 16:44:01
5                 2016-08-14 16:44:02 
9                 2017-09-14 16:44:03

For each shared date-time, I want the number assigned to it in the absolute value table to be subtracted from the matching values in df1, i.e. from the pairs (2,3), (4,8) and (5,6). I also need to join the two tables in order to do the error calculation.

Desired result table

2 - 2 = 0
3 - 2 = 1

4 - 5 = -1
8 - 5 = 3

5 - 9 = -4
6 - 9 = -3

output of dput commands: df1:

1515434400, 1515438000, 1515452400, 1515456000, 1515459600, 1515463200, 1515466800, 1515470400, 1515474000, 1515477600, 1515481200), class = c("POSIXct", "POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA, -21072L))

When you say "shared date-time", why are the first four values not grouped together, since their times are all the same? You talk about two tables but only give one, suggesting either (a) you do not need to talk about the second table, since you've already figured out how to do the table joins, or (b) there is more inconsistency here that we aren't aware of. Please provide a [reproducible question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), to include re-usable formats for your data (e.g., `dput(head(x))` with sufficient data in each). — r2evans, Jul 05 '18 at 15:02
Also choose a more descriptive title that tells us what you are trying to do. — MrFlick, Jul 05 '18 at 15:17
The error you are calculating in your example is not `absolute`, it is just the error — roschach, Jul 05 '18 at 15:38
df2: `'data.frame': 21072 obs. of 2 variables: $ absolute_value: num 29.8 28.6 27.7 26.5 24.6 23.9 22.2 21.2 19.7 18.9 ... $ absolte_time : POSIXct, format: "2015-07-14 17:00:00" "2015-07-14 18:00:00" "2015-07-14 19:00:00"` — Rahul Tyagi, Jul 05 '18 at 18:20
df1: `'data.frame': 1208783 obs. of 16 variables: $ latitude : chr "39.4284" "39.4284" "39.4284" "39.4284" ... $ modelRunDateTime : POSIXct, format: "2015-07-14 16:44:01" "2015-07-14 16:44:01" "2015-07-14 17:44:01" ... $ validDateTime : num 1.44e+09 1.44e+09 1.44e+09 1.44e+09 1.44e+09 ... $ temperature : num 29.8 28.6 28.6 27.4 27.2 27.7 26.2 26.5 25.8 26.1 ... $ relativeHumidity : num 58 61 61 65 66 63 69 68 71 70 ... $ wspd : num 5.3 5 4.5 3.7 4.5 4.4 3.7 2.8 3.8 3.4 ...` — Rahul Tyagi, Jul 05 '18 at 18:22
These are not what you get from `dput`... They do not help the user in importing the data. Please provide **exactly** what you get from typing `dput(df1)` and `dput(df2)` into your question. Not as a comment. `dput` _not_ `str` — acylam, Jul 05 '18 at 18:34
My data has 1208783 entries; when I type dput, it gives me a huge chunk of numbers that does not fit on my screen — Rahul Tyagi, Jul 05 '18 at 18:41
If that's the case, subset your dataset**s** so they are much smaller, and apply `dput` to them and copy the output to your question. Just applying `dput` to your current sample data in your question should be sufficient. — acylam, Jul 05 '18 at 18:52
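The subsetting advice in the comments above can be sketched as follows. This is only an illustration: `big` is a hypothetical stand-in for a large data frame such as `dlr_rms`, and its values are invented (only the column names follow the `str` output quoted earlier).

```r
# Hypothetical stand-in for a large data frame such as dlr_rms.
big <- data.frame(
  temperature = rep(c(29.8, 28.6, 27.7, 26.5), 250),
  modelRunDateTime = as.POSIXct("2015-07-14 16:44:01", tz = "UTC") + 3600 * (0:999)
)

# Keep only the columns relevant to the question and the first few rows.
small <- head(big[, c("temperature", "modelRunDateTime")], 6)

# dput() prints R code that recreates `small`; that printed text is what
# belongs in the question.
dput(small)

# Round trip: evaluating the dput() text rebuilds the data frame.
rebuilt <- eval(parse(text = paste(capture.output(dput(small)), collapse = "\n")))
```

Six rows are usually enough as long as they include at least one date-time that occurs in both tables.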

acylam · Answer 1 · 2018-07-05T17:01:55.840

0

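We can do this with a left join from dplyr, which keeps every row of df1 and attaches the matching row of df2 by date_time (the two data frames do not need the same number of rows):

library(dplyr)

df1 %>%
  # attach the matching `absolute` value to every row of df1
  left_join(df2, by = "date_time") %>%
  # signed error: measured value minus reference value
  mutate(absolute_error = temp - absolute)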

Result:

  temp           date_time absolute absolute_error
1    2 2015-07-14 16:44:01        2              0
2    3 2015-07-14 16:44:01        2              1
3    4 2016-08-14 16:44:02        5             -1
4    8 2016-08-14 16:44:02        5              3
5    5 2017-09-14 16:44:03        9             -4
6    6 2017-09-14 16:44:03        9             -3

Data:

df1 = structure(list(temp = c(2L, 3L, 4L, 8L, 5L, 6L), date_time = structure(c(1L, 
1L, 2L, 2L, 3L, 3L), .Label = c("2015-07-14 16:44:01", "2016-08-14 16:44:02", 
"2017-09-14 16:44:03"), class = "factor")), .Names = c("temp", 
"date_time"), class = "data.frame", row.names = c(NA, -6L))

df2 = structure(list(absolute = c(2L, 5L, 9L), date_time = structure(1:3, .Label = c("2015-07-14 16:44:01", 
"2016-08-14 16:44:02", "2017-09-14 16:44:03"), class = "factor")), .Names = c("absolute", 
"date_time"), class = "data.frame", row.names = c(NA, -3L))
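For the real tables, the key columns appear to have different names and to be stored as POSIXct rather than factor (going by the `str` output in the comments). A minimal base R sketch of the same left join under those assumptions is given below. The object names `dlr` and `ref` are hypothetical, the values are the ones from the question's example, and the sketch assumes the timestamps in the two tables match exactly; rows without a match would get `NA`.

```r
# Hypothetical miniature tables; column names follow the str() output in the
# comments, values are taken from the question's example.
dlr <- data.frame(
  temperature = c(2, 3, 4, 8, 5, 6),
  modelRunDateTime = as.POSIXct(
    c("2015-07-14 16:44:01", "2015-07-14 16:44:01",
      "2016-08-14 16:44:02", "2016-08-14 16:44:02",
      "2017-09-14 16:44:03", "2017-09-14 16:44:03"),
    tz = "UTC"
  )
)
ref <- data.frame(
  absolute_value = c(2, 5, 9),
  absolte_time = as.POSIXct(
    c("2015-07-14 16:44:01", "2016-08-14 16:44:02", "2017-09-14 16:44:03"),
    tz = "UTC"
  )
)

# Left join in base R: keep every row of dlr and look up the matching ref row.
merged <- merge(dlr, ref,
                by.x = "modelRunDateTime", by.y = "absolte_time",
                all.x = TRUE)

# Signed error (measurement minus reference); wrap in abs() for the absolute error.
merged$error <- merged$temperature - merged$absolute_value
print(merged)
```

With dplyr, the equivalent key specification would be `by = c("modelRunDateTime" = "absolte_time")` in `left_join()`.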

edited Jul 05 '18 at 17:01

answered Jul 05 '18 at 15:19

acylam


The absolute values are in another data frame, which doesn't have the same number of rows as my first data frame, so I cannot combine them – Rahul Tyagi Jul 05 '18 at 15:26
no applicable method for 'tbl_vars' applied to an object of class "c('matrix', 'double', 'numeric')" – Rahul Tyagi Jul 05 '18 at 17:19
the output of dput command are just some random numbers. should i upload that or structure of my data frame – Rahul Tyagi Jul 05 '18 at 17:44
@RahulTyagi Yes, just copy and paste exactly that dput output into your question. Just like what I have in the Data section of my answer. – acylam Jul 05 '18 at 17:45
i added the output for df1 df2 is still in progress – Rahul Tyagi Jul 05 '18 at 18:00
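The `tbl_vars` error quoted in the comments above is what dplyr typically reports when a verb such as `left_join()` is handed a matrix instead of a data frame. Assuming that is what happened here, converting the object with `as.data.frame()` before joining should help. The sketch below uses hypothetical objects (`m`, `lookup`) and invented values, and falls back to base `merge()` where dplyr is not installed.

```r
# Hypothetical example: a numeric matrix standing in for an object that was
# expected to be a data frame.
m <- cbind(temp = c(2, 3, 4), key = c(1, 1, 2))
is.data.frame(m)   # a matrix is not a data frame, so dplyr verbs reject it

# Convert before joining; the column names are carried over from the matrix.
df <- as.data.frame(m)

lookup <- data.frame(key = c(1, 2), absolute = c(2, 5))

# The join itself, guarded so the sketch also runs without dplyr.
if (requireNamespace("dplyr", quietly = TRUE)) {
  joined <- dplyr::left_join(df, lookup, by = "key")
} else {
  joined <- merge(df, lookup, by = "key", all.x = TRUE)
}

# Signed error: measured value minus reference value.
joined$error <- joined$temp - joined$absolute
```

Checking `class(df1)` and `class(df2)` before the join is a quick way to see whether this applies.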
