0

I have two large data frames (nrow = 20,000), with common variables in each (X, Y and Z). Each data frame shares some unique values of one variable (Y), but the values of X and Z differ. What I would like to do is filter one data frame (the larger of the two (df2)), by the range of values of X in the smaller data frame (df1) for each unique value of Y. So, for example, if the range of X values in df1 for a unique Y value of 1 is 0.02 - 0.08, then I would like to obtain all values of X and Z from df2 where the value of X falls within that range for the given value of Y. I would like to make a new data frame containing these filtered values.

Here is some sample code and data to use an example

# GENERATE SOME DATA
 set.seed(5)
    df1 <- data.frame(X = round(runif(20, 0.2, 0.3),4), Y = rep(c(4, 5, 6, 7, 8), each = 4), Z = round(runif(20),3))
    df2 <- data.frame(X = round(runif(50, 0.1, 0.6),4), Y = rep(seq(1,10,1), each = 5), Z = round(runif(50),3))  

## ESTABLISH THE RANGE OF X VALUES FOR EACH UNIQUE Y VALUE IN THE DATA FRAME OF INTEREST (df1)

library(dplyr)

df3 <- df1 %>%
  group_by(Y) %>%
  mutate(minX=min(X), maxX=max(X)) %>%
  ungroup() %>%
  distinct(Y, minX, maxX)

## FILTER df2 BY THE RANGE OF Y VALUES IN df1

df4<- df2 %>%  filter(Y %in% df1$Y)

So now I have df2 filtered to give me only the range of Y values which are common to both data frames (df4). What I need to do here is to create a new data frame from df4 which contains only the values of X which are within the range of X values in df1 for each value of Y (df3).

Any help greatly appreciated.

A.Benson
  • 465
  • 1
  • 6
  • 16
  • can you please check your examples. It is giviing some rrors – akrun Apr 24 '20 at 00:25
  • Thanks. There was a missing bracket from the first line of code. Should be correct now – A.Benson Apr 24 '20 at 00:27
  • Also, please consider `set.seed` for reproducibilty – akrun Apr 24 '20 at 00:28
  • This has a fundamental flaw: equality of floating-point works a lot but fails without alerting you. For instance, see `sqrt(3^3) == 3` (false). This is due to [R FAQ 7.31](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f) and IEEE-754 https://stackoverflow.com/questions/9508518/why-are-these-numbers-not-equal. – r2evans Apr 24 '20 at 02:34

0 Answers0