
I am trying to create a subset of a data frame based on a range around the values in a second data frame. I've been researching, but I cannot figure out how to go about it. I've used dummy data here, as both are large datasets with many columns.

Data Frame 1 (df1) has 50 columns and thousands of recordings at different latitudes:

Recording Latitude
BombusL 51.41
ApisM 51.67
BombusR 51.34

Data Frame 2 (df2) has several hundred towns, all at different latitudes; it is significantly smaller than df1:

Town Lat
Bristol 51.40
Merton 51.42
Horsham 51.33

I need a subset of df1 that includes only rows whose latitude is within 0.01 of a latitude in df2. So the code needs to go down every row of df1 and test its latitude against every row of df2. The output would include only the rows of df1 where the latitude is within 0.01 of a value in df2$Lat.

From the example, the following rows would be included:

Recording Latitude
BombusL 51.41
BombusR 51.34

I have the start of the code for a filter that I could then apply to the data frame to create the subset:

LatFil <- df1$Latitude %in% df2$Lat

But I can't figure out how to express the logical test of ± 0.01 around each value in df2$Lat.

2 Answers


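Because the test involves a tolerance (adding or subtracting 0.01) on floating-point numbers, it is better to use comparison operators than exact matching with %in%. Comparing Latitude directly against df2$Lat would pair the values position by position (with recycling), so here each df1 latitude is tested against all of df2. The small slack in tol absorbs rounding error at the boundary; for example, 51.34 - 51.33 is slightly more than 0.01 in floating point.

tol <- 0.01 + 1e-9  # 0.01 plus an assumed slack for floating-point rounding
# TRUE where a df1 latitude is within tol of at least one df2 latitude
keep <- sapply(df1$Latitude, function(x) any(abs(x - df2$Lat) <= tol))
subset(df1, keep)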
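
To test every recording against every town in one vectorised step, one option is a matrix of pairwise differences built with outer(). The following is a minimal sketch, assuming the dummy data from the question; the 1e-9 slack and the names tol, d, keep and result are illustrative choices, not part of the original data.

```r
# Dummy data copied from the question
df1 <- data.frame(Recording = c("BombusL", "ApisM", "BombusR"),
                  Latitude  = c(51.41, 51.67, 51.34))
df2 <- data.frame(Town = c("Bristol", "Merton", "Horsham"),
                  Lat  = c(51.40, 51.42, 51.33))

# Assumed tolerance: 0.01 plus a tiny slack, because differences such as
# 51.34 - 51.33 are not exactly 0.01 in floating point
tol <- 0.01 + 1e-9

# Absolute differences: one row per df1 recording, one column per df2 town
d <- abs(outer(df1$Latitude, df2$Lat, "-"))

# Keep a recording if at least one town lies within the tolerance
keep <- rowSums(d <= tol) > 0
result <- df1[keep, ]
print(result)
```

The matrix has nrow(df1) x nrow(df2) cells, which should be manageable for thousands of recordings and a few hundred towns but grows quickly for much larger inputs.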
akrun

Another option:

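eps <- 1e-9  # assumed slack for floating-point rounding
df2$Lat_hi <- df2$Lat + 0.01 + eps
df2$Lat_lo <- df2$Lat - 0.01 - eps

# %in% only matches exact values, so test each latitude against the [Lat_lo, Lat_hi] ranges
in_range <- sapply(df1$Latitude, function(x) any(x >= df2$Lat_lo & x <= df2$Lat_hi))
LatFil <- df1[in_range, ]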
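
If df1 becomes much larger, it may be cheaper to sort the town latitudes once and compare each recording only with its nearest neighbours below and above, using base R's findInterval(). This is a minimal sketch under the same assumptions as the question's dummy data; the 1e-9 slack and the helper names (lats, i, below, above, result2) are hypothetical.

```r
# Dummy data copied from the question
df1 <- data.frame(Recording = c("BombusL", "ApisM", "BombusR"),
                  Latitude  = c(51.41, 51.67, 51.34))
df2 <- data.frame(Town = c("Bristol", "Merton", "Horsham"),
                  Lat  = c(51.40, 51.42, 51.33))

tol  <- 0.01 + 1e-9        # assumed slack for floating-point rounding
lats <- sort(df2$Lat)      # findInterval() requires sorted break points
n    <- length(lats)

# Index of the largest town latitude <= each recording (0 if there is none)
i <- findInterval(df1$Latitude, lats)

# Distance to the nearest town below and above; Inf where no such town exists
below <- ifelse(i > 0, df1$Latitude - lats[pmax(i, 1)], Inf)
above <- ifelse(i < n, lats[pmin(i + 1, n)] - df1$Latitude, Inf)

# A recording qualifies if its closest town is within the tolerance
keep2   <- pmin(below, above) <= tol
result2 <- df1[keep2, ]
print(result2)
```

Only the two neighbouring towns need checking because, in a sorted vector, the closest value to any latitude is always one of them.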
Alec B