
I want to subset a data frame (with millions of rows) thousands of times, using the values in two columns of another data frame. Currently I am using the example provided by Akrun:

     subset(df1, (Latitude >= (df2$Lat - 0.01)) & (Latitude <= (df2$Lat + 0.01)))

However, this seems to return all of the data that matches any of the rows in the second data frame. How can I adjust this so that each row of the second data frame produces its own subset, named after the value in a third column of that data frame?
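
A minimal sketch of one way to get one subset per row of `df2`: loop over the rows of `df2` so that each comparison uses a single `Lat` value (no recycling against `df1`), then name the resulting list with the third column. The column name `Town` and the toy values are assumptions for illustration only.

```r
# Toy data (hypothetical): df1 holds the points, df2 one row per wanted subset
df1 <- data.frame(id = 1:6,
                  Latitude = c(51.500, 51.505, 51.530, 52.000, 52.008, 53.100))
df2 <- data.frame(Lat = c(51.50, 52.00),
                  Town = c("TownA", "TownB"))

# For each row of df2, compare every Latitude with that row's single Lat value
subsets <- lapply(seq_len(nrow(df2)), function(i) {
  lat <- df2$Lat[i]
  df1[df1$Latitude >= lat - 0.01 & df1$Latitude <= lat + 0.01, ]
})

# Name each subset after the third column of df2 (assumed to be called Town)
names(subsets) <- as.character(df2$Town)

# Optional: a single data frame with the name stored as a column instead
df_out <- do.call(rbind, Map(function(d, town) {
  d$Town <- rep(town, nrow(d))  # rep() also copes with empty subsets
  d
}, subsets, names(subsets)))
```

This rescans `df1` once per row of `df2`, which keeps the logic obvious but may be slow with millions of rows and thousands of subsets. A row of `df1` that lies within ±0.01 of several `Lat` values appears in each of those subsets.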

Reference: Subset data frame based on range of values in second data frame

Dylan Egan
  • Can you make a minimal reproducible example with `df1` and `df2`? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Andrea M Jun 20 '22 at 09:46

1 Answer

    # Subsetted data
    df_sub <- subset(df1, (Latitude >= (df2$Lat - 0.01)) & (Latitude <= (df2$Lat + 0.01)))
    # Names of third column
    towns <- df2$Town[(df1$Latitude >= (df2$Lat - 0.01)) & (df1$Latitude <= (df2$Lat + 0.01))]

    df_out <- cbind(df_sub, towns)

Julien
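
In the answer's code, both comparisons are element-wise: `df1$Latitude` has one value per row of `df1`, while the shorter `df2$Lat` is recycled, so each row of `df1` is checked against just one row of `df2`, and R warns when the longer length is not a multiple of the shorter one. That same logical vector (one element per row of `df1`) is then used to index `df2$Town`, and positions past the end of `df2$Town` come back as `NA`, so `towns` need not line up with the rows of `df_sub`. A small illustration with made-up vectors standing in for those columns:

```r
x <- c(1, 2, 3, 4, 5)   # plays the role of df1$Latitude (5 rows)
y <- c(1, 10)           # plays the role of df2$Lat (2 rows)

# y is recycled to c(1, 10, 1, 10, 1), so each x is compared with ONE y value.
# R warns: "longer object length is not a multiple of shorter object length"
cmp <- x >= y
cmp  # TRUE FALSE TRUE FALSE TRUE

# Indexing a length-2 vector with a length-3 logical vector:
# the position past the end comes back as NA
towns <- c("TownA", "TownB")  # plays the role of df2$Town
towns[c(FALSE, TRUE, TRUE)]   # "TownB" NA
```
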
  • Thanks for your advice @Julien. However, this gives the error *arguments imply differing number of rows: 449, 2335850* – Dylan Egan Jun 21 '22 at 12:49
  • @DylanEgan To find out which line produces this error, run only `(df1$Latitude >= (df2$Lat - 0.01)) & (df1$Latitude <= (df2$Lat + 0.01))` – Julien Jun 21 '22 at 12:59
  • I tried that and it returns a vector that is FALSE up to element 384 (the same as the number of subsets) and then NA, so it looks like it's not going through the whole data frame. It also gives the warning *longer object length is not a multiple of shorter object length* – Dylan Egan Jun 21 '22 at 14:26
  • @DylanEgan This line of code was copied from the answer in the question that you linked https://stackoverflow.com/a/67305902/8806649 in your OP – Julien Jun 21 '22 at 15:10
  • I'm really sorry to be bothering you – Dylan Egan Jun 21 '22 at 15:36
  • I adapted it to my code, so it was `(dataframe$system_time_stamp >= 3columns$col1) & (dataframe$system_time_stamp <= 3columns$col2)` – Dylan Egan Jun 21 '22 at 15:44
  • @DylanEgan So is your problem solved? – Julien Jun 21 '22 at 16:39
  • I was already using my own adaptation of the code, so I still have all of the same problems – Dylan Egan Jun 21 '22 at 16:48
  • I am still having the same problems, although I think I may have identified the cause: it seems to return only 1 to 2 data points per subset (*there should be ~187 for half of the subsets (192) and ~375 for the other half*) – Dylan Egan Jun 21 '22 at 22:51
  • I reposted my question with more details. It can be found at https://stackoverflow.com/questions/72712420/how-to-subset-data-frame-with-another-data-frames-columns-and-give-name-to-subs – Dylan Egan Jun 22 '22 at 08:53
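
For the time-stamp version described in the comments (window start in `col1`, end in `col2`, plus a name column), a minimal sketch under three assumptions: the windows do not overlap, the window names are unique, and the time stamps are numeric. `findInterval()` locates, for every time stamp, the last window starting at or before it using a binary search, so the big data frame is scanned once rather than once per window. The names `dataframe`, `system_time_stamp`, `col1` and `col2` come from the comments; `windows`, `name` and the toy values are hypothetical (`3columns` is not a syntactically valid R name without backticks). Counting rows per subset at the end gives a quick check against the expected sizes.

```r
# Hypothetical stand-ins: `dataframe` holds the samples and `windows` holds
# one row per subset (start in col1, end in col2, plus a name)
dataframe <- data.frame(system_time_stamp = c(1, 3, 5, 7, 9, 11, 13),
                        value = c(10, 30, 50, 70, 90, 110, 130))
windows <- data.frame(col1 = c(2, 8),
                      col2 = c(6, 12),
                      name = c("trial_1", "trial_2"))

# Assumption: windows do not overlap and names are unique; sort by start time
windows <- windows[order(windows$col1), ]

# For each time stamp, the index of the last window starting at or before it
# (0 means earlier than every window, NA means a missing time stamp)
stamps <- dataframe$system_time_stamp
idx <- findInterval(stamps, windows$col1)

# Keep a time stamp only if it is also at or before that window's end
in_window <- !is.na(idx) & idx > 0
in_window[in_window] <- stamps[in_window] <= windows$col2[idx[in_window]]

# Label matching rows with their window name
dataframe$window <- NA_character_
dataframe$window[in_window] <- as.character(windows$name[idx[in_window]])

# One named subset per window; empty windows are kept as 0-row data frames
subsets <- split(dataframe[in_window, ],
                 factor(dataframe$window[in_window],
                        levels = as.character(windows$name)))

# Rows per subset: a quick sanity check against the expected sizes
counts <- vapply(subsets, nrow, integer(1))
counts  # trial_1 = 2, trial_2 = 2 for this toy data
```

If the windows can overlap, a row may belong to more than one window and this shortcut does not apply; a loop over the windows is needed instead.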