1

Is there a function that, for each value in dataSet_1, do a search in a dataSet_2 based in more than one condition? It's like in SQL, for each value, do a new search.

SELECT
value1Table1,
value2Table1,
(
   SELECT MIN(value) FROM table_2 AS table_2
    WHERE table_2.value1 = table_2.value1 AND table_1.Date > table_2.Date 
) AS value1table2, 
value3Table1
FROM table_1

Changed from MAX to MIN and < for > the have a query for the example below:

I'll try to explain a little more. Normally I use something like this in cases where I have a date (factDate) and for each user (that have your own different date) I need to search the first event that happened after the factDate.

Example:

Table_1

|User|TargetEventDate|NextEventDate| <- Value that I want
|A   |2018-04-17     |2018-04-18   | <- Value to Find
|B   |2018-04-14     |2018-04-15   | <- Value to Find

Table_2

|User|DateEvent |
|A   |2018-04-19|
|A   |2018-04-18| <- Value that I want for user A
|A   |2018-04-17| <- Value used in Table_1 for user A
|A   |2018-04-13|
|A   |2018-03-10|
|B   |2018-04-17|
|B   |2018-04-15| <- Value that I want for user B
|B   |2018-04-14| <- Value user in Table_1 for user B
  • Can you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5965451#5965451)? It'll make it easier to show working code based on your data. To filter a dataset based on conditions in another datasets you can use filters and [left_join](https://dplyr.tidyverse.org/reference/join.html) from dplyr, or [merge](http://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html) in base R. – Paul Rougieux Apr 19 '18 at 14:01
  • @PaulRougieux I don't believe this query can be written as a simple left join, because of the correlated subquery in the `SELECT` clause. – Tim Biegeleisen Apr 19 '18 at 14:01
  • 1
    `data.table` has non-equi joins. – Gregor Thomas Apr 19 '18 at 14:14
  • @TimBiegeleisen It works in a dataframe that are in memory in R like a SQL, is that right? – Gabriel Fiorelli Apr 19 '18 at 15:13
  • @GabrielFiorelli I don't know if your comment belongs here, or under my answer. In any case, to use `sqldf`, yes, the data has to be in memory in R. But this would also be the case with any other R based solution you get here. Given that you have already expressed your logic in SQL, `sqldf` is something I would definitely consider. – Tim Biegeleisen Apr 19 '18 at 15:15
  • @TimBiegeleisen. Sorry. Posted in the wrong place. I'm trying to use sqldf and appear to works for this kind of problem. – Gabriel Fiorelli Apr 19 '18 at 15:33

1 Answers1

1

I don't see a way of rewriting your query into something simpler, because of the WHERE clause in the correlated subquery. But, the good news is that R has a package called sqldf which can actually perform regular SQL queries.

library(sqldf)

query <- "SELECT t1.value1Table1, t1.value2Table1,
         (SELECT MAX(value) FROM table_2 t2
         WHERE t1.value1 = t2.value1 AND t1.Date < t2.Date) AS value1table2,
             t1.value3Table1
         FROM table_1"

result <- sqldf(query)

I assume here that table_1 and table_2 are actual data frames which have columns corresponding to the columns to which the query refers.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360