0

I have 3 data frames :

  1. Promotion : IDs that accessed my google ADs (for example) :TrackID, click datetime
  2. Website : IDs that accessed my website : TrackID, access datetime, client account
  3. Sales : Historical sales : client account, sale, sale datetime

I am trying to evaluate specific ADs to mesure their results. So I want to infer if some sale was caused by that AD.

I´ve been trying to to this :

a) Join table of Sales with Website to locate the relative TrackID of each sale, so the new "Sales table" will include the TrackID column. The question is that the TrackID access datetime must be the closest access BEFORE the Sale :

With SQL would be something like :

Select Sales.sale, Sales.account, max(Website.TrackID), max(website.datetime)
From Sales, left join Website on sales.account = website.account
where website.datetime < sales.datetime 

After that, I must identify if this website trackID is related to the ADs track ID. In SQL would be something like :

select Promotion.TrackID, sales.sale
from promotion left join sales on promotion.trackid=sales.trackid
where promotion.datetime < sales.datetime

At the end, I will have a full list of ADs, showing me which of them had some sale or not. That what I expect.

I tried to be clearly as possible, but is not simple. I´ve been reading about pandas JOIN and MERGE but on both cases, but I didn´t see them as solution for this.

BERA
  • 1,345
  • 3
  • 16
  • 36
FábioRB
  • 335
  • 1
  • 12
  • 2
    Telling us you have 3 dataframes isn't very helpful. Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions – itprorh66 Aug 11 '22 at 19:14
  • I tried to explain the DF contents and how would I do an equivalent SQL query to understand such kind of query in a DF. I will try to create an example data. – FábioRB Aug 11 '22 at 20:25

1 Answers1

0

To work with pandas tables like with database with many tables you can use pandasql

Code example:

from pandasql import sqldf

sales = pd.DataFrame() 
sales['sale'] = [1, 2, 3]
sales['account'] = ['A', 'B', 'C']
sales['datetime'] = [1, 2, 4]

website = pd.DataFrame()
website['TrackID'] = [11, 22, 33]
website['datetime'] = [1, 2, 9]
website['account'] = ['A', 'B', 'D']
    
q = '''
    SELECT s.sale, s.account, max(w.TrackID), max(w.datetime) 
    FROM sales s 
    INNER JOIN website w ON s.account=w.account 
'''

sqldf(q)
George
  • 21
  • 6