I have 3 data frames :
- Promotion : IDs that accessed my google ADs (for example) :TrackID, click datetime
- Website : IDs that accessed my website : TrackID, access datetime, client account
- Sales : Historical sales : client account, sale, sale datetime
I am trying to evaluate specific ADs to mesure their results. So I want to infer if some sale was caused by that AD.
I´ve been trying to to this :
a) Join table of Sales with Website to locate the relative TrackID of each sale, so the new "Sales table" will include the TrackID column. The question is that the TrackID access datetime must be the closest access BEFORE the Sale :
With SQL would be something like :
Select Sales.sale, Sales.account, max(Website.TrackID), max(website.datetime)
From Sales, left join Website on sales.account = website.account
where website.datetime < sales.datetime
After that, I must identify if this website trackID is related to the ADs track ID. In SQL would be something like :
select Promotion.TrackID, sales.sale
from promotion left join sales on promotion.trackid=sales.trackid
where promotion.datetime < sales.datetime
At the end, I will have a full list of ADs, showing me which of them had some sale or not. That what I expect.
I tried to be clearly as possible, but is not simple. I´ve been reading about pandas JOIN and MERGE but on both cases, but I didn´t see them as solution for this.