I'm very well-versed in SQL, and an absolute novice in R. Unfortunately, due to an update in company policy, we must use Athena to run our SQL queries. Athena is weak, so despite having a complete/correct SQL query, I cannot run it to manipulate my large, insurance-based dataset. I have seen similar posts, but haven't managed to crack my own problem trying to utilize the methodologies provided. Here are the details:
- After running the SQL block in R (using a connection string), I have a countrywide data block denoted CW_Data in R
- Each record contains a policy with a multitude of characteristics (columns) such as the Policy_Number, Policy_Effective_Date, Policy_Earned_Premium
- Athena breaks down when I try add two columns based on the already-existing ones
- Namely, I want to left join such that I can obtain a new columns for Policy_Prior_Year_Earned_Premium and Policy_Second_Prior_Year_Earned_Premium
Per the above, I know I need to add columns such that, for a given policy, I can find the record where the Policy_Number=Policy_Number and Policy_Effective_Date = Policy_Effective_Date-1 or Policy_Effective_Date-2 years. This is quite simple in SQL, but I cannot get it in R for the life of me.
Here is the (watered-down) left join I attempted in SQL using CTEs that breaks Athena (even if the SQL is run via R):
All_Info as (
Select
PC.Policy_Number
,PC.Policy_Effective_Date
,PC.Policy_EP
from Policy_Characteristics as PC
left join Almost_All_Info as AAI
on AAI.Policy_Number = PC.Policy_Number
and AAI.Policy_Effective_Date = date_add('year', -1, PC.Policy_Effective_Date)
left join All_Segments as AST
on AST.Policy_Number = PC.Policy_Number
and AST.Policy_Effective_Date = date_add('year', -2, PC.Policy_Effective_Date)
Group by
PC.Policy_Number
,PC.Policy_Effective_Date
,PC.Policy_EP