0

I am trying to recreate the following SAS code in R

PROC SQL;
    create table counts_2018 as 
    select a.*, b.cell_no 
    from work.universe201808 a, work.selpar17 b 
    where a.newregionxx = b.lower_region2 
      and a.froempment >= b.lower_size 
      and a.froempment <= b.upper_size 
      and a.frosic07_2 >= b.lower_class2 
      and a.frosic07_2 <= b.upper_class2;
QUIT;

What this does, in effect, is assign the cell_no found in selpar17 to the data in universe201808, based on the fulfillment of all 6 conditions outlined in the code. Data which does not fulfill these conditions, and thus won't have a cell_no assigned to it, is not included in the final table.

The documentation/answers I have found so far all start with a step where the two dataframes are merged by a common variable, then an sqldf select is carried out. I do not have a common column, and thus I cannot merge my dataframes.

Parfait
  • 104,375
  • 17
  • 94
  • 125
Laura
  • 177
  • 1
  • 12
  • You should be able to just put that query into `sqldf` with minimal changes. Remove the `work.` parts, the `proc sql;` and `;quit;`, and start the query with `select ...`. So, have you tried that? Was there an error, and if so, what was it? – IceCreamToucan Dec 18 '18 at 14:39
  • Are you able to provide me the first line of code with the selecting of two dataframes? The documentation I can find only shows an argument for one dataframe – Laura Dec 18 '18 at 14:45
  • 1
    It accepts normal SQL queries like proc sql, `counts_2018 <- sqldf('select a.*, b.cell_no from universe201808 a, selpar17 b where a.newregionxx = b.lower_region2 and a.froempment >= b.lower_size and a.froempment <= b.upper_size and a.frosic07_2 >= b.lower_class2 and a.frosic07_2 <= b.upper_class2')` should work. – IceCreamToucan Dec 18 '18 at 14:47
  • Ah i was missing the ' ' . the only issue is " Error in rsqlite_send_query(conn@ptr, statement) : near "where": syntax error " – Laura Dec 18 '18 at 14:51
  • Ok, you'll have to share your data then, otherwise no one can reproduce the problem. – IceCreamToucan Dec 18 '18 at 14:53
  • The `sqldf` readme page has a lot of good examples of queries involving joins. https://cran.r-project.org/web/packages/sqldf/README.htm – IceCreamToucan Dec 18 '18 at 15:18

1 Answers1

1

Currently, you are running an implicit join between the two tables which is not advised in SQL. Per ANSI-1992 (a 25+ year specification) that made the explicit JOIN the standard way of joining relations, consider revising your SQL query accordingly.

Contrary to your statement, you in fact do have a common column between the tables as shown in your equality condition: a.newregionxx = b.lower_region2 which can serve as the JOIN condition. Even use the BETWEEN operator for concision:

new_df <- sqldf('select u.*, s.cell_no 
                 from universe201808 u
                 inner join selpar17 s 
                         on u.newregionxx = s.lower_region2 
                 where u.froempment between s.lower_size and s.upper_size 
                   and u.frosic07_2 between s.lower_class2 and s.upper_class2')

In fact, you can remove the where altogether and place all in the on clause:

...
on u.newregionxx = s.lower_region2 
and u.froempment between s.lower_size and s.upper_size 
and u.frosic07_2 between s.lower_class2 and s.upper_class2
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • What is the name of this ANSI specification? I would like to look into this further. – Laura Dec 18 '18 at 17:03
  • 1
    The industry routinely updates ANSI specifications of the SQL language, notably 1992, 1999, 2003, 2008, etc. See this answer for links on actual [docs](https://stackoverflow.com/q/1714461/1422451). Most RDBMS's adhere to 1992. – Parfait Dec 18 '18 at 17:11