2

I currently use R to connect to a sql database and pull information. One particular code I use in R looks like this:

Joe<-dbGetQuery(jdbcConnection,"SELECT priv_check.protocol_id
            FROM LHS_ONCORE_PROD.sv_user_pcl_permission priv_check
            WHERE priv_check.function_name = 'CRPT-Subject Visits'
            AND priv_check.contact_id = '6017'" )

Its a query that returns a dataframe of what protocols Joe (contact id 6017) is allowed by our clinical trial system to see. This dataframe is currently 1 column and ~500 rows. Kinda looks like this:

enter image description here

What I'd love is to create a very large dataframe that returns the protocols EVERY user can see.

Lets say I have a separate dataframe with the contact id's of our users, like this:

df<-structure(list(Contact = c("John", "Ali", "Joe", "Steve"), ContactID = c(1234, 
4321, 6017, 5522)), row.names = c(NA, -4L), spec = structure(list(
    cols = list(Contact = structure(list(), class = c("collector_character", 
    "collector")), ContactID = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), delim = ","), class = "col_spec"), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"))

enter image description here

and instead of just returning the dataframes joe (6017) can view, it also checks john (1234) and ali (4321) and steve (5522).

I realize this query will probably take a very long time to run, and I also think I'll have to figure out how to organize it when I get the data (in terms of long vs wide, etc...) but to begin with, I dont even know how to get it to automatically loop like that. Any ideas?

If possible, I'd love the output to line studies up as much as possible and look like this: enter image description here

but thats not SUPER important

Joe Crozier
  • 944
  • 8
  • 20
  • 1
    The second part of your question is really just joining your combined data with `df` and pivoting from long to wide. See: **merge/join**, https://stackoverflow.com/q/1299871/3358272, https://stackoverflow.com/q/5706437/3358272; **long-to-wide**, https://stackoverflow.com/q/11608167/3358272. – r2evans May 12 '23 at 20:00
  • 1
    What's the origin of that "separate dataframe with the contact id's" ? I'd assume it's a result of a query from that same SQL db. If yes, you could probably handle it all in SQL. – margusl May 13 '23 at 06:52
  • @margusl it is. That said, we've kinda split all our queries into files that run on different days and it is already saved as a csv elswhere. So i usually just read it in with read_csv first – Joe Crozier May 15 '23 at 13:49

1 Answers1

2

Two methods:

  1. SQL IN operator:

    (qmarks <- paste(rep("?", nrow(df)), collapse = ","))
    # [1] "?,?,?,?"
    
    allusers <- dbGetQuery(jdbcConnection,
      paste("SELECT priv_check.contact_id, priv_check.protocol_id
             FROM LHS_ONCORE_PROD.sv_user_pcl_permission priv_check
             WHERE priv_check.function_name = 'CRPT-Subject Visits'
             AND priv_check.contact_id in (", qmarks, ")"),
      params = as.list(df$ContactID) )
    

    This uses parameterized queries.

  2. If the number of users you need to query is huge, them you can opt for a two-step: upload the list of IDs to a temporary table (assuming you have the DB permissions to create a local temporary table), and query the real table inner-joining on the temp table.

    dbWriteTable(jdbcConnection, "mytemp", df, temporary = TRUE) # [1]
    allusers <- dbGetQuery(jdbcConnection,
      "SELECT priv_check.contact_id, priv_check.protocol_id
       FROM LHS_ONCORE_PROD.sv_user_pcl_permission priv_check
         INNER JOIN mytemp t ON priv_check.contact_id = t.ContactID
       WHERE priv_check.function_name = 'CRPT-Subject Visits'")
    dbExecute(jdbcConnection, "drop table mytemp")
    

    Note 1: not all DBs do "temporary" the same. SQL Server (over ODBC), for instance, does not (consistently or at all) use temporary=, instead requiring the table name to be prepended with #, so dbWriteTable(con, "#mytemp", df). Make sure you test this to make sure the tables are visible to you, are not visible to others, and are automatically cleaned up when you're done. (You might enlist the aid of a DBA or colleague to make sure it is doing as you expect.)

r2evans
  • 141,215
  • 6
  • 77
  • 149
  • sorry this took me so long to reply and/or accept. Thank you for the solution, i'm sure it'll work and I'm almost there. Right now when I try to run it I get the error: " JDBC ERROR: Missing IN or OUT parameter at index:: 1" Any ideas? I can add screenshots etc... to the question if it helps. – Joe Crozier May 15 '23 at 13:47
  • According to https://stackoverflow.com/a/27342755/3358272, that error suggests that either you didn't include the `params`, or the JDBC-version of `dbGetQuery` does not support bound parameters. If you're using `RJDBC` instead (likely?), then https://www.rforge.net/RJDBC/ suggests we might need to be a little creative, perhaps `res <- do.call(dbGetQuery, c(list(hdbcConnection, "SELECT ..."), as.list(df$ContactID)))`. – r2evans May 15 '23 at 14:01
  • It is rjdbc. Ok I just tried: Access <- do.call(dbGetQuery, c(list(jdbcConnection, "SELECT priv_check.contact_id, priv_check.protocol_id FROM LHS_ONCORE_PROD.sv_user_pcl_permission priv_check WHERE priv_check.function_name = 'CRPT-Subject Visits' AND priv_check.contact_id in (", qmarks, ")"), params = as.list(Contacts$CONTACT_ID) )) and it spit back "invalid column index". Hmm, wondering if I should start a new question on here – Joe Crozier May 15 '23 at 14:06
  • probably ... and unfortunately it sounds like it is RJDBC-specific, so I hope somebody else is able to help you there, I have nothing to test it on – r2evans May 15 '23 at 14:17
  • Did you try the second option? That doesn't rely on bound parameters. – r2evans May 15 '23 at 14:18
  • I dont believe we have db permissions to create a local table. Also our db architecture is changing soon (to snowflake) and i'll have to rewrite all this code anyway. My thought was that if it was just a query of some sort and didnt rely on writing a table, it'd be less hard to change. Though thats a wild guess. I'll accept your answer above because it may help someone but I'll start a new question – Joe Crozier May 15 '23 at 14:22
  • 1
    Most users cannot create "regular" tables, but often temporary tables are allowed and you have no indication. If you aren't certain, try it and/or speak with your DBA. The second option is something I use regularly at work. – r2evans May 15 '23 at 14:35