0

I am working with the R programming language. I am trying to rename all the variables in a data frame by adding "_a" to the end of each variable. I figured out how to do this with the "dplyr" library (for data frames in the global environment):

library(dplyr)
library(dbplyr)

var1 = rnorm(100,10,10)
var2 = rnorm(100,10,10)
var3 = rnorm(100,10,10)

my_data = data.frame(var1,var2,var3)

df = my_data %>% rename_all(paste0, "_a")

Problem: My actual data frame is on a database which I access using "RODBC SQL" commands, for example:

library(RODBC)
library(sqldf)

con = odbcConnect("some name", uid = "some id", pwd = "abc")

sample_query = sqlQuery(con, "select distinct * from df")

What I tried so far: Using the "dbplyr" library, I "extract" the SQL code performed above:

 con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

df <- copy_to(con, my_data)

final =  df %>% 
 rename_all(paste0, "_a")  %>% show_query()

#results

<SQL>
SELECT `var1` AS `var1_a`, `var2` AS `var2_a`, `var3` AS `var3_a`
FROM `my_data`

My Question: The actual table I am using ("my_data") contains many columns and is on a database. I would like to rename all the columns (e.g. add "_a" at the end of each column), but I can not do this until I figure how to pass the SQL statement through sqlQuery() . This means that I am unable to get the entire list of columns all at once and rename them all at once.

I could manually do this, e.g.

sample_query = sqlQuery(con, "SELECT `var1` AS `var1_a`, `var2` AS `var2_a`, `var3` AS `var3_a` etc etc etc `varfinal` AS `varfinal_a`
    FROM `my_data` ")

But I am looking for a way to do this automatically.

Is it possible to write something like this ?

#not sure if this is correct
 sample_query = sqlQuery(con, "rename_all(paste0, "_a")") 

Thanks!

stats_noob
  • 5,401
  • 4
  • 27
  • 83

1 Answers1

1

There may be better alternatives, but you could build a query string as follows:

query <- paste(
  "SELECT",
  paste(
    names(my_data),
    "AS",
    paste0(names(my_data), "_a"),
    collapse = ", "
  ),
  "FROM my_data"
)

Then query will give you:

[1] "SELECT var1 AS var1_a, var2 AS var2_a, var3 AS var3_a FROM my_data"
  

Which can be used in your sqlQuery statement:

sample_query = sqlQuery(con, query)
Ben
  • 28,684
  • 5
  • 23
  • 45
  • @ Ben: Thank you so much for your answer! I will try this on Monday and let you know if it worked! – stats_noob Dec 11 '21 at 04:05
  • Just a question: is the "dbplyr" library in R able to accept "dplyr style statements" and run them over on tables located on a server? Am I understanding this correctly? – stats_noob Dec 11 '21 at 04:06
  • 1
    Overall, there are 2 common methods of connection to SQL databases, both using ODBC drivers: the `RODBC` package and the `DBI` system (using `dplyr`, `dbplyr`, and `odbc`). Both create connections using credentials and useful. The `RODBC` is simpler especially if you know what SQL queries you want to use - it sounded like you wanted to pursue this approach... – Ben Dec 11 '21 at 15:20
  • 1
    If you like `tidyverse` functions and want to use `dplyr` verbs, you might want to use `dbplyr` (`DBI`). This is also likely faster for reading data from the database. In this case, you can define a table as if it were a data.frame in the R workspace. And you can use `dplyr` grammar. It is relatively easy to work with data on the database server. Taking this approach, you can use `dbConnect` with username and password, then use `tbl` function with the connection made and reference the table. You can use `dplyr` verbs to examine, filter, mutate data without knowledge of SQL. – Ben Dec 11 '21 at 15:26
  • 1
    Note that this approach is translating the code into SQL statements and executing in the database on the remote server (not on your local computer). It will be lazy and not pull data unless you ask for it or do something like `collect` to pull the data. – Ben Dec 11 '21 at 15:30
  • 1
    So, above you used `dbConnect` and made an example in memory. Could you use `dbConnect` and connect to your actual database/server? If so, you could try doing that, using `tbl` and then using `dplyr` verbs to rename columns in a `tidyverse` approach. – Ben Dec 11 '21 at 15:32
  • @ Ben: thank you so much for your clarifications! I really appreciate it! – stats_noob Dec 12 '21 at 00:19
  • I posted another related question over here https://stackoverflow.com/questions/70313117/r-understanding-how-dbplyr-works - I think you might have answered some of the questions I had... can you please take a look at this if you have time? (E.g., i use the tbl() command over there) – stats_noob Dec 12 '21 at 00:20
  • I am still struggling to understand this basic idea: can the dbplyr library take statements written in "dplyr commands", directly execute them on large tables located on a server... and then either store the results of these commands in a new table on the server, or in a new table within the global environment? – stats_noob Dec 12 '21 at 00:23