Loading Multiple Tables from Multiple Schemas from MySQL to R, and Merge all the Tables

Question

Before I begin I know that these questions exist: Get an R dataframe with merged values from multiple MySQL tables and Best approach to combine multiple MySQL tables in R. But I can't use them for my problem.

I have a list of schemas (let's say 80 schemas) in a MySQL database with a specific ProductID such as shj637, jsfh748, hasd548 etc. I've added a specific table with a specific name in all those schemas as ProductINFO (so all schemas have this table). In which they all have the same column names (like ID, name, product group, etc) but different values in rows.

What I want to do is to make a loop or a function, that would load and union all ProductINFO tables from different ProductID schemas and make one data frame with all those information that contain their specific ProductIDs (as a column).

So for example if: schema A with ProductID shj637 and schema B with ProductID kly457 and 80 more! they would all have one table called ProductINFO. I want to produce:

ProductID	ID	group	otherstuff
shj637	1	Q2	ghafdj...
shj637	2	Q4	jhsdf...
shj637	3	Q1	jhfdl...
kly457	1	Q2	jshfd...
kly457	2	Q4	uiepc...
kly457	3	Q1	qapfn..
.. for all 80 ProductID schemas

This is the table structure:

CREATE TABLE `productinfo` (
`ID` int(11) DEFAULT NULL, 
`Question` text, 
`Product Answer` double DEFAULT NULL, 
`Product Code/Script (If Required)` test, 
`Product Score/Mark` double DEFAULT NULL, 
`Feedback` text, 
`ProductID` text, 
`database ID` text) 
ENGINE=MyISAM DEFAULT CHARSET=latin1

Thank you in advance!

What do the schemas table structure look like? I assume that all 80+ of them are identical. Can you run `SHOW CREATE TABLE table_name;` on one of the schema and post the result [into your question](https://stackoverflow.com/posts/68341128/edit)? — FanoFN, Jul 12 '21 at 03:08
@FaNo_FN Yess all 80+ schemas have the same table with the structure that I've added to the question. I just want to have a union of all schemas that have this table within them. — Abtin Ijadi Maghsoodi, Jul 12 '21 at 03:26
Can you post the create table syntax as text instead of image? That way I can easily copy & paste it and attempt to create the scenario. Also, I'm guessing you want something (a query maybe) that can scan through and find all the tables to include in, without having to hardcode the table names, is that correct? Probably a dynamic query? — FanoFN, Jul 12 '21 at 03:28
@FaNo_FN Thank you. Revised it. Yeap; what I want is to get all the tables with the same name and have a UNION of them in a data frame but I would still have the SchemaID or in this case ProductID and their associated rows. Exactly as you said, I don't want to hard code this procedure for all 80+ schemas with same table name. — Abtin Ijadi Maghsoodi, Jul 12 '21 at 03:44
I can probably suggest a query that will work on MySQL. Coincidentally I've been constructing dynamic queries using `PREPARED STATEMENT` which will work on most MySQL versions. I'm not familiar with `R` though, so I'm not sure know how you can use it there. If you want to try it, I can post a suggestion. — FanoFN, Jul 12 '21 at 04:00
@FaNo_FN Sure, I would like to know how to do that in MySQL as well. But my main objective is to code that problem in R. Thank you very much. — Abtin Ijadi Maghsoodi, Jul 12 '21 at 05:01
Thank you @r2evans, Can you please also see if you know how to do this: https://stackoverflow.com/questions/68376031/loading-data-from-a-dataframe-in-r-to-multiple-tables-in-mysql-with-the-same-nam — Abtin Ijadi Maghsoodi, Jul 19 '21 at 02:48

FanoFN · Answer 1 · 2021-07-12T08:18:30.717

0

In SO I've seen and learned at least one feasible method to tackle the issue with dynamic columns or tables or even databases for MySQL. The function is called PREPARED STATEMENT.

Take this query as example:

/*1*/
SET @sql := NULL;

/*2*/
SET @sql := (SELECT GROUP_CONCAT(
           CONCAT('SELECT ID, ProductID, Name, ProductGroup FROM ',table_name) 
           SEPARATOR ' UNION ALL ') FROM information_schema.tables
WHERE table_schema='my_database_name' 
/*[AND table_name ...  another condition to get only the tables you want]*/;

/*For checking purpose only. Not part of the prepared statement. Can be removed.*/
SELECT @sql;

/*3*/
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

I'll explain one by one according to the numbers:

The first step is to decide on a variable. In this example I'm using @sql and I set it as NULL just in case that the variable was used before and still hold some value.
Here we're going to set the @sql variable with a query that will be used for the execution later. The query is created from a combination of CONCAT and GROUP_CONCAT from a query over information_schema database. This is how the table names are obtained the concatenated with the final query. Note that the part /*[AND table_name .. is where you have to define the table identification. Something like:

if for example those tables have partial matching names like tablename_shj637, tablename_jsfh748, tablename_hasd548 .. then the condition could be:

AND table_name LIKE 'tablename%'

if the database (table_schema) holds all the table required including productinfo table, then the condition could be:

AND table_name != 'productinfo'

The final generated @sql would look something like this:

SELECT ID, ProductID, Name, ProductGroup FROM hasd548 
UNION ALL 
SELECT ID, ProductID, Name, ProductGroup FROM jsfh748 
UNION ALL 
SELECT ID, ProductID, Name, ProductGroup FROM shj637

You can check @sql variable value by running SELECT @sql after the second step. This will help with seeing whether the query being constructed correctly or not.

The last part is to PREPARE, EXECUTE then DEALLOCATE the prepared statement.

I've made a fiddle here : https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=be294ef5fd46ed6e04f8b03d33f34ca2 . However, the SET @sql query is a bit different because of the fiddle condition (temp db, temp tables).

edited Jul 12 '21 at 08:18

answered Jul 12 '21 at 07:06

FanoFN

6,815
2
13
33

Thank you very much @FaNo_FN that's awesome! Can I ask a question to expand on your suggestion in MySQL? Can you edit the sql code in way that I could save it in lets say for example Schema called AAPP under table name ProductInfo or a new table with this name? Thanks again! – Abtin Ijadi Maghsoodi Jul 12 '21 at 11:20
Can you post an example of a final sql code you wish to get? – FanoFN Jul 13 '21 at 00:28
The query you posted n sql is what I have in mind to get in R. However, I was wondering how you can save the output of the query in mysql. Do I have to use: INSERT INTO TBALE and then the code you suggested? How you save the output of that prepared query? – Abtin Ijadi Maghsoodi Jul 13 '21 at 01:05
To add an `INSERT` statement in the prepared statement, you can concatenate it before the `GROUP_CONCAT` section in the `@sql` variable syntax. See this fiddle example https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c2ea046e505a6d342892347eb66d6c00 – FanoFN Jul 13 '21 at 01:12
This is the error I get in MySQL: Error Code: 1064. You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'NULL' at line 1. I get this error when I execute this: /*3*/ PREPARE stmt FROM @sql; EXECUTE stmt; DEALLOCATE PREPARE stmt; – Abtin Ijadi Maghsoodi Jul 13 '21 at 02:19
`SET @sql := NULL; SET @sql := (SELECT CONCAT('INSERT INTO ProductInfo(ID, ProductID, Name, ProductGroup) ', GROUP_CONCAT( CONCAT('SELECT ID, ProductID, Name, ProductGroup FROM ',table_name) SEPARATOR ' UNION ALL ')) FROM information_schema.tables WHERE table_schema=(SELECT table_schema FROM information_schema.tables WHERE table_name='ProductInfo' GROUP BY table_schema) AND table_name <> 'ProductInfo'); SELECT @sql; PREPARE stmt FROM @sql; EXECUTE stmt; DEALLOCATE PREPARE stmt;` – Abtin Ijadi Maghsoodi Jul 13 '21 at 02:28
Well, it looks like the part `WHERE table_schema=(SELECT ...` onwards is incorrect. Can you tell me what is the database name and the list of tables names that you want to get for the final generated `UNION` query? `ProductInfo` table will be the destination, correct? What are the source tables? – FanoFN Jul 13 '21 at 02:37
Or can you write down an example of final query output you wish to generate? Probably something like the `UNION ALL` query in the last section of the fiddle here https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c2ea046e505a6d342892347eb66d6c00 . You can write it in the fiddle, click 'Run' button and paste the new generated fiddle link here. Never mind if there's any error on the fiddle – FanoFN Jul 13 '21 at 02:40

score 0 · Accepted Answer · answered Jul 12 '21 at 13:05

I think you can do this in R fairly directly pasting the query together with a user-created vector of database/schema names.

A manual selection from two databases "mt1" and "mt2" from a table named "quux" would be:

dbGetQuery(mycon, "select * from mt1.quux")
#       row_names  mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1     Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# 2 Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# 3    Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
dbGetQuery(mycon, "select * from mt2.quux")
#           row_names  mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# 2 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# 3           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
dbGetQuery(mycon, "(select * from mt1.quux) union all (select * from mt2.quux)")
#           row_names  mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

To do this programmatically, use

schemas <- c("mt1", "mt2")
qry <- paste(sprintf("(select * from %s.quux)", schemas), collapse = " union all ")
dbGetQuery(mycon, qry)
#           row_names  mpg cyl disp  hp drat    wt  qsec vs am gear carb
# 1         Mazda RX4 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# 2     Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# 3        Datsun 710 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
# 4    Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
# 5 Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
# 6           Valiant 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

(PS: this is intended to demonstrate using the schema.table method from one connection using a vector of known schema names. The use of select * ... is viewed by many as a little sloppy, I'm using it here for brevity.)

Setup

I'm using docker and mysql:8. From the command line (not R):

$ docker run --name some-mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=my-secret-pw -d mysql:8

In R:

mycon <- DBI::dbConnect(odbc::odbc(), driver="MySQL ODBC 8.0 ANSI Driver",
                        host="127.0.0.1", port=3306, user="root", password="my-secret-pw")
# initially, no database
dbGetQuery(mycon, "select database()") # NA?
#   database()
# 1       <NA>

dbExecute(mycon, "create schema mt1")
dbExecute(mycon, "create schema mt2")

dbGetQuery(mycon, "select schema_name from information_schema.schemata")
#          SCHEMA_NAME
# 1              mysql
# 2 information_schema
# 3 performance_schema
# 4                sys
# 5                mt1
# 6                mt2

dbExecute(mycon, "use mt1")
# [1] 0
dbGetQuery(mycon, "select database()")
#   database()
# 1        mt1
dbWriteTable(mycon, "quux", mtcars[1:3,], create = TRUE)

dbExecute(mycon, "use mt2")
# [1] 0
dbGetQuery(mycon, "select database()")
#   database()
# 1        mt2
dbWriteTable(mycon, "quux", mtcars[4:6,], create = TRUE)

Thank you very much!! Thats very interesting. I just didn't get how to do that with 80+ schemas with different names, which would have same table name. With this approach I still have to write the code for all schemas right? Or not? How you would code that for large amount of schemas? To automate the whole data loading? — Abtin Ijadi Maghsoodi, Jul 12 '21 at 14:04
Change my `schemas` variable to include the 80+ schemas, then the `paste(sprintf(..), collapse=" union all")` takes care of the rest, whether 2 or 200 schemas. You can create the `schemas` object manually (type them all in), or you can use the `"select schema_name..."` query to find them (with some filtering). Either way, I thought the answer clearly showed how to do this programmatically and *not* have to type in the query for each schema individually. — r2evans, Jul 13 '21 at 13:28

Loading Multiple Tables from Multiple Schemas from MySQL to R, and Merge all the Tables

2 Answers2

Setup