0

[Problem Statement]

I have trouble with inserting data to DB.(About 15,000,000 records) It takes 2-3 week(estimated).. it takes too long!!! After latency analysis, I found that select query was the root cause of problem but there's not idea to fix it.

[Description]

Since its hierarchical nature, the sequence(select query followed by insert query) is needed to all tables. As child-node table increases, the select latency linearly increases.

pseudo code of my logic is as follows:

1.Fill mother-node tables (with msyql connector.executemany : batch size = 10,000)

2.Select inserted data's index and update them to dictionary, to feed child-node table

3.Insert child-node data & mother-node key(from dictionary) to child-node table

4.Looping over batch dataframes...

Sample schema

Latency analysis

SangYoon Lee
  • 11
  • 1
  • 2
  • 1
    https://stackoverflow.com/a/6768274/11683? – GSerg Jul 12 '21 at 07:11
  • IMHO, the only difference between mine and this approach is insert first - select last vs. insert-select simultaneously. I guess the total amount of thrown select query is same. Also, Fede said that this may not be a good approach if the amount of child data is known to be large. @GSerg – SangYoon Lee Jul 12 '21 at 08:30
  • Please provide the query that does that slow "insert". It should take minutes, not weeks. And the `CREATE TABLEs`. In particular, many:many table indexes are usually screwed up. – Rick James Jul 12 '21 at 20:04
  • And take the stackoverflow tutorial to get an easy 100 rep points; this will give you more ability to interact with us. – Rick James Jul 12 '21 at 20:06
  • @GSerg - That link refers to SQL Server; the OP is working with MySQL. There are enough differences between the product to be suspicious of any answers. – Rick James Jul 12 '21 at 20:08

0 Answers0