[Problem Statement]
I am having trouble inserting data into a database (about 15,000,000 records). The load is estimated to take 2-3 weeks, which is far too long. After a latency analysis, I found that the SELECT query is the root cause of the problem, but I have no idea how to fix it.
[Description]
Because the data is hierarchical, every table needs the same sequence: a SELECT query followed by an INSERT query. As the child-node table grows, the SELECT latency increases linearly.
The pseudocode of my logic is as follows:
1. Fill the mother-node tables (with mysql connector executemany, batch size = 10,000).
2. Select the index values of the inserted rows and store them in a dictionary, to feed the child-node table.
3. Insert the child-node data together with the mother-node key (from the dictionary) into the child-node table.
4. Repeat steps 1-3 for each batch dataframe.
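
To make the four steps above concrete, here is a minimal sketch of the same insert, select, insert loop. It is not my real code: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL so that it runs anywhere, and the table names (mother, child), the column names, and the load_batch helper are all hypothetical. It also assumes the lookup column (name) is unique and therefore indexed.

```python
import sqlite3


def load_batch(conn, parent_names, child_rows):
    """Insert one batch of mother rows, read their keys back, insert children.

    parent_names: list of unique names for the mother-node table (hypothetical).
    child_rows:   list of (parent_name, value) pairs for the child-node table.
    """
    cur = conn.cursor()

    # Step 1: bulk-insert the mother-node rows.
    cur.executemany(
        "INSERT INTO mother (name) VALUES (?)",
        [(name,) for name in parent_names],
    )

    # Step 2: select the generated keys back and keep them in a dictionary.
    placeholders = ",".join("?" * len(parent_names))
    cur.execute(
        f"SELECT name, id FROM mother WHERE name IN ({placeholders})",
        parent_names,
    )
    key_by_name = dict(cur.fetchall())

    # Step 3: insert the child rows together with the mother-node key.
    cur.executemany(
        "INSERT INTO child (mother_id, value) VALUES (?, ?)",
        [(key_by_name[name], value) for name, value in child_rows],
    )
    conn.commit()
    return key_by_name


# In-memory stand-in schema (assumption: the real tables live in MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mother (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT UNIQUE
    );
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        mother_id INTEGER REFERENCES mother(id),
        value     TEXT
    );
    """
)

# Step 4: loop over the batches (tiny here; 10,000 rows each in my case).
batches = [
    (["a", "b"], [("a", "a1"), ("a", "a2"), ("b", "b1")]),
    (["c"], [("c", "c1")]),
]
for parent_names, child_rows in batches:
    load_batch(conn, parent_names, child_rows)
```

In the sketch, the SELECT in step 2 runs once per batch and per parent table, which is the query my latency analysis points to.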