INSERT INTO required_data (
keyword_id, search_engine_id, date, " . $periodType . "
) VALUES (
:keyword_id, :search_engine_id, :date, TRUE
) ON DUPLICATE KEY UPDATE " . $periodType . " = TRUE, received = FALSE

This particular query currently inserts around 2.6-2.8 million rows on a typical day and handles that just fine; a task checks and executes it on an hourly basis. But we also have a weekly and a monthly set of inserts. Those are another very large chunk of data to insert, and when they run we have issues because of how long they take to finish.

When these take too long they lock up other tasks. I plan to break the tasks up into more manageable chunks to spread the load a little better and keep everything running when it needs to throughout the day/week/month, but that's hardly ideal. If anyone knows how I could make this more efficient, that would be great. This runs on a large EC2 instance.
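For reference, this is roughly the kind of chunked upsert I have in mind. The source table name keyword_source, the fixed id window, and the literal weekly column are placeholders for this sketch; the real query pulls rows from a large SELECT and builds the column name from $periodType:

-- Sketch only: run the weekly/monthly upsert in keyword_id ranges so each
-- statement stays small, commits quickly, and releases its locks sooner.
-- keyword_source is a made-up name for the table the big SELECT reads from,
-- and weekly stands in for the dynamic $periodType column.
INSERT INTO required_data (keyword_id, search_engine_id, date, weekly)
SELECT keyword_id, search_engine_id, CURDATE(), TRUE   -- CURDATE() assumed for the date value
FROM keyword_source
WHERE keyword_id BETWEEN 1 AND 100000                  -- advance this window each batch
ON DUPLICATE KEY UPDATE weekly = TRUE, received = FALSE;

The task would then loop over keyword_id windows instead of issuing one enormous statement.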

Shardj
  • Do you have any triggers on that table? 3 million rows isn't that much. – Juan Carlos Oropeza May 22 '18 at 14:03
  • If your table is very big you may consider partitioning it by date or search_engine. https://stackoverflow.com/questions/6093585/how-to-partition-a-table-by-datetime-column – Juan Carlos Oropeza May 22 '18 at 14:10
  • Since it seems like you're running statistics on search engine data, it might be a good time to consider moving to another database structure. https://dba.stackexchange.com/questions/123250/optimal-database-structure-for-fast-inserts – Mike May 22 '18 at 14:10
  • “When these take too long it locks up other tasks” sounds like you are using the MyISAM engine, which does table-level locking. Consider switching to InnoDB, which supports row-level locking. – Raymond Nijland May 22 '18 at 15:52
  • @RaymondNijland actually already using InnoDB. I meant it locks up the scheduled-tasks processing thread. Currently it isn't multi-threaded, so if a task takes longer than an hour it screws up the hourly tasks. It's badly made, so I'm going to be fixing that up. – Shardj May 23 '18 at 11:27

1 Answer


If you can batch the data together into an import file, you can use the MySQL LOAD DATA INFILE command. It's about as fast as it's going to get.

https://dev.mysql.com/doc/refman/8.0/en/load-data.html
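As a rough sketch (the file path, the required_data_staging table, and the literal weekly column are made up here; weekly stands in for your $periodType), one common pattern is to load the export into a staging table and then upsert from it, since LOAD DATA INFILE has no ON DUPLICATE KEY UPDATE clause of its own:

-- Sketch only: bulk-load the exported rows into a staging table first.
-- File path, staging table name, and the weekly column are assumptions.
LOAD DATA INFILE '/tmp/required_data.tsv'
INTO TABLE required_data_staging
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(keyword_id, search_engine_id, date);

-- Then apply the same upsert as the original query in one set-based statement.
INSERT INTO required_data (keyword_id, search_engine_id, date, weekly)
SELECT keyword_id, search_engine_id, date, TRUE
FROM required_data_staging
ON DUPLICATE KEY UPDATE weekly = TRUE, received = FALSE;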

edit

You may be using the wrong tool. A general-purpose database might not be the best fit when you have one specific task to perform on massive amounts of data.

For example, if you're summarizing log data, you can easily process hundreds of GB per day on low-powered hardware in any of a number of compiled languages that perform file I/O directly.

Terry Carmen
  • Shouldn't be a problem, but I'm unsure if packing the data into a file just to pull it back out for an insert would be faster. I'll give it a try though, so thank you. Currently it pulls a specific set of data out of another table in a large select before inserting. – Shardj May 22 '18 at 14:23
  • You would need to try it to see how it performs in your situation. – Terry Carmen May 22 '18 at 14:34