
I am using a script that fetches 100 items from a website and inserts them into the database; if an item already exists, it updates that record instead. It runs as a cron job every minute and handles up to 50,000 products.

The script works fine and the products are imported successfully. The problem now is the server and MySQL load. My server support said that it takes almost 50% of the resources and that they need to disable the script.

I searched Google for a solution and found that LOAD DATA INFILE is a better choice than mysql_query inserts and updates. So I am planning to write everything to a text file on the server and, once that is complete, import it into the database with LOAD DATA INFILE.

Is this a good method, or will it put the same load on the server? I would like an expert's opinion before proceeding. Please help. Thanks.
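
For illustration, the kind of bulk import I have in mind is roughly this (the file path, table, and column names are just placeholders):

    -- Placeholder file path, table, and columns; REPLACE overwrites an
    -- existing row that has the same unique key instead of failing.
    LOAD DATA INFILE '/tmp/products.csv'
    REPLACE INTO TABLE products
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    (sku, name, price);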

Vasanthan.R.P
  • Have you got the appropriate indexes? – Ed Heal Feb 01 '13 at 09:01
  • Stop using a VM or shared web space from a crappy host that overloads boxes. Paying extra for a dedicated server or some decent non-budget host web space will be worth its weight in gold for you. – Dave Feb 01 '13 at 09:06
  • Does your cron job take long to run, i.e. is the 50% load just a quick spike? If so, have you tried using 'nice'? (Coming from Unix here; Windows must have something like that.) – KevInSol Feb 01 '13 at 09:19
  • You can define a unique index on the distinct columns and use "ON DUPLICATE KEY", e.g.: INSERT INTO table (col1,col2,count) VALUES (2,3,10) ON DUPLICATE KEY UPDATE count = 10. This should help bring down the load. – Jaspal Singh Feb 01 '13 at 09:27
  • To use "ON DUPLICATE KEY", do I need to create that field as unique? Because for now I have one primary key and 2 index keys. – Vasanthan.R.P Feb 01 '13 at 09:34
  • Are you really loading `100 * 50k items/min = 83k items/sec`? That's a lot of items. If each of these is just 100 bytes long and you neglect the overhead, you're constantly using 66% of a 100 MBit line. Am I missing something here? – Carsten Feb 01 '13 at 10:28
  • Yes! You will have to create a unique index on the combination of fields you feel should be unique. Then, when the same entry tries to insert, it will cause MySQL to run the ON DUPLICATE KEY clause. Typically you could update the "last_updated_time" column in the ON DUPLICATE KEY clause. – Jaspal Singh Feb 01 '13 at 10:29
  • @Carsten I believe what he meant was that he has to check against 50k entries in the db and then make 100 inserts or updates. – Jaspal Singh Feb 01 '13 at 10:30
  • @Jaspal If I do so, I don't have to change to the "Load file" method? – Vasanthan.R.P Feb 01 '13 at 12:50
  • I am not sure about that, but you can give it a try. It should help you ease the load. I suggest you take a copy of the db on your local system and try this approach, refine the keys, and then do it on the server. The final call is yours to take. Choose the approach you think is best. – Jaspal Singh Feb 01 '13 at 12:58
  • Can you post your current SQL? Is this 100 separate statements you run or just a single statement? Also the db schema, including the table type (MyISAM or InnoDB). – cnizzardini Apr 25 '13 at 20:50
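
Putting the comments' suggestion together, a minimal sketch (the table, column, and key names are hypothetical):

    -- A unique key on the column(s) that identify a product is what makes
    -- ON DUPLICATE KEY UPDATE fire instead of inserting a duplicate row.
    ALTER TABLE products ADD UNIQUE KEY uniq_sku (sku);

    INSERT INTO products (sku, name, price, last_updated_time)
    VALUES ('ABC-123', 'Widget', 9.99, NOW())
    ON DUPLICATE KEY UPDATE
        name = VALUES(name),
        price = VALUES(price),
        last_updated_time = NOW();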

1 Answer


It would be helpful to know all the variables at play in your scenario. Obviously you do not want to disclose any sensitive information about your setup, but without knowing more parameters of your situation it is hard to give an appropriate answer. Any time the issue of MySQL performance comes up you have to evaluate everything you are doing, not just the script you wrote to run on a cron every minute (which in itself, at first glance, seems like it could be optimized or refactored to run a different way). Some things that would be good to know are:

  • What type of hosting environment are you in, i.e. shared, cloud, etc.?
  • What kind of traffic does your site get every day, hour, minute, etc.?
  • How many web servers are you running?
  • How many DB servers do you have running? How many masters and slaves?
  • Do you have load balancing?
  • Are you using any type of caching, like Memcache?

It would also be helpful to know why you are doing that many updates every minute. If you have no other choice, then you may want to consider dedicating resources to just that procedure and running a cron once a day, in the middle of the night, to sync your DBs up.

This all may be overkill; you will be able to gain a lot of performance if your DB tables are normalized correctly and your indexes are set correctly. A very general rule of thumb is to make sure the columns that are being queried in your WHERE clauses have indexes on them.
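
For example (table, column, and index names here are just placeholders), if your import script looks rows up by an external product id, adding an index on that column and checking the plan with EXPLAIN would be a first step:

    -- Hypothetical names: index whatever column(s) the WHERE clause filters on.
    ALTER TABLE products ADD INDEX idx_external_id (external_id);

    -- EXPLAIN shows whether the lookup now uses the index instead of a full table scan.
    EXPLAIN SELECT id FROM products WHERE external_id = 'ABC-123';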

Hopefully this helps.

Dropzilla