
Part of my project involves storing and retrieving large numbers of IPs in my database. I estimate that the database will hold millions of IPs within months of starting the project. That being the case, I would like to know how slow simple queries against a big database can get. What would be the approximate speeds of the following queries:

SELECT * FROM table WHERE ip = '$ip' LIMIT 1
INSERT INTO table (ip, xxx, yyy) VALUES ('$ip', '$xxx', '$yyy')

on a table with 265 million rows?

Could I speed up queries by creating 255^2 tables, named after the first two octets of every possible IPv4 address, so that each table holds at most 255^2 rows covering every possible value of the last two octets? For example, to query the IP address "216.27.61.137", it would be split into two parts, "216.27" (p1) and "61.137" (p2). The script would first select the table named p1, then check whether that table has a row keyed "p2", and if so pull the required data from that row. The same process would be used to insert new IPs into the database. A sketch of what I mean is below.
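Roughly, I imagine the scheme looking something like this (made-up table and column names, just to illustrate):

-- One table per first-two-octets prefix; the table name encodes '216.27'
CREATE TABLE `216_27` (
    p2  VARCHAR(7) NOT NULL,  -- last two octets, e.g. '61.137'
    xxx VARCHAR(64),
    yyy VARCHAR(64),
    PRIMARY KEY (p2)
);

-- To look up 216.27.61.137: pick the table from '216.27', then:
SELECT * FROM `216_27` WHERE p2 = '61.137' LIMIT 1;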

If the above plan would not work, what would be a good way to speed up queries in a big database?

aldux
  • Read about `INDEX`es in MySQL. – PP. Jul 11 '13 at 21:06
  • Using a string representation of an IPv4 address is going to be a lot slower than using an `UNSIGNED INT`. For IPv6 compatibility, you're stuck with strings, and they'll be much longer. Sharding isn't a bad idea here, but only if you'll have insertion contention. It's worth noting that Postgres supports [IP addresses as a native column type](http://www.postgresql.org/docs/9.2/static/datatype-net-types.html). – tadman Jul 11 '13 at 21:08
  • MySQL is not the only *open source relational database management system*. – Jul 11 '13 at 21:09

3 Answers


The answers to both your questions hinge on the use of INDEXES.

If your table is indexed on `ip`, your first query should execute more or less immediately, regardless of the size of your table: MySQL will use the index. Your second query will slow down somewhat, as MySQL has to update the index on each INSERT.
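As a minimal sketch (the table and column names here are assumptions modelled on the queries in the question):

-- Assumed schema for illustration: one big table with an index on ip
CREATE TABLE ip_log (
    ip  VARCHAR(15) NOT NULL,  -- dotted-quad string, as in the question
    xxx VARCHAR(64),
    yyy VARCHAR(64),
    INDEX idx_ip (ip)          -- this is what keeps the SELECT fast
);

-- With idx_ip in place this is an index lookup, not a full-table scan:
SELECT * FROM ip_log WHERE ip = '216.27.61.137' LIMIT 1;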

If your table is not indexed then the second query will execute almost immediately as MySQL can just add the row at the end of the table. Your first query may become unusable as MySQL will have to scan the entire table each time.

The problem is balance. Adding an index will speed up the first query but slow down the second. Exactly what happens will depend on the server hardware, which storage engine you choose, the MySQL configuration, and whatever else is going on at the time. If performance is likely to be critical, do some tests first.


Before doing anything of that sort, read this question and (more importantly) its answers: How to store an IP in mySQL

It is generally not a good idea to split data among multiple tables. Database indexes are good at what they do, so just make sure you create them accordingly. A binary column to store IPv4 addresses will work rather nicely - it is more a question of query load than of table size.
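As a rough sketch of that idea (the table name and extra columns here are assumptions), MySQL's built-in INET_ATON/INET_NTOA functions convert between the dotted-quad string and a 4-byte integer:

-- Store IPv4 as a 4-byte unsigned integer instead of a 15-char string
CREATE TABLE ip_data (
    ip  INT UNSIGNED NOT NULL PRIMARY KEY,
    xxx VARCHAR(64),
    yyy VARCHAR(64)
);

-- INET_ATON('216.27.61.137') = 3625663881; INET_NTOA reverses it
INSERT INTO ip_data (ip, xxx, yyy)
VALUES (INET_ATON('216.27.61.137'), 'some', 'data');

SELECT INET_NTOA(ip) AS ip, xxx, yyy
FROM ip_data
WHERE ip = INET_ATON('216.27.61.137');

An `INT UNSIGNED` key is both smaller and faster to compare than a string, which also keeps the index compact.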

Hazzit

First and foremost, you can't predict how long a query will take, even if we knew everything about the database, the database server, the network performance, and a thousand other variables.

Second, if you are using a decent database engine, you don't have to split the data into different tables. It knows how to handle big data. Leave the database functionality to the database itself.

There are several workarounds to deal with large datasets. Using the right data types and creating the right indexes will help a lot.
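For example, EXPLAIN will tell you whether a query is using an index or scanning the whole table (the table name below is hypothetical):

-- EXPLAIN shows the access path MySQL picks for a query
EXPLAIN SELECT * FROM ip_data WHERE ip = INET_ATON('216.27.61.137');
-- type: const, key: PRIMARY  -> single index lookup
-- type: ALL,   key: NULL     -> full-table scan; time to add an index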

When you begin to have problems with your database, search for solutions specific to the problem you are having.

There are no silver bullets to big data problems.

aldux