Background
I am developing a web platform for web applications. One of the key functions of this platform is user management. Therefore I have implemented a login system that enables registered users to log into the platform and the application. This is implemented using a MySQL database which contains login credentials, userids, contact information, etc...
The Problem
The issue that I am running in to is with the users table. This table can have the potential to be quite large. So if an admin wants to edit the information for a specific user, depending on where the user is in the list, the admin will have to scroll through potentially thousands of records to find the correct one. I am looking for is an algorithm that will split this up into groups of about size N and display those groups as ranges that the admin can click on: #0-B, C-F, G-L, M-R, etc....
Research
This is the research that I have conducted so far:
Database partition - Better done by PHP or MySQL?
1 very large table or 3 large table? MySQL Performance
With partitioning, there are many articles I found on the web to partition a database table, but nothing addresses what I'm trying to do.
What I have done thus far
I have written a program that runs offline from the web server that returns the counts for each bucket 0-9 and A-Z. For counts less than the threshold, it combines the buckets into a range and saves that to the database. So if total count of buckets 0-9 and A, B are less than the threshold, then the range becomes #0-B. I'm having problems trying to figure out how to get a range from a bucket if the size of the bucket is greater than the threshold.
I have considered the following solutions:
- A recursive function that keeps drilling down the usernames until it's within the threshold range. This would be ideal if I could get it working.
- Since this is running offline, another solution that I am entertaining is loading the entire bucket into memory and splitting it up from there. The drawback for this is the potential memory usage.
- Another idea that I'm entertaining is a variation on #2 which is load parts of the bucket into memory and process it. This will reduce the memory usage to more or less a fixed amount, but it will require more time to process.
EDIT 5-11/2020:
Based on the one answer and the comments, I could have a search field that would use JSON to populate the list when the number of entries is narrowed down to something below the threshold value (That is a good idea). But, I do have code which I will share with you:
This code works, but with the test data, it creates ~17,500 entries in the partition table, and there's quite a few entries that have zero counts.
function part_recurse2($prefix)
{
global $CONFIGVAR;
global $charList;
global $charCount;
$list = dbCountScan($prefix);
for ($i = 0; $i < count($list); $i++)
{
$char = substr($charList, $i, 1);
if ($list[$i] >= $CONFIGVAR['user_partition_max_size']['value'])
{
part_recurse2($prefix . $char);
}
else
{
writeRange($prefix . $char, $prefix . $char, $list[$i],
substr($prefix . $char, 0, 1));
}
}
}
And the results from the database...
mysql> select sum(`count`) from userpart;
+--------------+
| sum(`count`) |
+--------------+
| 100004 |
+--------------+
1 row in set (0.01 sec)
mysql> select count(*) from userpart where count = 0;
+----------+
| count(*) |
+----------+
| 1139 |
+----------+
1 row in set (0.01 sec)
This code partially works, but the counts do not add up, and there are zeros in it as well. The correct user count is 100,004 but the below function produces a total count that is 105,234 which is 5,230 more entries in the users table than there actually is. The nice thing about this version is that it will combine ranges until the threshold is met. This is what I would like to get working.
function part_recurse($prefix)
{
global $charCount;
global $charList;
global $CONFIGVAR;
$list = dbCountScan($prefix);
$acc = 0;
$start = substr($charList, 0, 1);
for ($i = 0; $i < count($list); $i++)
{
if ($list[$i] == 0) continue;
if ($list[$i] > $CONFIGVAR['user_partition_max_size']['value'])
{
// If the number of entries > threshold...
if ($i == 0)
{
// Only for the first bucket.
$start = substr($charList, 1, 1);
}
else
{
// All other entries.
if ($i >= $charCount - 1)
{
// Last entry
$end = substr($charList, $i - 1, 1);
$acc += $list[$i];
writeRange($prefix . $start, $prefix . $end, $acc,
substr($prefix . substr($charList, $i, 1), 0, 1));
$acc = 0;
}
else
{
$end = substr($charList, $i - 1, 1);
writeRange($prefix . $start, $prefix . $end, $acc,
substr($prefix . substr($charList, $i + 1, 1), 0, 1));
$acc = 0;
$start = substr($charList, $i, 1);
}
}
part_recurse($prefix . substr($charList, $i, 1));
}
else
{
if (($acc + $list[$i]) >= $CONFIGVAR['user_partition_max_size']['value'])
{
$end = substr($charList, $i - 1, 1);
writeRange($prefix . $start, $prefix . $end, $acc,
substr($prefix . substr($charList, $i, 1), 0, 1));
$start = substr($charList, $i, 1);
$acc = $list[$i];
}
else
{
$acc += $list[$i];
}
}
}
// Write the final entry.
if ($acc > 0)
{
$end = substr($charList, $charCount - 1, 1);
$bucket = substr($prefix . substr($charList, $i, 1), 0, 1);
writeRange($prefix . $start, $prefix . $end, $acc, $bucket);
}
}
And the database results for it:
mysql> select sum(`count`) from userpart;
+--------------+
| sum(`count`) |
+--------------+
| 105234 |
+--------------+
1 row in set (0.00 sec)
mysql> select count(*) from userpart where count = 0;
+----------+
| count(*) |
+----------+
| 316 |
+----------+
1 row in set (0.00 sec)
The correct number of entries is 100,004 with no zero counts. I will keep refining the code, but if someone sees something that I am doing wrong, please enlighten me. The off book functions have the following properties:
dbCountScan($prefix): This function steps through the characters 0-9 and A-Z using SELECT COUNT(*) FROM USERS WHERE username
like '?%'; where ? is a concatenation of $prefix and the current letter in the for loop.
WriteRange($start, $end, $count, $bucket): This function writes the range to the database. $start and $end is the start and end of the range. $count is the number of entries in the range. And $bucket is the top level bucket (0-9, A-Z) that this range belongs to.
And here's the database table userpart:
CREATE TABLE `userpart` (
`rangest` varchar(16) NOT NULL COMMENT 'Database query starting range.',
`rangeed` varchar(16) NOT NULL COMMENT 'Database query ending range.',
`count` int unsigned NOT NULL COMMENT 'Count in range.',
`bucket` varchar(5) DEFAULT NULL COMMENT 'Primary bucket.'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='Table that is used to partition usernames.'