I have a PHP script which iterates through a JSON file line by line (using JsonMachine) and checks each line against criteria; if the criteria are met, it checks whether the row is already in a database, and then imports or updates it (MySQL 8.0.26). As an example, the last time this script ran it iterated through 65,000 rows and imported 54,000 of them in 24 seconds.
Each JSON row has a UUID as its unique key, which I was importing as a VARCHAR(36). I read that it can be advantageous to store UUIDs as BINARY(16) using UUID_TO_BIN and BIN_TO_UUID, so I changed the import script to store the UUID as binary, the read scripts to convert it back to the textual UUID, and the database field to BINARY(16).
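For reference, the BINARY(16) approach looks roughly like this (a simplified sketch; the table name, column names, and example values are placeholders, not my actual schema):

```sql
-- Hypothetical simplified table: UUID stored as 16 raw bytes
CREATE TABLE items (
    id   BINARY(16) NOT NULL PRIMARY KEY,
    data JSON
);

-- Import side: convert the textual UUID to binary on insert/update
INSERT INTO items (id, data)
VALUES (UUID_TO_BIN('3f06af63-a93c-11e4-9797-00505690773f'), '{"k": "v"}') AS new
ON DUPLICATE KEY UPDATE data = new.data;

-- Read side: convert the binary key back to the textual form
SELECT BIN_TO_UUID(id) AS id, data FROM items;
```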
This worked functionally, but the import time went from 24 seconds to 30 minutes. The server was not CPU-bound during that time, running at 25 to 30% (normally <5%).
Without the UUID conversion the script runs at about 3,000 lines per second; with the conversion it runs at about 30 lines per second.
The question: can anyone with experience of bulk importing using UUID_TO_BIN comment on performance?
I've reverted to native UUID storage, but I'm interested to hear others' experience.
EDIT with extra info from comments and replies:
- The UUID is the primary key
- The server is a VM with 12GB of RAM and 4 assigned cores
- The table is 54,000 rows (from the import) and 70MB in size
- The InnoDB buffer pool size is unchanged from the default of 128MB (134,217,728 bytes)
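For completeness, the buffer pool and table size figures above can be confirmed with standard queries like these (the schema name is a placeholder):

```sql
-- Current buffer pool size in bytes (default is 134217728 = 128MB)
SELECT @@innodb_buffer_pool_size;

-- Approximate on-disk size of the table, data plus indexes
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'mydb';
```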