Generating a Sequence ID Value in MySQL to Represent a Revision ID Based Naming Convention
I used MySQL 5.5.32
to develop and test this solution. Be sure to review the bottom section of my solution for a few homework assignments for future consideration in your overall design approach.
Summary of Requirements and Initial Comments
A external script writes to a document history table. Meta information about a user submitted file is kept in this table, including its user assigned name. The OP requests a SQL update statement or procedural block of DML operations that will reassign the original document name to one that represents the concept of a discrete REVISION ID
.
- The original table design contains a independent primary key:
ID
- An implied business key also exists in the relationship between
DOCUMENT_ID
(a numerical id possibly assigned externally by the script itself) and MODIFIED
(a DATE typed value representing when the latest revision of a document was submitted/recorded).
Although other RDBMS systems have useful objects and built-in features such as Oracle's SEQUENCE object and ANALYTICAL FUNCTIONS, There are options available with MySQL's SQL based capabilities.
Setting up a Working Schema
Below is the DDL script used to build the environment discussed in this solution. It should match the OP description with an exception (discussed below):
CREATE TABLE document_history
(
id int auto_increment primary key,
document_id int,
name varchar(100),
modified datetime,
user_id int
);
INSERT INTO document_history (document_id, name, modified,
user_id)
VALUES
(81, 'document.docx', convert('2014-03-21 05:00:00',datetime),1),
(82, 'doc.docx', convert('2014-03-21 05:30:00',datetime),1),
(82, 'doc.docx', convert('2014-03-21 05:35:00',datetime),1),
(82, 'doc.docx', convert('2014-03-21 05:50:00',datetime),1);
COMMIT;
The table DOCUMENT_HISTORY
was designed with a DATETIME
typed column for the column called MODIFIED
. Entries into the document_history table would otherwise have a high likeliness of returning multiple records for queries organized around the composite business key combination of: DOCUMENT_ID
and MODIFIED
.
How to Provide a Sequenced Revision ID Assignment
A creative solution to SQL based, partitioned row counts is in an older post: ROW_NUMBER() in MySQL by @bobince.
A SQL query adapted for this task:
select t0.document_id, t0.modified, count(*) as revision_id
from document_history as t0
join document_history as t1
on t0.document_id = t1.document_id
and t0.modified >= t1.modified
group by t0.document_id, t0.modified
order by t0.document_id asc, t0.modified asc;
The resulting output of this query using the supplied test data:
| DOCUMENT_ID | MODIFIED | REVISION_ID |
|-------------|------------------------------|-------------|
| 81 | March, 21 2014 05:00:00+0000 | 1 |
| 82 | March, 21 2014 05:30:00+0000 | 1 |
| 82 | March, 21 2014 05:35:00+0000 | 2 |
| 82 | March, 21 2014 05:50:00+0000 | 3 |
Note that the revision id sequence follows the correct order that each version was checked in and the revision sequence properly resets when it is counting a new series of revisions related to a different document id.
EDIT: A good comment from @ThomasKöhne is to consider keeping this REVISION_ID
as a persistent attribute of your version tracking table. This could be derived from the assigned file name, but it may be preferred because an index optimization to a single-value column is more likely to work. The Revision ID alone may be useful for other purposes such as creating an accurate SORT
column for querying a document's history.
Using MySQL String Manipulation Functions
Revision identification can also benefit from an additional convention: the column name width should be sized to also accommodate for the appended revision id suffix. Some MySQL string operations that will help:
-- Resizing String Values:
SELECT SUBSTR('EXTRALONGFILENAMEXXX',1,17) FROM DUAL
| SUBSTR('EXTRALONGFILENAMEXXX',1,17) |
|-------------------------------------|
| EXTRALONGFILENAME |
-- Substituting and Inserting Text Within Existing String Values:
SELECT REPLACE('THE QUICK <LEAN> FOX','<LEAN>','BROWN') FROM DUAL
| REPLACE('THE QUICK <LEAN> FOX','<LEAN>','BROWN') |
|--------------------------------------------------|
| THE QUICK BROWN FOX |
-- Combining Strings Using Concatenation
SELECT CONCAT(id, '-', document_id, '-', name)
FROM document_history
| CONCAT(ID, '-', DOCUMENT_ID, '-', NAME) |
|-----------------------------------------|
| 1-81-document.docx |
| 2-82-doc.docx |
| 3-82-doc.docx |
| 4-82-doc.docx |
Pulling it All Together: Constructing a New File Name Using Revision Notation
Using the previous query from above as a base, inline view (or sub query), this is a next step in generating the new file name for a given revision log record:
SQL Query With Revised File Name
select replace(docrec.name, '.', CONCAT('_', rev.revision_id, '.')) as new_name,
rev.document_id, rev.modified
from (
select t0.document_id, t0.modified, count(*) as revision_id
from document_history as t0
join document_history as t1
on t0.document_id = t1.document_id
and t0.modified >= t1.modified
group by t0.document_id, t0.modified
order by t0.document_id asc, t0.modified asc
) as rev
join document_history as docrec
on docrec.document_id = rev.document_id
and docrec.modified = rev.modified;
Output With Revised File Name
| NEW_NAME | DOCUMENT_ID | MODIFIED |
|-----------------|-------------|------------------------------|
| document_1.docx | 81 | March, 21 2014 05:00:00+0000 |
| doc_1.docx | 82 | March, 21 2014 05:30:00+0000 |
| doc_2.docx | 82 | March, 21 2014 05:35:00+0000 |
| doc_3.docx | 82 | March, 21 2014 05:50:00+0000 |
These (NEW_NAME
) values are the ones required to update the DOCUMENT_HISTORY
table. An inspection of the MODIFIED
column for DOCUMENT_ID
= 82 shows that the check-in revisions are numbered in the correct order with respect to this part of the composite business key.
Finding Un-processed Document Records
If the file name format is fairly consistent, a SQL LIKE
operator may be enough to identify the record names which have been already altered. MySQL also offers filtering capabilities through REGULAR EXPRESSIONS
, which offers more flexibility with parsing through document name values.
What remains is figuring out how to update just a single record or a set of records. The appropriate place to put the filter criteria would be on the outermost part of the query right after the join between aliased tables:
...
and docrec.modified = rev.modified
WHERE docrec.id = ??? ;
There are other places where you can optimize for faster response times, such as within the internal sub query that derives the revision id value... the more you know about the specific set of records that you are interested in, you can segment the beginning SQL statements to look only at what is of interest.
Homework: Some Closing Comments on the Solution
This stuff is purely optional and they represent some side thoughts that came to mind on aspects of design and usability while writing this up.
Two-Step or One-Step?
With the current design, there are two discrete operations per record: INSERT
by a script and then UPDATE
of the value via a SQL DML call. It may be annoying to have to remember two SQL commands. Consider building a second table built for insert only operations.
Use the second table (DOCUMENT_LIST
) to hold nearly identical information, except possibly two columns:
BASE_FILE_NAME
(i.e., doc.docx or document.docx) which may apply for multiple HISTORY_ID values.
FILE_NAME
(i.e., doc_1.docx, doc_2.docx, etc.) which will be unique for each record.
Set a database TRIGGER
on the source table: DOCUMENT_HISTORY
and put the SQL query we've developed inside of it. This will automatically populate the correct revision file name at roughly the same moment after the script fills the history table.
WHY BOTHER? This suggestion mainly fits under the category of SCALABILITY
of your database design. The assignment of a revision name is still a two step process, but the second step is now handled automatically within the database, whereas you'd have to remember to include it everywhere you invoked a DML operation on top of the history table.
Managing Aliases
I didn't see it anywhere, but I assume that the USER
initially assigns some name to the file being tracked. In the end, it appears that it may not matter as it is an internally tracked thing that the end user of the system would never see.
For your information, this information isn't portrayed to the customer, it is saved in a table in the database as a version history...
Reading the history of a given document would be easier if the "base" name was kept the same once it has been given:

In the data sample above, unless the DOCUMENT_ID
is known, it may not be clear that all the file names listed are related. This may not necessarily be a problem, but it is a good practice from a semantic point of view to separate user assigned file names as ALIASES
that can be changed and assigned at will at any time.
Consider setting up a separate table for tracking the "User-Friendly" name given by the end user, and associating it with the document id it is supposed to represent. A user may make hundreds or thousands of rename requests... while the back end file system uses a simpler, more consistent naming approach.