I currently have two tables in MySQL InnoDB database. Here are two simplified versions of the tables.
Table 1 (PropData):
+-----------------------------------+
| PropData |
+-----------+-----------------------+
| BigTag | Date |
+-----------+-----------------------+
| 10001AB | 1000-01-01 00:00:00 |
+-----------+-----------------------+
Table 2 (LimitTags):
+-------------------+
| LimitTags |
+---------+---------+
| Tag | Model |
+---------+---------+
| 10001 | Base |
+---------+---------+
Usage:
The PropData table contains a bunch of data that I need to save and plot, and each record in that table has a certain tag/model from the LimitTags table that applies to it (i.e. many PropData records can refer to one LimitTag record).
Goal:
I'm currently trying to create a query will retrieve a list of unique tag and model pairs by searching through the PropData table in some specified timespan and figure out which tag/model pairs (from the LimitTags table) exist.
Ideally I think this would be pretty doable by JOINing the Tag fields from both tables, including a WHERE clause to get the data within my timespan and then SELECTing the distinct tags and models.
The snag is that the Tag in the PropData table has two more chars - than the equivalent tag in the LimitTags table - appended to it (hence the name BigTag), which means I can't just JOIN the tables together.
What I've Tried:
SELECT DISTINCT S.Tag, S.Model
FROM (
SELECT T.Tag,
T.Model,
P.Date
FROM (
SELECT LEFT(BigTag, 5) AS Tag, Date
FROM PropData
GROUP BY LEFT(BigTag, 5)
) AS P
JOIN LimitTags T ON T.Tag=P.Tag
) AS S
WHERE S.Date BETWEEN '1000-01-01 00:00:00' AND '9999-12-31 23:59:59'
ORDER BY S.Date DESC;
The Issue:
The problem with the above query is that the first subquery that forms P is incredibly slow and searches the entire table, which is millions of records.
Running the explain command confirms this by telling me that the query is using filesort and that the type is ALL.
I'm basically stuck trying to figure out how to select all the records from PropData, where I only take the first 5 characters of the BigTag column (to easily join with LimitTags), and the Date (for only getting data between my timespan), and where I am only getting records in which the first 5 characters of the Tag are distinct (hence the group by).
I think the issue stems from how I'm using the Left() function in my select and group by statements (shown below), but I haven't figured out how to get around it.
SELECT LEFT(BigTag, 5) AS Tag, Date
FROM PropData
GROUP BY LEFT(BigTag, 5)
I also have an index on (BigTag) and (BigTag(5), Date) for PropData, but neither of those are used in the query for P. There is also an index on Tag for LimitTags and that is used during the join operation.
I'm pretty new to MySQL and writing queries in general and could definitely use some advice on how to accomplish this. This is also my first post on SO and I'm hoping I haven't screwed anything up!
Thanks in advance!
Edit 1:
From Strawberry's comment, I had the idea to create a new table that contains only two columns which are the id (primary key, not shown in the table structure above) and the shortened tag column with only the first 5 characters of the equivalent record's BigTag column. Then I can add a trigger to add or delete records in this table to match the original PropData table. I'll post back if that works well and there are no other solutions that work for just the existing tables.
Edit 2:
So I tried out what I said in the paragraph above, and I'm getting some really weird results. After creating a new table called PropDataTag with the columns being the Id and the first 5 chars of the BigTag from PropData. I tried performing this query:
SELECT DISTINCT T.Tag, T.Model
FROM PropData P
JOIN PropDataTag N ON P.Id=N.Id
JOIN LimitTags T ON N.Tag=T.Tag
WHERE P.Date BETWEEN '0000-01-01 00:00:00' AND '9999-12-31 23:59:59'
The first time I ran it, it went perfectly and I got my results in 0.016 sec. I also ran the explain command and got the following results. When I took the screenshot, I unfortunately neglected to expand the ref column so I don't know what two of the values are and am having trouble reproducing these results.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE T index IdxTag,IdxTagAndModel IdxTagAndModel 49 NULL 1427 Using index; Using temporary
1 SIMPLE N ref IdxTag IdxTag 7 NoIdea 1238 Using index; Distinct
1 SIMPLE P eq_ref PRIMARY,IdxDate,IdxIdAndDate IdxDate 38 NoIdea 1 Using where; Distinct
Thinking everything was well in hand, I then tried the exact same query on the other 3 PropData tables (mentioned in comments) and UNIONed them together. After running the query, it took more than 2 minutes before I stopped it for taking too long. I tried running the explain command on the original query shown above, and instead of getting the same results, I got the following:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE P range PRIMARY,IdxDate,IdxIdAndDate IdxDate 24 NULL 1785585 Using where; Using Index; Using temporary
1 SIMPLE N eq_ref PRIMARY,IdxTag PRIMARY 38 P.Id 1
1 SIMPLE T ref IdxTag,IdxTagAndModel IdxTag 7 N.Tag 1
What just happened? I can't figure out:
Why did the order of the rows in the explain output change?
Why did MySQL decide to use different indices all of a sudden?
Does anyone have any ideas or input?