
I have a table called "mytable". The columns are

Time_Stamp (datetime) PK
Time_stamp_ms (int) PK
data1 (int)
data2 (int)
data3 (int)
data4 (int) 
data5 (int)
data6 (int)
cycle (int)
name (varstring)

I want to order by Time_Stamp and Time_stamp_ms (I know how to do this from another question), and then each time cycle reaches 1, I want to get the Time_Stamp and Time_stamp_ms from the previous row. cycle is 1, 2, 3, 4, ..., n, meaning it will always increment by 1.

This table will probably have millions and millions of rows.

Also no PHP.

There is a sample of my table:

Time_Stamp              Time_Stamp_ms   d1      d2      d3      d4      d5      d6      cycle   name    

2014-04-24 09:09:37         765         5555    4444    3333    2222    1111    123     1       name
2014-04-24 09:09:37         845         5555    4444    3333    2222    1111    123     2       name
2014-04-24 09:09:37         925         5555    4444    3333    2222    1111    123     3       name
2014-04-24 09:09:38         5           5555    4444    3333    2222    1111    123     4       name
2014-04-24 09:09:38         85          5555    4444    3333    2222    1111    123     5       name
2014-04-24 09:09:38         165         5555    4444    3333    2222    1111    123     6       name
2014-04-24 09:09:38         245         5555    4444    3333    2222    1111    123     7       name
2014-04-24 09:09:38         325         5555    4444    3333    2222    1111    123     8       name
2014-04-24 09:09:38         405         5555    4444    3333    2222    1111    123     9       name
2014-04-24 09:09:38         485         5555    4444    3333    2222    1111    123     10      name
2014-04-24 09:09:38         565         5555    4444    3333    2222    1111    123     11      name
2014-04-24 09:09:38         645         5555    4444    3333    2222    1111    123     12      name
2014-04-24 09:09:38         725         5555    4444    3333    2222    1111    123     13      name
2014-04-24 09:09:38         805         5555    4444    3333    2222    1111    123     1       name
2014-04-24 09:09:38         885         5555    4444    3333    2222    1111    123     2       name
2014-04-24 09:09:38         965         5555    4444    3333    2222    1111    123     3       name
2014-04-24 09:09:39         45          5555    4444    3333    2222    1111    123     4       name
2014-04-24 09:09:39         125         5555    4444    3333    2222    1111    123     5       name
2014-04-24 09:09:39         205         5555    4444    3333    2222    1111    123     6       name
2014-04-24 09:09:39         285         5555    4444    3333    2222    1111    123     1       name
2014-04-24 09:09:39         365         5555    4444    3333    2222    1111    123     2       name
2014-04-24 09:09:39         445         5555    4444    3333    2222    1111    123     3       name
2014-04-24 09:09:39         525         5555    4444    3333    2222    1111    123     4       name
2014-04-24 09:09:39         605         5555    4444    3333    2222    1111    123     5       name
2014-04-24 09:09:39         685         5555    4444    3333    2222    1111    123     6       name
2014-04-24 09:09:39         765         5555    4444    3333    2222    1111    123     1       name
2014-04-24 09:09:39         845         5555    4444    3333    2222    1111    123     2       name
2014-04-24 09:09:39         925         5555    4444    3333    2222    1111    123     3       name

Should return me:

    Time_Stamp              Time_Stamp_ms   d1      d2      d3      d4      d5      d6      cycle   name    


2014-04-24 09:09:38         725         5555    4444    3333    2222    1111    123     13      name
2014-04-24 09:09:39         205         5555    4444    3333    2222    1111    123     6       name
2014-04-24 09:09:39         685         5555    4444    3333    2222    1111    123     6       name
riahc3

5 Answers


As said in the comments, you really do need a field that indicates the order of the rows. An INT PRIMARY KEY AUTO_INCREMENT field does not guarantee that the newest row always has the largest id, so strictly speaking that does not work 100% of the time. A column containing the exact insert time would do.

Assuming, though (falsely, I know), that your value field is one that can be sorted on, this query will get you every row that comes immediately before an id = 1 row. To get a proper result, create a field that is in order, and replace value with that field in the two ORDER BY clauses.

updated query: http://sqlfiddle.com/#!2/9cf7d1/1/0

SELECT Time_Stamp, Time_stamp_ms, cycle FROM (
    SELECT 
        COALESCE((@preVal=1), 0) AS afterOne, 
        m.*, 
        @preVal:=m.cycle  
    FROM mytable as m, 
    (SELECT @preVal:=NULL) AS d 
    ORDER BY Time_Stamp desc, Time_stamp_ms desc
) t 
WHERE afterOne = 1 
ORDER BY Time_Stamp, Time_stamp_ms;

One additional note: if you are dealing with a big dataset, you may find drastically improved performance by inserting the inner query into a temporary table, indexing afterOne, and then selecting the final result from it. MySQL is notorious for being slow with subqueries.
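For instance, the temporary-table variant might be sketched like this (the table and index names `tmp_flags` / `idx_afterone` are made up, and it assumes the same user-variable trick as the query above):

```sql
-- Materialize the flagged rows once, so the outer filter can use an index.
CREATE TEMPORARY TABLE tmp_flags AS
SELECT
    COALESCE((@preVal = 1), 0) AS afterOne,   -- 1 if the next row (ascending) has cycle = 1
    m.Time_Stamp,
    m.Time_stamp_ms,
    m.cycle,
    @preVal := m.cycle AS curCycle            -- remember this row's cycle for the next row
FROM mytable AS m,
     (SELECT @preVal := NULL) AS d
ORDER BY Time_Stamp DESC, Time_stamp_ms DESC;

ALTER TABLE tmp_flags ADD INDEX idx_afterone (afterOne);

SELECT Time_Stamp, Time_stamp_ms, cycle
FROM tmp_flags
WHERE afterOne = 1
ORDER BY Time_Stamp, Time_stamp_ms;
```

The temporary table is dropped automatically when the session ends, so this costs disk space only for the duration of the query.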

PS. Hmm, I see now that I may have named it poorly: afterOne really means before when ordered ascending. Oh well, it's a placeholder anyway; it can be named anything that makes sense.

Noino
  • Your query is giving the wrong result; you might need to improve it. Please check here http://sqlfiddle.com/#!2/ffa4d5/1 – Vignesh Kumar A Apr 30 '14 at 07:49
  • I know; I said that it does, because the value field is not one that can be sorted on, and I included instructions on how to fix the result: create an additional column (insert_time) that can reliably be sorted on – Noino Apr 30 '14 at 07:53
  • My real table has indeed a timestamp column and a timestamp ms column as well. I posted this strictly for example purposes. – riahc3 Apr 30 '14 at 08:06
  • Great, then implementing should be pretty straightforward, right? – Noino Apr 30 '14 at 08:08
  • The query is indeed taking a while. This test DB I'm doing has 5000 rows, but the real one will have millions and millions. – riahc3 Apr 30 '14 at 09:06
  • It's giving me a "Error Code: 3. Error writing file '/tmp/MYVHUL2t' (Errcode: 28) 1.809 sec" – riahc3 Apr 30 '14 at 09:13
  • I created this db on my local computer, filled it up with 10k rows, and got a result of 139 rows in 0.01 sec. In my test, every cycle runs one further than the last; I will try again with fixed-length cycles – Noino Apr 30 '14 at 09:24
  • 10k rows, ~2.5k cycles (every cycle going to 4): I get 2499 rows returned in 0.01 seconds – Noino Apr 30 '14 at 09:30
  • The error you're getting seems to mean that you are out of HDD space http://stackoverflow.com/questions/7415710/mysql-writing-file-error-errcode-28 – Noino Apr 30 '14 at 09:37
  • Locally this query gives me a 24999-row result from a 100k dataset in 0.15 sec – Noino Apr 30 '14 at 09:40
  • That seems to be impossible because it says there is 216GB free. I'm gonna erase everything in /tmp – riahc3 Apr 30 '14 at 09:42
  • /tmp only has 36KB of data – riahc3 Apr 30 '14 at 09:51
  • Well, I can't really speak to that error. I'm not that proficient in Linux management. It may even be a configuration issue, I don't know. All I can really tell you is that the query performs as expected. – Noino Apr 30 '14 at 10:00
  • I added some more data and some more examples. – riahc3 Apr 30 '14 at 10:07
  • The new data does not change anything; the result you're expecting is what my query returns. http://sqlfiddle.com/#!2/36e70/1 – Noino Apr 30 '14 at 10:22

As mcalex said:

1. You have no primary key. 2. The row order of data SHOULD NOT matter. If you want those rows ordered, you need a field to help with this. THEN, you can ask for the row before a certain row.

Try this

SELECT * from
(
  Select @prev As previous,@pid as `Previous id`,@pid := e.id As `id` ,@prev := e.value As current
  From
  (
    Select @prev := null,@pid := 0
  ) As i,tbl As e
) x
Where id=1 And Previous is not null;

Fiddle Demo


Output

+---------------------------------------------------+
|   PREVIOUS |  PREVIOUS_ID |  CURRENT_ID | CURRENT |
+---------------------------------------------------+
|   C        |       3      |        1    |  D      |
|   F        |       3      |        1    |  G      |
|   X        |       4      |        1    |  J      |
+---------------------------------------------------+
Vignesh Kumar A
  • For this table, both ID and Value are PKs. I made this for example purposes and to simplify everything. In my actual table, I have a timestamp column and a timestamp ms column as my PKs. – riahc3 Apr 30 '14 at 08:07

My first choice would probably be to use one of the above suggestions for generating a sequence number. However, with a large number of records, building up a sequence like that can be slow (especially if you then ignore most of the records).

However, another option is to do a join. This is messier, as you have 2 columns to determine which is the previous record.

Not tested but something like this:-

SELECT a.*, b.Time_Stamp, b.Time_stamp_ms
FROM
(
    SELECT a.Time_Stamp, a.Time_stamp_ms, a.cycle, MAX(DATE_ADD(b.Time_Stamp, INTERVAL b.Time_stamp_ms MICROSECOND)) AS latest_prev_record
    FROM mytable a
    INNER JOIN mytable b
    ON DATE_ADD(a.Time_Stamp, INTERVAL a.Time_stamp_ms MICROSECOND) > DATE_ADD(b.Time_Stamp, INTERVAL b.Time_stamp_ms MICROSECOND)
    WHERE a.cycle = 1
    GROUP BY a.Time_Stamp, a.Time_stamp_ms, a.cycle
) Sub1
INNER JOIN mytable a
ON a.Time_Stamp = Sub1.Time_Stamp
AND a.Time_stamp_ms = Sub1.Time_stamp_ms
AND a.cycle = Sub1.cycle
INNER JOIN mytable b
ON DATE_ADD(b.Time_Stamp, INTERVAL b.Time_stamp_ms MICROSECOND) = Sub1.latest_prev_record

This could be made substantially simpler if you only want the timestamps and no other data, and if you had one combined date / time / millisecond field (you could just use the subquery then). Even easier if all the records had a sequential id field (ie, guaranteed to be in that order).

EDIT - Simplified if you only want the last record back before cycle 1:-

SELECT z.*
FROM
(
    SELECT a.Time_Stamp, a.Time_stamp_ms, MAX(DATE_ADD(b.Time_Stamp, INTERVAL b.Time_stamp_ms MICROSECOND)) AS latest_prev_record
    FROM mytable a
    INNER JOIN mytable b
    ON DATE_ADD(a.Time_Stamp, INTERVAL a.Time_stamp_ms MICROSECOND) > DATE_ADD(b.Time_Stamp, INTERVAL b.Time_stamp_ms MICROSECOND)
    WHERE a.cycle = 1
    GROUP BY a.Time_Stamp, a.Time_stamp_ms
) Sub1
INNER JOIN mytable z
ON DATE_ADD(z.Time_Stamp, INTERVAL z.Time_stamp_ms MICROSECOND) = Sub1.latest_prev_record

EDIT again.

You could add a decimal field for the combined timestamp (add an index for it) and populate it with:-

update `mytable` set `timestamp_full` =  UNIX_TIMESTAMP(`Time_Stamp`) + (`Time_stamp_ms` / 1000)

then you could use the following SQL to get the records you want:-

SELECT z.*
FROM
(
    SELECT a.timestamp_full, MAX(b.timestamp_full) AS latest_prev_record
    FROM mytable a
    INNER JOIN mytable b
    ON a.timestamp_full > b.timestamp_full
    WHERE a.cycle = 1
    GROUP BY a.timestamp_full
) Sub1
INNER JOIN mytable z
ON z.timestamp_full = Sub1.latest_prev_record
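To keep the combined column populated for rows inserted later, a trigger along these lines might work (a sketch only; the trigger name is made up, and it assumes the timestamp_full DECIMAL column added above):

```sql
-- Keep timestamp_full in sync on every insert, using the same
-- formula as the one-off UPDATE above.
DELIMITER //
CREATE TRIGGER mytable_before_insert
BEFORE INSERT ON mytable
FOR EACH ROW
BEGIN
    SET NEW.timestamp_full = UNIX_TIMESTAMP(NEW.Time_Stamp) + (NEW.Time_stamp_ms / 1000);
END //
DELIMITER ;
```

With the trigger in place, only the existing rows need the one-off UPDATE; new rows get the value for free.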
Kickstart
  • Due to the method that a program uses to read date/time, date/time must be in one column and milliseconds must be in another. – riahc3 Apr 30 '14 at 09:38
  • I added some more data and some more examples. – riahc3 Apr 30 '14 at 10:08
  • Amended my answer to match your requirement. Simplified it a bit. However given your data using a sequence as others have suggested might be quicker. Depends if you can modify the table at all (if you could add an extra column with an index that contains the date / time / ms combined, possibly updated via a trigger, then this would greatly improve this query), and the details of how many cycles there are for each cycle of 1. – Kickstart Apr 30 '14 at 10:33
  • I could add a "Time_Stamp_Combined" column but this would need to be a string because the version of MySQL Im using is 5.5 (or less) – riahc3 Apr 30 '14 at 10:35
  • Might well help. You could possibly store it as a decimal containing the unix timestamp of the date / time plus the milliseconds as the part after the decimal point. – Kickstart Apr 30 '14 at 10:40
  • I think a Unix timestamp would way overcomplicate things. I would have to get the difference, convert that over to a normal time stamp again, send it back....way too difficult to do. – riahc3 Apr 30 '14 at 11:36
  • If you just add it as an extra column (easily populated as above - and you can do much the same in a trigger for new rows) there should be no need to convert it back at all. It is merely used to get the records in order, and as a decimal field it is fairly small. – Kickstart Apr 30 '14 at 11:39
  • I already have the records in order as each row that is inserted is ordered by the timestamp column. Adding a unixnumber is way complicating the problem. Noino seems to be giving me a problem (MySQL Server related) so I cant confirm as answer... – riahc3 Apr 30 '14 at 13:12
  • The problem is that the data is NOT ordered by the timestamp column alone, rather by timestamp AND the milliseconds added to the timestamp (ie, there are 3 records for 2014-04-24 09:09:37 ). Having 2 separate columns is making it more difficult to join up, hence the complication of my 2nd suggestion compared to the third suggestion (which doesn't require any functions performing on the 2 fields, hence can use an index on the combined timestamp instead). – Kickstart Apr 30 '14 at 13:37
  • BTW, would this make the query faster? I think with millions of records, I'm going to have a huge time delay. When you mention that it should be an index, do you mean that besides the PKs I'm using right now of "Time_Stamp" and "Time_Stamp_ms" I should also use a PK of "Time_Stamp_full", thus I would have 3 PKs? I just want to make sure in case I implement this, as it seems the only and fastest way. – riahc3 Apr 30 '14 at 15:13
  • Which is faster will depend on the data. Neither is going to be fast. I knocked up some test data (~7.5m records) and tried the solution suggested by Noino, but it still hasn't responded an hour later. My suggestion using an indexed full-timestamp field may well be quicker if there are a large number of cycles for each cycle 1, but if a small number of cycles for each cycle 1 then I doubt it will be an improvement. Both are having to build up a large temp table, but the solution by Noino will produce a larger temp table while mine will produce a smaller one but require more processing on it. – Kickstart Apr 30 '14 at 15:45
  • I would have to go with Noino's solution because where the MySQL server will be hosted will have more space than processing power. – riahc3 Apr 30 '14 at 22:23
  • What are the realistic quantities? How many rows in total? Do you want to process all the rows, or just a subset? How many cycles are there for each cycle 1 on average? – Kickstart May 01 '14 at 08:07
  • Realistically we are talking an estimated 2 million rows per day. I want to process all the rows I select between one timestamp and another. I haven't tested that yet, but I do know how to filter from one timestamp to another. I cannot say how many are there on average; this number is unknown. – riahc3 May 01 '14 at 14:24
  • That is a fairly large number (although not as bad as I feared). Is there any way you could add a cycle id? So for each range of cycles from 1 to X they all shared an id, with when the cycle changed back to 1 it moved to the next id? – Kickstart May 01 '14 at 14:27
  • So you mean something like cycle: 1.......990 cycle id: 1 cycle: 1......902334 cycle id: 2 cycle: 1.......2 cycle id: 3 cycle: 1.....29383248 cycle id: 4 ? Is that what you mean? Also, Im not sure what the "automatically move this discussion to chat" means because this comments section is getting huge and way out of hand – riahc3 May 02 '14 at 10:05
  • Yes, that is what I meant. If you can generate such a cycle id then you can merely get the row for the MAX(cycle) grouped by the cycle id. The comments are getting long but still closely related to the issue so I wouldn't put this to chat (at least for now) – Kickstart May 02 '14 at 10:56
  • Well then, every time I see a "1" I would increment that "cycle_id" right? Is that difficult? – riahc3 May 02 '14 at 11:53
  • You could do it with similar code to that above (both mine and that of @Noino ) for existing data which might take a while. However if you could do it on the inserts (possibly on a trigger on the table), and queries based on it could be faster. – Kickstart May 02 '14 at 14:30
  • How would I do it on each insert? Each time cycle is "1" I would increment "cycle_id" counter once. How do I do that? And also, how would I do it for all 5 million current records? – riahc3 May 05 '14 at 07:39
  • I was finally able to do it and ran that last query..... I get a lost connection during the query... – riahc3 May 05 '14 at 12:39
  • I increased the timeout and it's been at least an hour and it's still running the query... – riahc3 May 05 '14 at 15:20
  • Meanwhile, how can I go through the entire table and add a cycle_id for each new 1 I see in cycle? – riahc3 May 05 '14 at 16:10

If it's just a small table you're working with (under 10,000 rows), I think the best solution is to fetch the whole thing and choose the rows "manually" (= in a PHP loop). It will surely be a lot faster than any SQL-based solution, considering you only fetch IDs and a primary key to select the resulting rows.

Speaking of a strictly SQL-based solution, you need a stored procedure and a cursor to step along the result set (this allows you to step one back), but it's not very efficient since you need to query the whole table and do the matching one by one. Index-based queries can NOT do this, so any SQL solution you get will go down the whole table (do a "full scan") and therefore won't be fast.
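For illustration, such a cursor-based procedure might be sketched like this (the procedure and variable names are made up, and it is untested; each match is emitted as its own result set, which you could redirect into a temp table instead):

```sql
DELIMITER //
CREATE PROCEDURE prev_before_cycle1()
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE cur_ts DATETIME;
    DECLARE cur_ms INT;
    DECLARE cur_cycle INT;
    DECLARE prev_ts DATETIME;
    DECLARE prev_ms INT;
    -- Walk the table in timestamp order, remembering the previous row.
    DECLARE c CURSOR FOR
        SELECT Time_Stamp, Time_stamp_ms, cycle
        FROM mytable
        ORDER BY Time_Stamp, Time_stamp_ms;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN c;
    read_loop: LOOP
        FETCH c INTO cur_ts, cur_ms, cur_cycle;
        IF done = 1 THEN LEAVE read_loop; END IF;
        -- A new pass starts here, so the previously fetched row closed the last pass.
        IF cur_cycle = 1 AND prev_ts IS NOT NULL THEN
            SELECT prev_ts, prev_ms;
        END IF;
        SET prev_ts = cur_ts, prev_ms = cur_ms;
    END LOOP;
    CLOSE c;
END //
DELIMITER ;
```

As noted above, this still scans every row, so it trades index use for a single sequential pass.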

AND YES, the previous answers are right in that the order of rows just should not matter. They're kind of "random", or at least you should look at them as if they were. (Even if you do an ALTER TABLE ... ORDER BY, you can't be sure after the next operation that modifies a single row.)

dkellner
  • This table will be huge. Millions of rows. – riahc3 Apr 30 '14 at 08:18
  • Okay then. Tell me WHY exactly you need this previous-row thing so I can help. – dkellner Apr 30 '14 at 08:23
  • I need it because I need to know when a complete cycle has finished, and obviously that's when a new cycle starts (1). This is for an industrial-type situation. – riahc3 Apr 30 '14 at 08:39
  • Explain a little more please. What is a "cycle" in this case? Are you looping around the table? Does it matter what order your rows come in? How do you ensure the proper order of them? Do you have a key that you ORDER BY? – dkellner Apr 30 '14 at 08:47
  • Okay I see your update now. In this case, you're looking for the highest cycle value in each pass where one pass seems to be the subsequent rows close to each other by timestamp. Is it correct? – dkellner Apr 30 '14 at 08:52
  • The smartest way is that you create a new field and give it a calculated value for each row; you need to determine a "pass number" for each cycle sequence - a number that is the same inside a cycle and unique for each cycle. Then you need another field that marks the highest cycle number in each pass. It's 2 extra fields but if you don't do it like this you'll run into a classic GROUP BY problem where you look for the maximum of one field and you group them by another. So, the question is - how often does this table get updated? – dkellner Apr 30 '14 at 08:57
  • Each cycle can take a value from 1 to n (I don't know n), but once a new cycle starts, I know it starts with 1, so I would simply look for the next 1, and the row before that tells me the n and when the cycle ended. – riahc3 Apr 30 '14 at 09:04
  • It's not a problem then. Whenever it happens, mark the highest cycle number with another field. Since you gave us few of the details, I don't know exactly how the updates will happen, but if you add the mentioned 2 fields (e.g. "cycle_max" and "pass_number") you can safely select every row with "select .... where cycle_max=1 group by pass_number" and you can also index on them so it will be lightning fast. All you need to do is somehow clear the cycle_max flag for any previous steps in the same pass and set it for the latest one. Wonder if I explained this clearly... – dkellner Apr 30 '14 at 09:07
  • Your approach won't work - there is no such thing as "previous" in this case. You could select them one by one, ordering by timestamps backwards, but you'd have to do it for every sequence. Slow like hell. – dkellner Apr 30 '14 at 09:08
  • But cycle restarts to one after each pass. So I can have a row starting at 1, then a bunch of rows saying 2, 3, 4, etc., then again 1. After that 1, another going up to 500. Then the next again starts at 1 and ends at 2. Knowing the max cycle number means pretty much nothing. – riahc3 Apr 30 '14 at 09:09
  • I added some more data and some more examples. – riahc3 Apr 30 '14 at 10:07

Posted as another answer as it is getting complicated, and this is more of a talk-through of an extra option.

It would be easier to do a join if there was an indexed column to check for the latest record, or check the latest group of cycles.

If you add a column for the cycle number then you can initially populate it with:-

SET @cycle_no = 0;
UPDATE mytable
SET cycle_no=@cycle_no:=@cycle_no + 1
WHERE cycle = 1
ORDER BY time_stamp, time_stamp_ms;

then

UPDATE mytable a
SET a.cycle_no = (SELECT MAX(b.cycle_no) FROM mytable b WHERE b.time_stamp < a.time_stamp OR (b.time_stamp = a.time_stamp AND b.time_stamp_ms < a.time_stamp_ms))
WHERE a.cycle != 1

The first of these populates cycle_no for each cycle of 1, and the second populates the cycle_no values of all the other rows.

You can keep it populated with the following trigger (there may be a more efficient way of doing this).

CREATE TRIGGER insert_mytable
BEFORE INSERT ON mytable
FOR EACH row
SET NEW.cycle_no = IF(NEW.cycle = 1, (SELECT MAX(cycle_no) + 1 FROM mytable WHERE cycle = 1 ), (SELECT MAX(cycle_no) FROM mytable WHERE cycle = 1 ));

You can then get the latest values like this (which relies on the cycle_no only incrementing by 1):-

SELECT z.*
FROM
(
    SELECT b.cycle_no, MAX(b.cycle)
    FROM mytable a
    INNER JOIN mytable b
    ON b.cycle_no = (a.cycle_no - 1)
    WHERE a.cycle = 1
    GROUP BY b.cycle_no
) Sub1
INNER JOIN mytable z
ON z.cycle_no = Sub1.cycle_no

On the test data I knocked up (~7.5 million records) this took ~53 seconds. Not sure if that would be usable for you.

Kickstart
  • I have cycle that goes from 1.....n (right now n=250, although that's gonna change) and now cycle_id, which is populated with 1,1,1,1,1,1; 2,2,2,2,2; 3,3,3; 4,4,4; etc. All of this totals 5,813,752 records. I just ran that query you wrote and it's again taking a LONG time to come back with anything... – riahc3 May 06 '14 at 15:28
  • Given the amount of data I can't think of a way of doing it quicker. Sorry. – Kickstart May 06 '14 at 16:18
  • I think filtering with a WHERE clause on the date and time speeds it up.... I'll post it as soon as I organize and explain the answer correctly. – riahc3 May 07 '14 at 08:24
  • Filtering will increase the speed, but only by reducing the number of rows you are processing. If you want a day's worth of records (~2m from what you have previously said) it will still be slow. – Kickstart May 07 '14 at 08:30