1

I am stuck on huge problem i will say with my below query. Here j5 represent friday and j6 represent saturday (1 to 7... sunday to monday).

As you know, the buses have different schedules depending on the time of the week. Here, I am taking next 5 trips departure after 25:00:00 on cal (j5) and/or after 01:00:00 on cal2 (j6). Bus schedule are builded like this :

If it's 1 am then the current bus time is 25, 2 am is 26 ... you got it. So if I want departure trip for today after let's say 1 AM, i may get only 2-3 since the "bus" day end soon. To solve this problem, I want to add the next departure from the next day (here is saturday after friday). But next day start at 00 like every day in our world.

So what I want to do is : get all next trips for friday j5 after 25:00:00. If I don't have 5, then get all n trip departure for saturday after 01:00:00 (since 25:00:00 = 01:00:00).

Example : I get departure trip at 25:16:00, 25:46:00 and 26:16:00 for friday. It's 3. I want then to get 2 other departure trip for the next day so i get 5 at the end, and it will be like this 04:50:00 and 05:15:00. So next departure trip from X stop is : 25:16:00(friday), 25:46:00(friday), 26:16:00(friday), 04:50:00(saturday), 05:15:00(saturday).

I am having problem to sort both results from trips.trip_departure.

I know it may be complicated, it's complicated for me to explain but... anyway. Got question I am here. Thanks a lot in advance !

PS: Using MySQL 5.1.49 and PHP 5.3.8 PS2: I want to avoid doing multiple query in PHP so I'd like to do this in one query, no matter what.

        SELECT
            trips.trip_departure,
            trips.trip_arrival,
            trips.trip_total_time,
            trips.trip_direction
        FROM
            trips,
            trips_assoc,
            (
                SELECT calendar_regular.cal_regular_id
                FROM calendar_regular
                WHERE calendar_regular.j5 = 1
            ) as cal,
            (
                SELECT calendar_regular.cal_regular_id
                FROM calendar_regular
                WHERE calendar_regular.j6 = 1
            ) as cal2
        WHERE 
            trips.trip_id = trips_assoc.trip_id
            AND
            trips.route_id IN (109)
            AND
            trips.trip_direction IN (0)
            AND
            trips.trip_period_start <= "2011-11-25"
            AND
            trips.trip_period_end >= "2011-11-25"
            AND
            (
                (
                    cal.cal_regular_id = trips_assoc.calendar_id
                        AND
                    trips.trip_departure >= "25:00:00"
                )
                OR
                (
                    cal2.cal_regular_id = trips_assoc.calendar_id
                        AND
                    trips.trip_departure >= "01:00:00"
                )
            )
        ORDER BY
            trips.trip_departure ASC
        LIMIT
            5

EDIT Table structure :

Table calendar_regular

j1 mean sunday, j7 monday, etc).

  `cal_regular_id` tinyint(3) unsigned NOT NULL AUTO_INCREMENT,
  `j1` tinyint(1) NOT NULL COMMENT 'Lundi',
  `j2` tinyint(1) NOT NULL COMMENT 'Mardi',
  `j3` tinyint(1) NOT NULL COMMENT 'Mercredi',
  `j4` tinyint(1) NOT NULL COMMENT 'Jeudi',
  `j5` tinyint(1) NOT NULL COMMENT 'Vendredi',
  `j6` tinyint(1) NOT NULL COMMENT 'Samedi',
  `j7` tinyint(1) NOT NULL COMMENT 'Dimanche',
  PRIMARY KEY (`cal_regular_id`),
  KEY `j1` (`j1`),
  KEY `j2` (`j2`),
  KEY `j3` (`j3`),
  KEY `j4` (`j4`),
  KEY `j5` (`j5`),
  KEY `j6` (`j6`),
  KEY `j7` (`j7`)   

Data :

    cal_regular_id  j1  j2  j3  j4  j5  j6  j7
    1               0   0   0   0   1   0   0
    2               0   0   0   1   1   0   0
    3               1   1   1   1   1   0   0
    4               0   0   0   0   0   1   0
    5               0   0   0   0   0   0   1

Some bus are avaiable x days it's a table that define when in the week... assigned to the trip_assoc table.

Trips table

  `agency_id` smallint(5) unsigned NOT NULL,
  `trip_id` binary(16) NOT NULL,
  `trip_period_start` date NOT NULL,
  `trip_period_end` date NOT NULL,
  `trip_direction` tinyint(1) unsigned NOT NULL,
  `trip_departure` time NOT NULL,
  `trip_arrival` time NOT NULL,
  `trip_total_time` mediumint(8) NOT NULL,
  `trip_terminus` mediumint(8) NOT NULL,
  `route_id` mediumint(8) NOT NULL,
  `shape_id` binary(16) NOT NULL,
  `block` binary(16) DEFAULT NULL,
  KEY `testing` (`route_id`,`trip_direction`),
  KEY `trip_departure` (`trip_departure`)

trips_assoc table

  `agency_id` tinyint(4) NOT NULL,
  `trip_id` binary(16) NOT NULL,
  `calendar_id` smallint(6) NOT NULL,
  KEY `agency_id` (`agency_id`),
  KEY `trip_id` (`trip_id`,`calendar_id`)
David Bélanger
  • 7,400
  • 4
  • 37
  • 55
  • Some indication of the table layouts would be helpful here... Especially of `calendar_regular` (why are you listing it twice - why not use an `OR` condition?) - if that table contains some sort of `date` column, you should be able to add that to the `ORDER BY` clause. Oh, and it's considered bad practice to use an implicit-join syntax (multiple tables in the `FROM` clause), use explicit joins instead. – Clockwork-Muse Nov 22 '11 at 23:33
  • You are right I edited the post now. I didn't know it was... I tought it was easier to invole join like this. Thanks for the advice ! I use OR to get both day ? I am very lost here seriously ... :/ – David Bélanger Nov 22 '11 at 23:45
  • Do you have any authority to change the database, or are you stuck with this? There _may_ be a better design available, depending on how you are attempting to use this one. What's up with `trip_period_start` and `_end` - if they are the dates for the `departure` and `arrival` times, why aren't you using timestamps (which would sort nicely). An idea of sample data (and uses) for `trips` would be good. You also have a normalization issue - `agency_id` is presented twice, is this deliberate? They also appear to be of two different datatypes, which could be a problem. – Clockwork-Muse Nov 23 '11 at 00:07
  • I have all authority of corse I can create a new design. `trip_period_start` and `trip_period_end` it's the period from when to when the trip are good. Every agency change the date every 2 or 3 months. Those date arent from trips infos but in general like . About `agency_id`, yes it is. Even if it shouldn't, it may be happen that the hash from a trip_id be similar to another, there for I must give them both an agency_id. Also, I sometime request data from trips table without going into trips_assoc, in general it is to avoid join. from october 1 to december 31 – David Bélanger Nov 23 '11 at 03:14

2 Answers2

2

First off, NEVER let an outside entity dictate a non-unique join column. They can possibly (with authorization/authentication) dictate unique ones (like a deterministic GUID value). Otherwise, they get to dictate a natural key somewhere, and your database automatically assigns row ids for joining. Also, unless you're dealing with a huge number of joins (multiple dozens) over un-indexed rows, the performance is going to be far less of a factor than the headaches of dealing with it elsewhere.

So, from the look of things, you are storing bus schedules from multiple companies (something like google must be doing for getting public transit directions, yes).
Here's how I would deal with this:

  • You're going to need a calendar file. This is useful for all business scenarios, but will be extremely useful here (note: don't put any route-related information in it).

  • Modify your agency table to control join keys. Agencies do not get to specify their ids, only their names (or some similar identifier). Something like the following should suffice:

    agency
    =============
    id      - identity, incrementing
    name    - Externally specified name, unique
    
  • Modify your route table to control join keys. Agencies should only be able to specify their (potentially non-unique) natural keys, so we need a surrogate key for joins:

    route
    ==============
    id                - identity, incrementing
    agency_id         - fk reference to agency.id
    route_identifier  - natural key specified by agency, potentially non-unique.
                      - required unique per agency_id, however (or include variation for unique)
    route_variation   - some agencies use the same routes for both directions, but they're still different.
    route_status_id   - fk reference to route_status.id (potential attribute, debatable)
    

    Please note that the route table shouldn't actually list the stops that are on the route - it's sole purpose is to control which agency has which routes.

  • Create a location or address table. This will benefit you mostly in the fact that most transit companies tend to put multiple routes through the same locations:

    location
    =============
    id        - identity, incrementing
    address   - there are multiple ways to represent addresses in a database.  
              - if nothing else, seperating the fields should suffice
    lat/long  - please store these properly, not as a single column.
              - two floats/doubles will suffice, although there are some dedicated solutions.
    
  • At this point, you have two options for dealing with stops on a route:

    1. Define a stop table, and list out all stops. Something like this:

      stop
      ================
      id           - identity, incrementing
      route_id     - fk reference to route.id
      location_id  - fk reference to location.id
      departure    - Timestamp (date and time) when the route leaves the stop.  
      

      This of course gets large very quickly, but makes dealing with holiday schedules easy.

    2. Define a schedule table set, and an schedule_override table set:

      schedule
      ===================
      id           - identity, incrementing
      route_id     - fk reference to route.id
      start_date   - date schedule goes into effect.
      
      schedule_stop
      ===================
      schedule_id  - fk reference to schedule.id
      location_id  - fk reference to location.id
      departure    - Time (time only) when the route leaves the stop 
      dayOfWeek    - equivalent to whatever is in calendar.nameOfDay
                   - This does not have to be an id, so long as they match
      
      schedule_override
      ===================
      id             - identity, incrementing
      route_id       - fk reference to route.id
      effective_date  - date override is in effect.  Should be listed in the calendar file.
      reason_id      - why there's an override in effect.
      
      schedule_override_stop
      ===========================
      schedule_override_id  - fk reference to schedule_override.id
      location_id           - fk reference to location.id
      departure             - time (time only) when the route leaves the stop
      

With this information, I can now get the information I need:

SELECT
FROM agency as a
JOIN route as b
ON b.agency_id = a.id
AND b.route_identifier = :(whatever 109 equates to)
AND b.route_variation = :(whatever 0 equates to)
JOIN (SELECT COALESCE(d.route_id, j.route_id) as route_id, 
             COALESCE(e.location_id, j.location_id) as location_id,
             COALESCE(TIMESTAMP(c.date, e.departure), 
                      TIMESTAMP(c.date, j.departure)) as departure_timestamp
      FROM calendar as c
      LEFT JOIN (schedule_override as d
                 JOIN schedule_override_stop as e
                 ON e.schedule_override_id = d.id)
      ON d.effective_date = c.date
      LEFT JOIN (SELECT f.route_id, f.start_date
                        g.dayOfWeek, g.departure, g.location_id,
                        (SELECT MIN(h.start_date)
                         FROM schedule as h
                         WHERE h.route_id = f.route_id
                         AND h.start_date > f.start_date) as end_date
                 FROM schedule as f
                 JOIN schedule_stop as g
                 ON g.schedule_id = f.id) as j
      ON j.start_date <= c.date
      AND j.end_date > c.date
      AND j.dayOfWeek = c.dayOfWeek
      WHERE c.date >= :startDate
      AND c.date < :endDate) as k
ON k.route_id = b.id
AND k.departure_timestamp >= :leaveAfter
JOIN location as m
ON m.id = k.location_id
AND m.(location inforation) = :(input location information)
ORDER BY k.departure_timestamp ASC
LIMIT 5

This will give a list of all departures leaving from the specified location, for the given route, between startDate and endDate (exclusive), and after the leaveAfter timestamp. Statement (equivalent) runs on DB2. It picks up changes to schedules, overrides for holidays, etc.

Community
  • 1
  • 1
Clockwork-Muse
  • 12,806
  • 6
  • 31
  • 45
0

I think X-Zero advice is the best solution, but I had free time:) Please see below, I have used concat to handle as timestamp and after ordered by those two column. I wrote freehand can be error, I have used exists, somewhere I read its more faster than join but you can just use concat and order parts of the query

 SELECT
        trips.trip_departure,
        trips.trip_arrival,
        trips.trip_total_time,
        trips.trip_direction,
        CONCAT(trips.trip_period_start,' ',trips.trip_departure) as start,
        CONCAT(trips.trip_period_end,' ',trips.trip_departure) as end,
    FROM trips
    WHERE EXISTS
          (
            SELECT
              trips_assoc.calendar_id
            FROM
              trips_assoc
            WHERE
              trips.trip_id = trips_assoc.trip_id
            AND EXISTS
                (
                   SELECT
                      calendar_regular.cal_regular_id
                   FROM
                      calendar_regular
                   WHERE
                      cal2.cal_regular_id = trips_assoc.calendar_id
                   AND
                     (
                        calendar_regular.j5 = 1
                        OR
                        calendar_regular.j6 = 1
                     )      
                )
           )
    AND
           trips.route_id IN (109)
    AND
           trips.trip_direction IN (0)
    AND
           trips.trip_period_start <= "2011-11-25"
    AND
           trips.trip_period_end >= "2011-11-25"
    AND
          (
             trips.trip_departure >= "25:00:00"
             OR
             trips.trip_departure >= "01:00:00"
          )
    ORDER BY
          TIMESTAMP(start) ASC,TIMESTAMP(end) ASC
    LIMIT
          5

EDIT: COPY/PASTE issue corrected

HRgiger
  • 2,750
  • 26
  • 37
  • This will not work since it's not the timestamp of the trip but the timestamp of all trips. It's like the avaiable period like october 1 to december 31... etc. thanks, good try indeeed – David Bélanger Nov 23 '11 at 03:20