0

I have the entity Event which will have 1 or more EventDay(s), that will represent a contignous time period in a given day. e.g. Event A has 2 EventDays:

  • EventDay 1 - date: 10.10.2000; startingAt: 10:00, endingAt: 17:30
  • EventDay 2 - date: 11.10.2000; startingAt: 09:30, enidngAt: 12:00

I have the User, who is able to set their own Schedule, which should be 1 or more contignous time periods for any given day of the week. This represents the time that they are free to attend events. E.g. User B has the following schedule:

  • Monday: 08:00 - 15:00, 17:00 - 19:00
  • Tuesday: 10:00 - 20:00
  • Wednesday: 12:00 - 13:30, 15:00 - 17:00, 20:00 - 22:30
  • ... same for every day of the week

I'm trying to get a list of all the Events that a User can take part in, ranked by how many EventDays match the user's schedule, alongside some stats for all the event's days. (For example the total number of hours over all the days, not just the ones that match the user's schedule.)

A nice addition would be to be able to filter the events by one or more week days. (For example I'm looking for events that only have event days on Monday and Tuesday.)

The time periods must be a multiple of 30 minutes so in the database I'll store hour 00:00 as the integer 0, 00:30 as the integer 1, and so on until 24:00 as the int 48.

Event

id name
1 event A

EventDay

id event_id date startingAt endingAt
1 1 10.10.2000 20 35
2 1 11.10.2000 19 24

Schedule

id user_id day startingAt endingAt
1 1 0 16 30
2 1 0 34 38
3 1 1 20 40
4 1 2 24 27
5 1 2 30 34
6 1 2 40 45

Here the column day is in the range 0..6 and represent the day of week.

I came up with:

SELECT
    event.id,
    SUM(unfiltered_event_day.ending_at - unfiltered_event_day.starting_at) as total_event_hours,
FROM events event
         INNER JOIN event_days event_day ON event.id = event_day.event_id
         INNER JOIN event_days unfiltered_event_day ON event.id = unfiltered_event_day.event_id
         INNER JOIN schedules schedule ON EXTRACT(ISODOW FROM event_day.date) - 1 = schedule.day
WHERE
        schedule.user_id = :userId AND 
        EXTRACT(ISODOW FROM event_day.date) - 1 in :listOfdays AND
        event_day.starting_at >= schedule.starting_at AND
        event_day.ending_at <= schedule.ending_at
GROUP BY event.id
ORDER BY count(event.id) DESC, MIN(event_day.date)

EXTRACT(ISODOW FROM event_day.date) - 1 gets the day of week from a date.

Is this approach any good in terms of performance?

I've tried some other ways to model this but this one seems the most performant.

One alternative approach was to have a read-only table with all the possible combinations of time periods in a given day (00:00-00:30, 00:00-01:00, ..., 23:30-24:00), and a m:n relation between it and User and between it and Event. The thought here was that I would double join the time periods table, but as it's read only it would be not so expensive as it will always have a relative small size.

What are downsides or improvements?

The User is a person looking to attend events. It sets up his own Schedule (for each day of the week), representing the time of day that he is available to attend EventDays.

An Event is independent of a user and is not added by one. An Event is composed of multiple EventDays, each one composed of the date of the day and the time interval.

For example, User 1 sets his monday Schedule to the interval 10:00-18:00. If an EventDay on a monday starts at 09:00 and ends at 13:00, User 1 can't attend the Event.

philipxy
  • 14,867
  • 6
  • 39
  • 83
Mar
  • 167
  • 1
  • 8
  • What is the purpose of `EventDay` and `StartingAt` and `EndingAt`? I am confused how a user can schedule events _after_ the `EventDay.EndingAt`... Can you please explain the entities in simple text? – Chris Schaller Mar 27 '23 at 12:12
  • 1
    Please clarify via edits, not comments. Please avoid social & meta commentary in posts. – philipxy Mar 27 '23 at 14:26
  • Please before considering posting: Pin down code issues via [mre]. Read manuals/references & google error messages & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular names/strings/numbers, 'site:stackoverflow.com' & tags; read many answers. SO/SE search is poor & literal & unusual, read the help. Google re googling/searching, including Q&A at [meta] & [meta.se]. [How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/q/261592/3404097) [ask] [Help] Reflect research in posts. – philipxy Mar 27 '23 at 14:33
  • [mre] [Tips for asking a good SQL question](https://meta.stackoverflow.com/q/271055/3404097) [Asking query performance questions](https://dba.meta.stackexchange.com/q/3034/43932) [How to Optimize Queries in a Database - The Basics](https://stackoverflow.com/q/3191623/3404097) [What are your most common sql optimizations?](https://stackoverflow.com/q/1332778/3404097) [When and why are database joins expensive?](https://stackoverflow.com/q/173726/3404097) [Strategy for “Which is better” questions](https://meta.stackexchange.com/q/204461/266284) How are you stuck answering this question? – philipxy Mar 27 '23 at 14:35

1 Answers1

1

Is this approach any good in terms of performance? I've tried some other ways to model this but this one seems the most performant

In general no, using a function as a comparison term in a join is not going to give you the best performance. To achieve the best performance you would want to use a term that is indexable. See this post for some hints on using SARGable filters.

The better way to model this is to add a column to EventDay that has the value of EXTRACT(ISODOW FROM event_day.date) - 1 to prevent this function from being evaluated for each record and to allow you to index it.

In some RDBMS engines date functions are SARGable, but you should still get the best, or at least equal best by adding this field to your EventDay table.

EventDay

id event_id date startingAt endingAt dow
1 1 10.10.2000 20 35 2
2 1 11.10.2000 19 24 3

STOP!

There is a bigger problem here, day of week itself is not a good value to use in a join like this because once the dataset contains more than 1 week of data, you will start getting duplicate matches, one for each week.

The SQL way to approach this is to create a relationship between these two tables and use the EventDay.id as a value in the Schedule table. Then you would be joining on the id of the table. The means you need to do a lookup when you insert the data into the schedule table but each row is only ever inserted once, you will query each row many times, so it is a far better trade off to move that logic to the insert rather than the read process.

A second option is to create a Date Dimension Table so that you can compose SARGable date based expressions in any existing query without needing to modify the schema of the tables you are querying.

  • Generally for questions regarding performance you would include the execution plan, that explains how the engine actually interpreted your query and provides information on the indexes that were used.

  • Also it is helpful to explain some of the other ways that you have tried.

Chris Schaller
  • 13,704
  • 3
  • 43
  • 81
  • Thank you for the insights into SARGable tables! I'll give it a try. I do however not understand your first approach, where you suggest to create a relation between EventDay and Schedule. Could you provide a minimal example of how the tables should look like? If I understood correctly, I would have a 1:m relation between EventDay and Schedule, but I don't see how that helps my use case. – Mar Mar 27 '23 at 10:27
  • Sorry I did misinterpret the requirement a little bit and coming from an MS SQL background it was more challenging to work without native Date or Time data types. But yes I assume that the Schedule is the user _selecting_ specific events on specific days to attend. If this is not correct can you please edit your post to describe the model in plain terms before we try to define the schema. – Chris Schaller Mar 27 '23 at 12:18
  • I added some explanations to the original question, sorry for the confusion. Basically the Schedule is the *user selecting a time period in a day of week*, for it to be matched against existing *event days*. The Schedule isn't updated when a user confirms the participation to an event. Maybe the **Schedule** name is a little non-intuitive; **Availability** better describes the intent – Mar Mar 27 '23 at 14:26