I recently needed to come up with a good way to find available slots of time in a schedule for a system that could give the user the opportunity to choose their preferred time slot. The appointment list is stored in MySQL as a set of entries which contain, among other irrelevant things, a start date/time and end date/time. I would guess that pretty much any scheduling system built in a relational database would have a basic structure similar to this.
Now, usually, SQL database queries must come up with clever ways to locate and sort records that are in the database somewhere. I had the interesting task of using SQL to show me only what was not in the database. The appointments table doesn’t store records by hour (or any other segment of time), it stores records by appointment only. A couple of years ago, I created some code that accomplished the basic goal, but it relied heavily on PHP post-processing of the selected appointment records, and it was very, very slow. What I came up with now is significantly simpler, much faster, and elegant enough that I felt like posting my own code online (which doesn’t really happen that often).
I’m not sure if this works in other SQL engines besides MySQL, but I would guess that it does. It may depend to some degree on the order in which query variables are evaluated and subqueries are executed, but I don’t know if there is any difference in this behavior between different engines, or if it’s part of the SQL specification. This query is built assuming your table is named appointments, and it contains AppointmentId, StartTime, and EndTime columns. This is also using an arbitrary search window of 5/15/2010 10:00am through 5/30/2010 10:00am, and a minimum gap length of 60 minutes.
SELECT AFrom.FromEndTime AS `StartTime`, TIMESTAMPDIFF(MINUTE, FromEndTime, ToStartTime) AS `Length` FROM (SELECT @rownum := 0 AS Reset1 ) AS TReset1, (SELECT DISTINCT appointments.AppointmentId AS FromAppointmentId, @rownum := @rownum + 1 AS FromNum, appointments.EndTime AS FromEndTime FROM appointments WHERE StartTime BETWEEN "2010-05-15 10:00:00" AND "2010-05-30 10:00:00" ORDER BY StartTime ) AS AFrom, (SELECT @rownum := 0 AS Reset2 ) AS TReset2, (SELECT DISTINCT appointments.AppointmentId AS ToAppointmentId, @rownum := @rownum + 1 AS ToNum, appointments.StartTime AS ToStartTime FROM appointments WHERE StartTime BETWEEN "2010-05-15 10:00:00" AND "2010-05-30 10:00:00" ORDER BY StartTime ) AS ATo WHERE AFrom.FromNum = ATo.ToNum - 1 AND TIMESTAMPDIFF(MINUTE, FromEndTime, ToStartTime) >= 60
This returns a set of records with two fields: StartTime, which contains the start of the gap, and Length which is the length in minutes. Changing the unit of time is a simple as changing the MINUTE
in the TIMESTAMPDIFF
function to be some other MySQL-friendly unit. I needed minutes, so that’s what I used.
The beauty in this approach is that it uses a rudimentary self-join (WHERE AFrom.FromNum = ATo.ToNum - 1
) based on the SQL-generated per-row counter variable (@rownum
). It is effectively joining each row with the one after it (and ignoring the very last one). I didn’t even know that was possible until I got this query to work. It uses two otherwise ignored subqueries to reset the counter variable to zero before executing the next subquery, which solves the problems of initializing the counter and of the counter continuing to increment after the end of the first subquery of appointments.
This approach has a couple of limitations:
- The very last appointment is ignored because of the way the self-join works. It can’t be joined to the next appointment because there is no next appointment, and so gets filtered out. This isn’t a problem for me because I can simply extend the window of time farther out if I need more appointments, but it’s something to consider.
- If there is a gap between the beginning of the window and the first appointment, it is ignored. This is because the query basically gives you only the gaps between appointments, which means if the first appointment starts after the beginning of the specified window, then that gap isn’t noticed. Something must be present in your code to accommodate for this.
Those are the only issues that have come up for me so far. I hope I don’t find anymore; this thing has been working beautifully as it is now. It’s orders of magnitude better than my last implementation. I’m interested in any bug reports or other limitations though.
3 comments
Out of curiosity, what does the table look like?
The appointments table looks something like this (without the irrelevant columns):
And the result from that data set would look something like this:
[…] scratch) to manage scheduling of multiple resources among multiple people. I even came up with a computationally efficient way to find availability in a schedule by looking for what isn’t t…. This is really not that hard of a problem to solve. Use a priority queue to manage urgent […]