Window Function Framing
Table of Contents
Window Frame Definition
A ROWS/RANGE/GROUPS clause can follow the ORDER BY clause within an OVER window function construct. The clause limits the boundaries of the selected frame, which by default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
Maximum frame goes between its first and last row, which are designated by UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING. You may also go back and forth a number of rows or groups, or with an offset value for RANGE, from the current row or group, with n PRECEDING or n FOLLOWING.
An EXCLUDE clause can eventually follow the frame definition, to further remove the current row and/or its peer ties (i.e. duplicate values) from each selected set.
Here are some examples for these particular clauses:
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING ROWS BETWEEN 3 PRECEDING AND 5 FOLLOWING EXCLUDE CURRENT ROW ROWS 3 PRECEDING RANGE BETWEEN CURRENT ROW AND CURRENT ROW EXCLUDE TIES GROUPS 3 PRECEDING EXCLUDE GROUP
Not all database vendors supporting window functions provided full implementations for the ROWS, RANGE, GROUPS and EXCLUDE clauses. Here is the status today for the databases with window functions, supported by Data Xtractor:
- PostgreSQL – its last version 11 offered an almost complete implementation of these standards SQL clauses. Before v11, the RANGE clause did not allow offset values.
- SQLite – its last version 3.28 followed PostgreSQL 11 and fully implemented as well all these clauses.
- MySQL – added window functions support only recently, since v8.0.2. No GROUP and EXCLUDE clauses.
- MariaDB – followed MySQL and added window functions support only recently, since v10.2. No GROUP and EXCLUDE clauses.
- Oracle – had window functions for a long time. No GROUP and EXCLUDE clauses.
- Microsoft SQL Server – started with limited window functions since v2005, but added more serious support in 2012. No GROUP and EXCLUDE clauses. RANGE clause does not allow offset values.
- IBM DB2 – had window functions for a while. No GROUP and EXCLUDE clauses.
- SAP/Sybase SQL Anywhere – with window functions since v12. No GROUP and EXCLUDE clauses.
- Amazon Redshift – added specific window functions support since the first version. No RANGE, GROUP and EXCLUDE clauses.
- Firebird – added window functions support since v3 (2016), but it supports none of the framing clauses presented here before.
The visual SQL query builder of Query Xtractor and Data Xtractor makes it much easier to properly and interactively select your window function frame. While in a Formula cell, select one of the OverRows, OverRange, or OverGroups functions from the Window Function contextual menu, and click anytime on Edit Function Arguments to customize it.
Here is a possible popup, translating into valid ROWS and EXCLUDE SQL clauses, as indicated in the Generated SQL area [please note the function signature, from the title area, changed slightly in the last versions]:
The popup lets you adjust the top and bottom horizontal boundaries for different combinations. This feature alone makes it impossible to switch the top and bottom margins of the frame. From/To offset values appear when needed, with proper validators (these have to be positive numbers).
Two simple checkboxes allow you to fully implement the EXCLUDE clause, when supported. Each interactive change updates three areas: the generated SQL, the function signature (as popup’s title) and a proper information of what your selection means (in the left-bottom corner of the window).
Any OverRows/Range/Groups function call translates into a related ROW/RANGE/GROUPS clause, followed eventually by an EXCLUDE clause. All these functions have a similar signature, as it follows:
- First two parameters are the From and To boundaries for the frame. They can be either one of the FIRST/LAST/CURRENT keywords, or a numeric positive/negative offset value.
- FIRST and LAST always designate the first and last frame row (we preferred them to UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING).
- CURRENT is the current row. It is not just a simple shortcut for CURRENT ROW, as it may indicate – for RANGE and GROUPS – the current group.
- CURRENT_B can appear as second parameter, instead of CURRENT, to force a BETWEEN operation. Normally, when second parameter is CURRENT ROW, it may and will be omitted.
- Negative numbers represent PRECEDING offset values, while positive numbers are used for FOLLOWING offset values.
- Last fourth parameter is optional, for the EXCLUDE clause, and can be CURRENT, TIES or GROUP.
ROWS with Current Row Selection
In the next few chapters, we present some typical valid frame selections (with no EXCLUDE, as this clause will be presented separately below).
First is a simple frame selection from the first to the current row:
RANGE with Current Group Selection
Simply extend the selection of the current row right below to the current group. ROWS UNBOUNDED PRECEDING becomes a RANGE UNBOUNDED PRECEDING (we assume RANGE is supported by your database).
The difference is RANGE will include the current row with all its duplicate values, while ROWS does not. Remark you cannot have such a generic current group selection with ROWS. Or with GROUPS, as it is equivalent with RANGE in this use case.
RANGE with DateTime INTERVAL
RANGE is meant to act as a logical operator. Unlike the physical operator ROWS, acting on physical records, RANGE is supposed to go back and forth an interval of time, when used with a date/time value. You cannot tell how old or recent are the selected frame entries, when you look for a fixed number of rows before or after the current row. With RANGE, you should be able to select frame entries for the past few months, or the next few hours.
So far, databases with proper support for both (1) the RANGE clause using offsets and (2) the Interval data type are just PostgreSQL v11+ and Oracle. For these two database types, you can try to select as RANGE offsets not just a simple numeric value, but also a year/month/day or hour/minute/second. Your selections will be properly translated into SQL using the INTERVAL keyword, and in our OverRange’s function signature.
Here is an example of a popup configuration for a RANGE between 6 month back from the current row to 6 months ahead. The combo boxes with Month/Day/etc items appear only for a RANGE offset, for databases with INTERVAL support. Look in particular at the updated function signature (in popup’s title) and the generated SQL:
GROUPS with PRECEDING and Forced BETWEEN
Further slide the top margin to groups before, and customize the From value. This will select the current group and everything up to the third group before the current group. By “group” we mean a cluster of rows all with the same values – but all this is already described on the window below. We assume the GROUPS clause is also supported by your database (as in SQLite or PostgreSQL 11+).
Check also Always Use BETWEEN, as by default, all frames ending with the CURRENT ROW use the simplify SQL notation with no BETWEEN.
EXCLUDE Current Row and/or Group
EXCLUDE is a standard SQL clause for window functions, which has been implemented so far only by the latest versions of PostgreSQL (v11+) and SQLite (v3.28+). EXCLUDE keyword can follow a ROWS/RANGE/GROUPS clause and take one of the following parameters:
- CURRENT ROW – exclude the single current row.
- GROUP – exclude the current row and all its peer ties (i.e duplicate values).
- TIES – exclude all current row’s peer ties, except the current row itself.
- NO OTHERS – default, does nothing.
Our popup allows you to fully define this clause, with two simple checkboxes: one that excludes the current row alone (translated into an EXCLUDE CURRENT ROW) and one that exclude its ties (translated into an EXCLUDE TIES). Obviously, when both are checked, we rather generate the EXCLUDE GROUP, which is the combination between the current group plus its ties. EXCLUDE NO OTHERS is simply redundant and never used.
Here is a frame selection that excludes both the current row and all its ties. Remark that if you end your selection with the current row, a similar ROWS clause will be generated instead. As the frame ends with CURRENT ROW, we forced a BETWEEN in the RANGE clause:
One other improvement is we always skip the EXCLUDE clause entirely if there is no current row or group in your frame selection. And we do not forget to tell you about it, in case you used it. The exclude checkboxes are hidden, when the frame selection simply leaves the current row and group outside.
For instance, a frame definition like ROWS BETWEEN UNBOUNDED PRECEDING AND 3 PRECEDING with an EXCLUDE clause will make no sense, as the EXCLUDE has nothing to exclude: the current row is never included in this frame selection:
When the EXCLUDE clause is simply not supported, both exclude checkboxes are still visible, but disabled and unchecked.
Invalid Use Cases
For different reasons, some combinations are not possible. Instead of preventing those multiple use cases to happen, we rather chose to display the info as an error and let the user learn why they are wrong. Here is such an example, that tells us SQL constructs to select either the first row alone, or the last row alone, are not allowed:
Many combinations are either allowed, or not recommended, or unreliable to use. There are cases where everything goes fine, but the expected result is simply wrong. Our popup prevents this! Here are most of these unusual combinations:
- Cannot select just the first (UNBOUNDED PRECEDING) or the last (UNBOUNDED PRECEDING) row. We tested it with all our databases, and the SQL queries simply generate an error.
- All operations with no offsets and no current group selection will always use ROWS. Of course, it’s still possible to manually edit the generated query and replace ROWS with RANGE or GROUPS, when supported. But queries like those selecting the whole frame, or from the current row to one of the frame’s boundaries, are simply equivalent for ROWS/RANGE/GROUPS in this case. And ROWS looks like having the best performance, as you are not concerned with ties or offset values.
- Can never use a RANGE or GROUP offset with the current row selection. Select the current group instead. Of course, this still generates a CURRENT ROW in your SQL query, but it’s important to know both RANGE and GROUP, unlike ROWS, will always select the current row together with its ties (i.e. duplicate values). The CURRENT ROW from the generated SQL query is confusing and doesn’t tell you this, we do.
- Can never start with a frame selection that comes after the end of the frame selection. For instance, can never start with the last row and go back to the current row, to close the frame. Or start with the current row and go back to the first row. All these would generate SQL errors. But we prevented by simply not allowing the frame selector to switch its top and bottom margins at any time.
- Cannot use a ROWS BETWEEN 1 PRECEDING AND 2 PRECEDING, or ROWS BETWEEN 2 FOLLOWING AND 1 FOLLOWING. First offset must be higher than second, for two PRECEDING values. Or smaller than second, for two FOLLOWING values. We tested this in PostgreSQL 11 and unfortunately there is no SQL error, but all returned result values are wrong.
- All From/To offset values, when used, must be strict positive numbers. They must be integers for ROWS and GROUPS, or floating numbers for RANGE. But anything other than valid numbers will trigger an error. Do not get confused by the eventual negative numbers in function’s signature: we simply represent internally the PRECEDING offset values by negative numbers, to differentiate them from FOLLOWING offset values.
- Current group start and current group end frame items can be either both excluded or both included in the current frame, when the current row is excluded as well. You cannot have one included and the other excluded. This makes for the current group selection.
- You cannot select a range before or range after offset, when the RANGE clause is not supported by your database. Also, you cannot select a group before or group after offset, when the GROUPS clause is not supported by your database.
- You cannot select a range before or range after offset, when the RANGE clause is supported by your database, but does not allow offset values. This is the case for Microsoft SQL Server and prior versions of PostgreSQL (before v11).
- You cannot select the current group, if the RANGE clause is not supported by your database. Remark current group selection can also be used for GROUPS, but RANGE is always supported when GROUPS is.
- You cannot mix, in the From/To offset value fields, row offset numbers with group offset numbers or range offset values. A “range/rows/group before” field is not valid with a “range/rows/group after” field of a different type.
- You cannot use the exclude checkboxes is the EXCLUDE clause is not supported by your database. We actually disable these boxes and leave then unchecked when it’s the case, so you don’t have to be concerned with this case.