Window Functions
Table of Contents
Categories of Window Functions
Window functions (also called Analytic Functions, or OLAP Functions) are SQL functions that can be followed by a SQL OVER clause in a SELECT item. Categories of window functions:
- Window Aggregates – common SUM/AVG/COUNT/MIN/MAX aggregates used as analytic functions, with an OVER clause each.
- Ranking Functions – global functions.
- Offset Functions – contextual navigation functions.
- Distribution Functions – global functions.
- Statistical Functions – like variance/deviation functions, co-variance/correlation functions, and linear regression functions.
- Internal “Over…” Functions – simple intuitive Xtractor functions to generate the SQL OVER clause.
Window Function Menus
In Data Xtractor and Query Xtractor, you can call window functions, and internal Over functions for the OVER clause, through the contextual menu of the visual SQL query builder.
(1) Global window functions are accessible from the top Field menu. There is no previous context, so the functions available in the Window Function global menu are:
- Ranking Functions: RowNumber, Rank etc.
- Distribution Functions: PercentRank, CumulativeDistribution etc.
- internal Over calls that define a reusable OVER clause: OverPartitionBy, OverOrderBy, OverRows etc.
(2) Contextual window functions are accessible from a Formula cell, and apply to a previous context, defined in either the top Field cell or another previous Formula. The functions available now in the Window Function contextual menu are:
- Offset Functions: Lag/Lead, First/LastValue etc.
- internal Over calls for an inline OVER clause or to continue the definition of a reusable OVER clause: OverPartitionBy, OverOrderBy, OverRows etc/
(3) Aggregate window functions are also contextual, called from a Formula cell on a context previously defined (usually a database table field). These are usually typical aggregates that can be also used as analytic functions, with an OVER clause. The aggregate functions accessible from the Window Aggregate contextual menu depend on the data type. Numeric data includes SUM and AVG, unlike the aggregate functions below, available for a string data type.”
OVER Clause with Over Functions
For databases with window functions support, Query Xtractor and Data Xtractor expose a few simple internal functions whose names start with Over in the Window Function menu, that can be called one after another in the Formula or Field builder:
This specific call order is mandatory in SQL, but not here. However, all Over calls must be grouped one after another, with no other calls between. We collect all OverPartitionBy calls first, followed by all OverOrderBy calls, and end up with the first OverRows/Range/Groups call. Warnings are generated for redundant calls.
- Over(window=[NULL]) – by default, with NULL or no parameter, provides an empty OVER () clause, which cannot be combined with any other Over call. Or you can pass the name of a query column with a window definition on it (see the WINDOW clause below).
- OverPartitionBy(column=NULL) – translates into a PARTITION BY entry. Multiple calls can be used when there are more than one inline GROUP BY fields or query expressions.
- OverOrderBy(column=NULL, ascending=[TRUE], nullsFirst=[NULL]) – translates into an ORDER BY inline entry. Multiple calls can be used when there are more than one inline fields or query expressions to sort with. ascending can be FALSE to add the DESC keyword. When supported, use TRUE or FALSE for nullsFirst.
- OverRows/Range/Groups(from=FIRST, to=CURRENT, exclude=[]) – translates into a ROWS, RANGE or GROUPS standard SQL clause. FIRST and LAST point to the first and last rows in the partition frame. CURRENT points to the current row. A negative or positive integer offset reflects a number of rows back or forth. FIRST/LAST, CURRENT and the offset translate into standard SQL UNBOUNDED PRECEDING/FOLLOWING, CURRENT ROW and n PRECEDING/FOLLOWING keywords. Last parameter is for the standard SQL EXCLUDE clause, when supported, and may take one of the CURRENT, TIES or GROUP values.
Visual ROWS/RANGE/GROUPS and EXCLUDE Clauses
OverRows/Range/Groups calls get a special visual treatment when you edit their function arguments. Full details about this special function and its popup here. You can move the top and bottom margins to determine the actual frame boundaries, as in the following picture:
Simple OVER Clause Implementation
Following designed SQL query is a simple example for the OVER clause implementation.
- SUM here is an Window Aggregate Function, which is allowed in this case to be followed by an OVER clause. The SUM aggregate function IS also a window function, like most other aggregates.
- Two OverPartitionBy calls translate into the two-field definition of the PARTITION BY clause.
- Two OverOrderBy calls translate into the two-field definition of the ORDER BY clause.
- One OverRows call further translates into the final ROWS BETWEEN clause.
Reusable OVER Clause with Window Definition
When you start from a top Field with an Over function call, you create a reusable window definition. This column can have only other Over function calls, in Formula, and can be referenced by the column header name in other inline OVER clause declarations.
You can always create reusable OVER clauses, regardless of your database support for them. This is a Data Xtractor feature, not necessarily a SQL feature. They get translated into standard SQL WINDOW clauses when supported. If not, they get repeated as inline SQL OVER clauses in the columns referencing them.
WINDOW Clause with Over Functions
You can also reference another window definition from a window definition.
This may translate into one or more separate standard SQL WINDOW clauses, when your relational database supports it. Or, when not supported, or when you want on purpose an Emulated WINDOW Clause, we automatically fill-in all OVER clauses with the separate window definitions.
The generated SQL query below is the automatic translation of the last visual query with nested WINDOW definitions:
SELECT EXTRACT(YEAR FROM invoice_.`InvoiceDate`) AS `Year`,
invoice_.`InvoiceDate` AS `Date`,
invoice_.`Total` AS `Amount`,
SUM(invoice_.`Total`) OVER (`w1`
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS `Sum`
FROM `chinook`.`invoice` AS invoice_
WINDOW `w2` AS (
PARTITION BY EXTRACT(YEAR FROM invoice_.`InvoiceDate`)),
`w1` AS (`w2`
ORDER BY invoice_.`InvoiceDate`)
ORDER BY 1, 2
Ranking Functions
Ranking functions are global window functions, called from the top Field builder row, followed by Over formula calls. These functions generate sequence numbers for your rows.
- Rank – similar for first row in a group with the same value, but keeps that value for all other rows within the same group.
- DenseRank – somehow similar, but it increments its previous value by 1 when detecting a new group.
- PercentRank –
- RowNumber – similar to a running sum: it starts with 1 for the first row, 2 for second etc.
- NTile(n=10) –
All ranking functions are configured through a dedicated popup, as described in detail here:
The Ranking Functions demo query (click on the image to enlarge), with generated SQL (after Hide SQL for Subqueries) below:
SELECT name, quantity,
ROW_NUMBER() OVER (ORDER BY quantity AS RowNumber,
RANK() OVER (ORDER BY quantity) AS Rank,
DENSE_RANK() OVER (ORDER BY quantity) AS DenseRank,
NTILE(4) OVER (ORDER BY quantity) AS NTile
FROM FruitsU
ORDER BY 2
Offset Functions
Offset functions are contextual navigational window functions that you call on a database field, followed by Over function calls. They may return the value of another row within the same data set, by relative or absolute position.
- Lag(offset=1) – returns the value that was in that column offset number of rows before (or Null if none). Lag(1) returns the value in the previous results row, Lag(3) three rows before.
- Lead(offset=1) – returns the value in that column offset number of rows after. Lead(1) returns the value in the next results row, Lag(3) three rows after.
- FirstValue – returns the value in the first row.
- LastValue – returns the value in the last row.
- NthValue –
- RatioToReport –
In Data Xtractor, all offset window functions are visually defined and configured, through a specific configuration popup. Here is an example for a LAG(4) call with a default value:
And here is the Offset Functions demo query (click on the image to enlarge), with generated SQL (after Hide SQL for Subqueries) below:
SELECT name,
id,
LAG(id) OVER (ORDER BY id) AS `lag`,
LEAD(id) OVER (ORDER BY id) AS lead_1,
LAG(id, 3) OVER (ORDER BY id) AS lag_3,
LEAD(id, 3) OVER (ORDER BY id) AS lead_3,
FIRST_VALUE(id) OVER (ORDER BY id) AS `first_value`,
LAST_VALUE(id) OVER (ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS `last_value`
FROM FruitsU