When complex SQL queries are generated for you, there is no need to expose SQL’s complexity when you handle everything visually. The unnecessary complexity of the SQL OVER ROWS clause is hidden by a simple visual selection of a frame on screen. You can much better understand how ROWS works after playing for just a few minutes with our visual SQL query builder for window functions.
Table of Contents
Why is the OVER ROWS Clause so Complex?
When the 2012 T-SQL features were first announced back in 2011, I must confess that framing intimidated me a bit. I didn’t really like the syntax and didn’t quite understand what was going on.
Expert T-SQL Window Functions in SQL Server – by Kathi Kellenberger, 2015
I witnessed firsthand, in one of my past jobs, how great windows functions are and how they can solve problems very hard to fix any another way. When our database architect at that time suggested we could use just plain SQL queries to transform huge server-side batches of data that would normally require complex cursor-based operations, I didn’t believe it was actually possible. But I ended up by successfully implementing the algorithms he suggested.
I still suspect today most people find the window functions most complex than they are. One reason is there have been some confusing, complex and uninspired syntax choices there. I always wondered why the inline ORDER BY clause was replicated as it was, but we needed a new PARTITION BY clause for what obviously is an inline GROUP BY clause.
The nightmare comes however from the ROWS/RANGE clause, that sometimes can be used after an OVER ORDER BY to further restrict the inline frame. Leaving aside the additional confusing alternative RANGE keyword and implementation, it took me a while to figure out UNBOUNDED PRECEDING is just the FIRST frame row, and UNBOUNDED FOLLOWING is the LAST frame row.
Usability of this clause is simply bad. Look at how much space they needed at Oracle to explain what should have been the definition of a simple FROM-TO interval:
What OVER ROWS Clause Actually Does
ROWS filters the partition frame for each current row, by specifying its top and bottom limits, that’s all. It’s an additional filter you may add after an OVER ORDER BY inline specification. And it can be used with an aggregate function.
The so-called offset-based or navigational analytic functions LEAD/LAG, FIRST/LAST_VALUE and NTH_VALUE, will also internally use their own frame. LAG and FIRST_VALUE look backwards, at a relative offset or just the first frame row. While LEAD and LAST_VALUE look forward, as a relative offset or just the last frame row.
When you have no ROWS clause at all, the frame is still filtered by default between the first and current row in partition. You do not select by default everything within the partition, from first to last row, as you may wrongly assume.
When you pass just one single value, with no BETWEEN, the end value of your frame is considered the CURRENT ROW by default. You don’t have just one single row value selected, as you may wrongly assume.
Improved OVER ROWS Visual Clause in Data Xtractor
When we started providing Data Xtractor support for the ROWS clause, we fell in the same trap of trying to map the so many variations to different functions. We ended up with no less than eight OverRowsFrom and OverRowsTo functions, which was huge.
- We figured out at the end we actually need one single function for all, with two From–To mandatory parameters for the beginning and starting of the frame:
- From-To parameters have to be mandatory, to avoid guessing the default values and to always give a precise indication of what the frame limits are.
- From-To parameters should use shorter and more intuitive values and keywords. As it’s hard to guess what UNBOUNDED would stand for, we rather used FIRST and LAST for the relative maximum and minimum edges of a set.
- CURRENT would obviously be for the current row, and a strict positive or negative offset would stand for the relative numbers of rows after or before the current row. A zero offset is also valid in SQL and transparently translates into the CURRENT row, but we also eliminated on purpose this additional unnecessary redundant complication.
- Both parameters can be the same, except you can never have FIRST as second parameter and LAST as first parameter.
- There are always two parameters, never just one. This is to avoid guessing if that’s about selecting just one single value, or a range with ending edge you must guess again what its value could be.
We also figured out for this specific function that the generic Edit Function Arguments popup wouldn’t work. The function is too specific and visually sliding the two edges of the frame on screen would be easier and provide a much better user experience.
- By sliding the edges of a visual frame, you can never reverse by mistake the start end end limits.
- The visual frame selection does not allow you to select the first row or last row alone.
- The drop-down controls become enabled only when the top negative or bottom positive offset values are required. This is only when one of the frame margins edges a current +/- offset entry.
- Each frame change triggers the update of both the SQL ROWS clause and our OverRowsBetween function as popup title.
Visual OVER ROWS Edit Popup in Data Xtractor
OverRowsBetween(from FIRST, to CURRENT) is the default definition for our function. When you don’t have a ROWS clause, or when you skip the second parameter from its SQL definition, this is by default the selected frame. And this is what you get as first popup on any new frame selection:
OverRowsBetween(from offset 1, to offset 5) is an example on how to limit the frame to one specific set. This could be one single row, rows with current row in the middle, or rows on one side of the current row. This works with strict positive or negative offset values, or on the current row.
You can never select just the first or the last row: we tested all supported relational databases and this is not allowed. Generated SQL ROWS clause also translates into a BETWEEN n AND m expression, never into a single value – which you may assume you can use, when it actually translates into a frame ending in CURRENT ROW instead.
OverRowsBetween(from offset 1000000, to LAST) is a last example on how you can pass whatever big positive numeric values you want, as they will always be kept valid. You can never switch invalid from and to values for the frame or allow for invalid combinations. And you’ll never have to remember what the actual SQL syntax is, as it is generated for you on screen after any possible change:
Visual SQL Queries Calling OverRowsBetween
First column looks at the second entry two rows behind, for each current row. Our custom visual OverRowsBetween function takes the same -2 offset value for both from and to parameters. Thirst column looks at the second entry two rows ahead, for each current row. Our custom visual OverRowsBetween function takes the same 2 offset value for both from and to parameters.
Using Min or Max on the single value will simply return that value. Unlike Avg or Sum, Min and Max also work on text values.
The next visual SQL query emulates FIRST_VALUE and LAST_VALUE with some specific ROWS frame selections.
First column selects the minimum value from the frame between the first and current row, with an intuitive OverRowsBetween(FIRST, CURRENT) call. This is the frame always selected by default, and the explicit call can be obviously omitted. For names sorted in ascending order, as they are here, this will always be the first Apples string value.
First column selects the maximum value from the frame between the current and last row, with an intuitive OverRowsBetween(CURRENT, LAST) call. For names sorted in ascending order, as they are here, this will always be the last Plums string value.
Support and Compatibility Issues
The visual SQL OVER ROWS clause has been tested and implemented in Data Xtractor for the following relational databases, with built-in window functions support:
- SQLite – which supports window functions since 2018, with v3.25.
- Microsoft SQL Server – including Azure.
- IBM DB2
- PostgreSQL – Amazon Redshift, which is based on Postgres, should work as well (not tested yet).
- MySQL – which supports window functions since v8.0.2. Amazon Aurora, which is MySQL-based, should work as well once they start using a more recent version.
- MariaDB – which supports window functions since v10.2.
- Firebird – it supports window functions now, but no ROWS clause yet.
- SQL Anywhere – it may work as well, but not tested.