Visual SQL OVER ROWS Clause for Window Functions

Published by Cristian Scutaru on

When complex SQL queries are generated for you, there is no need to expose SQL’s complexity when you handle everything visually. The unnecessary complexity of the SQL OVER ROWS clause is hidden by a simple visual selection of a frame on screen. You can much better understand how ROWS works after playing for just a few minutes with our visual SQL query builder for window functions.

Table of Contents

Why is the OVER ROWS Clause so Complex?

When the 2012 T-SQL features were first announced back in 2011, I must confess that framing intimidated me a bit. I didn’t really like the syntax and didn’t quite understand what was going on.

Expert T-SQL Window Functions in SQL Server – by Kathi Kellenberger, 2015

I witnessed firsthand, in one of my past jobs, how great windows functions are and how they can solve problems very hard to fix any another way. When our database architect at that time suggested we could use just plain SQL queries to transform huge server-side batches of data that would normally require complex cursor-based operations, I didn’t believe it was actually possible. But I ended up by successfully implementing the algorithms he suggested.

I still suspect today most people find the window functions most complex than they are. One reason is there have been some confusing, complex and uninspired syntax choices there. I always wondered why the inline ORDER BY clause was replicated as it was, but we needed a new PARTITION BY clause for what obviously is an inline GROUP BY clause.

The nightmare comes however from the ROWS/RANGE/GROUPS clause, that sometimes can be used after an OVER ORDER BY to further restrict the inline frame. Leaving aside the additional confusing alternative RANGE keyword and implementation, it took me a while to figure out UNBOUNDED PRECEDING is just the FIRST frame row, and UNBOUNDED FOLLOWING is the LAST frame row.

Usability of this clause is simply bad. Look at how much space they needed at Oracle to explain what should have been the definition of a simple FROM-TO interval:

Oracle Rows

What OVER ROWS Clause Actually Does

ROWS filters the partition frame for each current row, by specifying its top and bottom limits, that’s all. It’s an additional filter you may add after an OVER ORDER BY inline specification. And it can be used with an aggregate function.

The so-called offset-based or navigational analytic functions LEAD/LAG, FIRST/LAST_VALUE and NTH_VALUE, will also internally use their own frame. LAG and FIRST_VALUE look backwards, at a relative offset or just the first frame row. While LEAD and LAST_VALUE look forward, as a relative offset or just the last frame row.

First Last Value All

When you have no ROWS clause at all, the frame is still filtered by default between the first and current row in partition. You do not select by default everything within the partition, from first to last row, as you may wrongly assume.

When you pass just one single value, with no BETWEEN, the end value of your frame is considered the CURRENT ROW by default. You don’t have just one single row value selected, as you may wrongly assume.

Improved OVER ROWS Visual Clause in Data Xtractor

First time we started to provide Data Xtractor support for the ROWS clause, we fell in the same trap of trying to map the so many variations to different functions. We ended up with no less than eight OverRowsFrom and OverRowsTo functions, which was huge.

  1. We figured out at the end we actually need just three OverRows, OverRange and OverGroups functions for all, to translate into the ROWS, RANGE, GROUPS clauses. With an optional EXCLUDE, when supported by your specific database.
  2. First From and To parameters have to be mandatory, to avoid guessing the default values and to always give a precise indication of what the frame limits are.
  3. From and To parameters should use shorter and more intuitive values and keywords. As it could be hard to guess what UNBOUNDED stands for, we rather used FIRST and LAST for the relative maximum and minimum edges of a set.
  4. CURRENT would obviously be for the current row, or the current group (depending on the context). A CURRENT_B keyword as To parameter will force generation of a BETWEEN operation, with CURRENT ROW as last item.
  5. A strict positive or strict negative offset would stand for the relative numbers of rows after or before the current row. A zero offset is also valid in SQL and transparently translates into the CURRENT row, but we eliminated on purpose this additional unnecessary redundant complication.
  6. Both parameters can be the same, except you can never have FIRST as second parameter and LAST as first parameter.
  7. A third parameter is optional and reserved for the EXCLUDE clause, when supported and used. This can be one of the CURRENT, GROUP or TIES keywords.

We also figured out for this specific function that the generic Edit Function Arguments popup wouldn’t work. The function is too specific and visually sliding the two edges of the frame on screen would be easier and provide a much better user experience.

  1. By sliding the edges of a visual frame, you can never reverse by mistake the start and end limits.
  2. The offset controls become visible only when the top negative or bottom positive offset values are required. This is only when one of the frame margins edges a current +/- offset entry.
  3. Each frame change triggers the update of both the SQL clause and our function signature in the popup title.

Visual OVER ROWS Edit Popup in Data Xtractor

OverRows(from FIRST to CURRENT) is the default definition for the ROWS clause. When you don’t have a ROWS clause at all, or when you skip the second parameter from its SQL definition, this is by default the selected frame. Generated SQL ROWS can also use a BETWEEN n AND m expression, when the expression ends with the CURRENT ROW.

And this is what you get as first popup on any new frame selection:

ROWS from FIRST to CURRENT

OverRows(from 1st row to 5th row) is an example on how to limit the frame to one specific set. This could be one single row, rows with current row in the middle, or rows on one side of the current row.  This works with strict positive or negative offset values, or on the current row.

ROWS n FOLLOWING

OverRows(from offset 1000000 to LAST) is a last example on how you can pass whatever big positive numeric values you want, as they will always be kept valid. You can never switch invalid From and To values for the frame, or allow for invalid combinations. And you’ll never have to remember what the actual SQL syntax is, as it is generated for you on screen after any possible change:

ROWS from CURRENT to LAST

Visual SQL Queries Calling OverRows

Following visual SQL query emulates LAG and LEAD with the ROWS frame selection of a single row.

First column looks at the second entry two rows behind, for each current row. Our custom visual OverRows function takes the same -2 offset value for both from and to parameters. Third column looks at the second entry two rows ahead, for each current row. The function takes the same 2 offset value for both from and to parameters.

Using Min or Max on the single value will simply return that value. Unlike Avg or Sum, Min and Max also work on text values.

Rows Clause Single Row

The next visual SQL query emulates FIRST_VALUE and LAST_VALUE with some specific ROWS frame selections.

First column selects the minimum value from the frame between the first and current row, with an intuitive OverRows(from FIRST to CURRENT) call. This is the frame always selected by default, and the explicit call can be obviously omitted. For names sorted in ascending order, as they are here, this will always be the first Apples string value.

First column selects the maximum value from the frame between the current and last row, with an intuitive OverRows(from CURRENT to LAST) call. For names sorted in ascending order, as they are here, this will always be the last Plums string value.

Rows First Last Value

Support and Compatibility Issues

The visual SQL OVER ROWS clause has been tested and implemented in Data Xtractor for the following relational databases, with built-in window functions support:

  • SQLite – which supports window functions since 2018, with v3.25.
  • Microsoft SQL Server – including Azure.
  • Oracle
  • IBM DB2
  • PostgreSQL –  Amazon Redshift, which is based on Postgres, should work as well.
  • MySQL – which supports window functions since v8.0.2. Amazon Aurora, which is MySQL-based, should work as well once they start using a more recent version.
  • MariaDB – which supports window functions since v10.2.
  • Firebird – it supports window functions now, but no ROWS clause yet.
  • SQL Anywhere

Categories: Query Builder

Cristian Scutaru

I designed and implemented the Data Xtractor suite, with Model Xtractor, Query Xtractor, and Visual Xtractor as separate modules. I am a software architect and developer with over 30 years professional experience. I’ve been working with relational databases for almost three decades and I was constantly unhappy with the relative limitation of those tools used to connect directly to a platform, and instantly extract and display data in flexible ways.