Aggregate Queries
Table of Contents
How Group Queries Work
Queries with a visible Group builder bar – and eventually filtered by Having – are aggregate query. Once you expose the Group builder bar, each selected field or expression may get by default a Group By indication that aggregations are expected on distinct values of that field. Switch that to an aggregate function and data of that field will be instead compressed into an unique value.
Let’s show total commissions per job in department 30, when average of salaries for those jobs is greater that $1,000. Start by identifying the filters, the conditions that slices your data. And it is so important to separate here filters that apply to all your data, and filters per group. Department 30 applies to all your data, so this goes in the Where clause, while average of salaries > 1000 applies to each of your group and this will go into Having. Where conditions apply BEFORE Group aggregates your data, while Having is similar to Where, except it shows up only when Group is visible and applies to subsets of grouped data, AFTER each group has been aggregated by Group. When your condition uses an aggregate function, it’s usually Having you have to use, not Where.
Now play a bit with your builder to get more instant information:
- Drill-down on your result rows and check the All Aggregates tab. Any group query will expose in the query result an All Aggregates tab with individual rows for all data that has been aggregated by the top query! For the SALESMAN job in department 30, you had 4 aggregated records. You can check the average of their SAL is indeed 1400, and the sum of their COMM in indeed 2200.
- Hide the Where builder bar to extend your search to all departments.
- Hide the Having builder bar to discover one job (CLERK) that didn’t make the cut (the average of SAL was 950).
- Hide the Group builder bar (Having will be automatically hidden as well) for a list of all (6) people in department 30.
Distinct Values
You want no duplicate rows. For instance, you get employees’ jobs and departments, but combinations like (ANALYST, 20) appear more than once. Solution? Group By all these values, and duplicates are automatically removed:
Or even simpler (and equivalent): don’t show at all the Group builder bar, but click on the Select only DISTINCT records toolbar button. In SQL, SELECT DISTINCT a, b, c is equivalent to SELECT a, b, c GROUP BY a, b, c. Distinct button is disabled when Group bar is visible, because you cannot have both DISTINCT and GROUP BY in the same query, it’s one or the other.
As usual, when you drill-down, you may instantly see which duplicates you had, in the All Aggregates tab. And you may apply the same techniques when you’re quickly looking for unique field values. Here is a drill-down on the same result, but when Distinct is used instead (remark the Group bar is hidden and not used):
Aggregate Functions
Aggregate functions extract one single value from field values from multiple rows. Rows have distinct field values where Group By has been specified instead of an aggregate. Except for Grouping (which is described at Grouping Sets), here are our possible aggregate functions:
- Count All Rows – counts all rows, regardless of the values their fields hold.
- Count – counts all not Null values.
- Count Distinct – counts all different not Null values (skips duplicates).
- Sum – sums all not Null values.
- Sum Distinct – sums all distinct not Null values (duplicates ignored).
- Average – average number for all not Null values.
- Average Distinct – average number for all distinct not Null values.
- List – list of all aggregate values. Separator and option to sort returned list are globally configurable.
- List Distinct – list of all unique aggregate values.
- Maximum / First – greatest number or first text string in alphabetical order.
- Minimum / Last – smallest number or last text string in alphabetical order.
Switch to design mode the Group Aggregates demo query. It uses the FruitU demo query, which looks like a virtual dynamic data table. The query just calculates all possible aggregates on the quantity field, for the whole set of data (no field with Group By). The result is calculated by the database server and/or by our application, in the Summary bar. You should usually choose one or the other.
With Group is disabled, these are the rows we aggregated in Summary AFTER the query has been executed. This query is not a group aggregate query!
Enable Group and run the query again. You’ll see a similar result, but calculated by the database server. Now this is a group aggregate query indeed! In design SQL mode (no data results) or when disabled, Summary shows aggregate function names.
When there is not Group By column, the query returns one single row. Drill-down to discover in the All Aggregates tab the list of all rows that have been aggregated:
Count Queries
Getting fast the number of entries in a table, for a query result, for unique values of a field, are typical problems.
- Use as COUNT Query – toolbar button in your query builder that quickly gets you the number of records your current query will return.
- Get Total Rows Query – contextual menu command for a table, view or query, that quickly generates and runs a SELECT COUNT(*) FROM table.
- Get Total Aggregates Query – contextual menu command for a table/view/query column, that quickly generates and runs an aggregate query with Group By on that field and COUNT on the rest.
- Get Total Duplicates Query – contextual menu command for a table/view/query column, that quickly generates and runs an aggregate query that returns the duplicate values.
Let’s check this last query to get the duplicates in the emp.JOB field. Switch to design mode, run the query and drill-down to see the 3 duplicates of the MANAGER job:
Count with Sum
You need to provide several filtered count aggregates, and in the same query if possible. Let’s say we need total number of employees, total number of SALESMAN or CLERK employees, number of direct reports to BLAKE, number of employees with no commission at all, and number of employees hired before or in 1981.
The trick here is to return 1 or 0 for each conditional formula, then aggregate with Sum (not Count!), except for total number of employees, which does not require a filtered aggregation:
Other notable remarks:
- usage of a list of values (for SALESMAN and CLERK) instead of separate multiple equality comparisons.
- lookup field to manager’s name.
- no value in a field (COMM) means field Is Null.
- usage of a query expression for HIREDATE (while you cannot have function calls in a conditional formula).
First in each Group
One frequent requirement is to isolate first (or last) record(s) in a group. For instance, we want the best paid employees in each department. If more than one employee has the same biggest pay in the department, show them all. Start by simply getting all max salaries per department:
To link top paid employee(s) to each group of this result, we have to combine a non-aggregate set to an aggregate set. We’ll have to add again the emp table, as first_emp. We check on ENAME and SAL and we fix the join conditions between the sets: top employees should be in the same department as the group, so we add a Where clause on DEPTNO. And top employee’s salary should match the maximum in the group, so we add a Having clause on SAL (whenever you link/compare with an aggregate function, use Having, not Where!):
An interesting alternative solution without grouping and aggregates is based on the custom join emp.DEPTNO = first_emp.DEPTNO AND emp.SAL < first_emp.SAL: