Introduced in Oracle 8i, analytic functions, also known as windowing functions, allow developers to perform tasks in SQL that were previously confined to procedural languages.
Related articles.
The examples in this article require the following table.
--DROP TABLE emp PURGE; CREATE TABLE emp ( empno NUMBER(4) CONSTRAINT pk_emp PRIMARY KEY, ename VARCHAR2(10), job VARCHAR2(9), mgr NUMBER(4), hiredate DATE, sal NUMBER(7,2), comm NUMBER(7,2), deptno NUMBER(2) ); INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20); INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30); INSERT INTO emp VALUES (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30); INSERT INTO emp VALUES (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20); INSERT INTO emp VALUES (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30); INSERT INTO emp VALUES (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30); INSERT INTO emp VALUES (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10); INSERT INTO emp VALUES (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20); INSERT INTO emp VALUES (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10); INSERT INTO emp VALUES (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30); INSERT INTO emp VALUES (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20); INSERT INTO emp VALUES (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30); INSERT INTO emp VALUES (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20); INSERT INTO emp VALUES (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10); COMMIT;
Probably the easiest way to understand analytic functions is to start by looking at aggregate functions. An aggregate function, as the name suggests, aggregates data from several rows into a single result row. For example, we might use the AVG
aggregate function to give us an average of all the employee salaries in the EMP table.
SELECT AVG(sal) FROM emp; AVG(SAL) ---------- 2073.21429 SQL>
The GROUP BY
clause allows us to apply aggregate functions to subsets of rows. For example, we might want to display the average salary for each department.
SELECT deptno, AVG(sal) FROM emp GROUP BY deptno ORDER BY deptno; DEPTNO AVG(SAL) ---------- ---------- 10 2916.66667 20 2175 30 1566.66667 SQL>
In both cases, the aggregate function reduces the number of rows returned by the query.
Analytic functions also operate on subsets of rows, similar to aggregate functions in GROUP BY
queries, but they do not reduce the number of rows returned by the query. For example, the following query reports the salary for each employee, along with the average salary of the employees within the department.
SET PAGESIZE 50 BREAK ON deptno SKIP 1 DUPLICATES SELECT empno, deptno, sal, AVG(sal) OVER (PARTITION BY deptno) AS avg_dept_sal FROM emp; EMPNO DEPTNO SAL AVG_DEPT_SAL ---------- ---------- ---------- ------------ 7782 10 2450 2916.66667 7839 10 5000 2916.66667 7934 10 1300 2916.66667 7566 20 2975 2175 7902 20 3000 2175 7876 20 1100 2175 7369 20 800 2175 7788 20 3000 2175 7521 30 1250 1566.66667 7844 30 1500 1566.66667 7499 30 1600 1566.66667 7900 30 950 1566.66667 7698 30 2850 1566.66667 7654 30 1250 1566.66667 14 rows selected. SQL>
This time AVG
is an analytic function, operating on the group of rows defined by the contents of the OVER
clause. This group of rows is known as a window, which is why analytic functions are sometimes referred to as window[ing] functions. Notice how the AVG
function is still reporting the departmental average, like it did in the GROUP BY
query, but the result is present in each row, rather than reducing the total number of rows returned. This is because analytic functions are performed on a result set after all join, WHERE
, GROUP BY
and HAVING
clauses are complete, but before the final ORDER BY
operation is performed.
There are some variations in the syntax of the individual analytic functions, but the basic syntax for an analytic function is as follows.
analytic_function([ arguments ]) OVER (analytic_clause)
The analytic_clause
breaks down into the following optional elements.
[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]
The sub-elements of the analytic_clause
each have their own syntax diagrams, shown here. Rather than repeat the syntax diagrams, the following sections describe what each section of the analytic_clause
is used for.
The query_partition_clause
divides the result set into partitions, or groups, of data. The operation of the analytic function is restricted to the boundary imposed by these partitions, similar to the way a GROUP BY
clause affects the action of an aggregate function. If the query_partition_clause
is omitted, the whole result set is treated as a single partition. The following query uses an empty OVER
clause, so the average presented is based on all the rows of the result set.
CLEAR BREAKS SELECT empno, deptno, sal, AVG(sal) OVER () AS avg_sal FROM emp; EMPNO DEPTNO SAL AVG_SAL ---------- ---------- ---------- ---------- 7369 20 800 2073.21429 7499 30 1600 2073.21429 7521 30 1250 2073.21429 7566 20 2975 2073.21429 7654 30 1250 2073.21429 7698 30 2850 2073.21429 7782 10 2450 2073.21429 7788 20 3000 2073.21429 7839 10 5000 2073.21429 7844 30 1500 2073.21429 7876 20 1100 2073.21429 7900 30 950 2073.21429 7902 20 3000 2073.21429 7934 10 1300 2073.21429 SQL>
If we change the OVER
clause to include a query_partition_clause
based on the department, the averages presented are specifically for the department the employee belongs too.
BREAK ON deptno SKIP 1 DUPLICATES SELECT empno, deptno, sal, AVG(sal) OVER (PARTITION BY deptno) AS avg_dept_sal FROM emp; EMPNO DEPTNO SAL AVG_DEPT_SAL ---------- ---------- ---------- ------------ 7782 10 2450 2916.66667 7839 10 5000 2916.66667 7934 10 1300 2916.66667 7566 20 2975 2175 7902 20 3000 2175 7876 20 1100 2175 7369 20 800 2175 7788 20 3000 2175 7521 30 1250 1566.66667 7844 30 1500 1566.66667 7499 30 1600 1566.66667 7900 30 950 1566.66667 7698 30 2850 1566.66667 7654 30 1250 1566.66667 SQL>
The order_by_clause
is used to order rows, or siblings, within a partition. So if an analytic function is sensitive to the order of the siblings in a partition you should include an order_by_clause
. The following query uses the FIRST_VALUE
function to return the first salary reported in each department. Notice we have partitioned the result set by the department, but there is no order_by_clause
.
BREAK ON deptno SKIP 1 DUPLICATES SELECT empno, deptno, sal, FIRST_VALUE(sal IGNORE NULLS) OVER (PARTITION BY deptno) AS first_sal_in_dept FROM emp; EMPNO DEPTNO SAL FIRST_SAL_IN_DEPT ---------- ---------- ---------- ----------------- 7782 10 2450 2450 7839 10 5000 2450 7934 10 1300 2450 7566 20 2975 2975 7902 20 3000 2975 7876 20 1100 2975 7369 20 800 2975 7788 20 3000 2975 7521 30 1250 1250 7844 30 1500 1250 7499 30 1600 1250 7900 30 950 1250 7698 30 2850 1250 7654 30 1250 1250 SQL>
Now compare the values of the FIRST_SAL_IN_DEPT
column when we include an order_by_clause
to order the siblings by ascending salary.
SELECT empno, deptno, sal, FIRST_VALUE(sal IGNORE NULLS) OVER (PARTITION BY deptno ORDER BY sal ASC NULLS LAST) AS first_val_in_dept FROM emp; EMPNO DEPTNO SAL FIRST_VAL_IN_DEPT ---------- ---------- ---------- ----------------- 7934 10 1300 1300 7782 10 2450 1300 7839 10 5000 1300 7369 20 800 800 7876 20 1100 800 7566 20 2975 800 7788 20 3000 800 7902 20 3000 800 7900 30 950 950 7654 30 1250 950 7521 30 1250 950 7844 30 1500 950 7499 30 1600 950 7698 30 2850 950 SQL>
In this case the "ASC NULLS LAST
" keywords are unnecessary as ASC
is the default for an order_by_clause
and NULLS LAST
is the default for ASC
orders. When ordering by DESC
, the default is NULLS FIRST
.
It is important to understand how the order_by_clause
affects display order. The order_by_clause
is guaranteed to affect the order of the rows as they are processed by the analytic function, but it may not always affect the display order. As a result, you must always use a conventional ORDER BY
clause in the query if display order is important. Do not rely on any implicit ordering done by the analytic function. Remember, the conventional ORDER BY
clause is performed after the analytic processing, so it will always take precedence.
We have seen previously the query_partition_clause
controls the window, or group of rows, the analytic operates on. The windowing_clause
gives some analytic functions a further degree of control over this window within the current partition, or whole result set if no partitioning clause is used. The windowing_clause
is an extension of the order_by_clause
and as such, it can only be used if an order_by_clause
is present. The windowing_clause
has two basic forms, with a third form added in Oracle 21c.
RANGE BETWEEN start_point AND end_point ROWS BETWEEN start_point AND end_point GROUPS BETWEEN start_point AND end_point (21c onward)
You can read more about the enhancements to the windowing clause from Oracle 21c onward here.
When using ROWS BETWEEN
, you are indicating a specific number of rows relative to the current row, either directly, or via an expression. Assuming you don't cross a partition boundary, that number of rows is fixed. In contrast, when you use RANGE BETWEEN
you are referring to a range of values in a specific column relative to the value in the current row. As a result, Oracle doesn't know how many rows are included in the range until the ordered set is created. GROUPS BETWEEN
treats all rows with the same value as a group, and the start and end points refer to the group rather than the row.
It is possible to omit the BETWEEN
keyword and specify a single RANGE/ROWS
endpoint. In this case, Oracle assumes your specified RANGE/ROWS
is the start point and the end point it the current row. I would advise against using this syntax as it will be unclear to anyone who doesn't understand this default action.
Possible values for "start_point" and "end_point" are:
UNBOUNDED PRECEDING
: The window starts at the first row of the partition, or the whole result set if no partitioning clause is used. Only available for start points.UNBOUNDED FOLLOWING
: The window ends at the last row of the partition, or the whole result set if no partitioning clause is used. Only available for end points.CURRENT ROW
: The window starts or ends at the current row. Can be used as start or end point.value_expr PRECEDING
: A physical or logical offset before the current row using a constant or expression that evaluates to a positive numerical value. When used with RANGE
, it can also be an interval literal if the order_by_clause
uses a DATE
column.value_expr FOLLOWING
: As above, but an offset after the current row.The start point must be before or equal to the end point. In addition, the current row does not have to be part of the window. The window can be defined to start and end before or after the current row.
For analytic functions that support the windowing_clause
, the default action is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. The following query is similar to one used previously to report the employee salary and average department salary, but now we have included an order_by_clause
so we also get the default windowing_clause
.
SELECT empno, deptno, sal, AVG(sal) OVER (PARTITION BY deptno ORDER BY sal) AS avg_dept_sal_sofar FROM emp; EMPNO DEPTNO SAL AVG_DEPT_SAL_SOFAR ---------- ---------- ---------- ------------------ 7934 10 1300 1300 7782 10 2450 1875 7839 10 5000 2916.66667 7369 20 800 800 7876 20 1100 950 7566 20 2975 1625 7788 20 3000 2175 7902 20 3000 2175 7900 30 950 950 7654 30 1250 1150 7521 30 1250 1150 7844 30 1500 1237.5 7499 30 1600 1310 7698 30 2850 1566.66667 SQL>
There are two things to notice here.
order_by_clause
without a windowing_clause
means the query is now returning a running average.windowing_clause
is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
, not ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. The fact it is RANGE
, not ROWS
, means it includes all rows with the same value as the value in the current row, even if they are further down the result set. As a result, the window may extend beyond the current row, even though you may not think this is the case.To illustrate the last point, let's look at the values if we compare RANGE
and ROWS
for the last query. Notice the differences between those lines in bold.
SELECT empno, deptno, sal, AVG(sal) OVER (PARTITION BY deptno ORDER BY sal RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_avg, AVG(sal) OVER (PARTITION BY deptno ORDER BY sal ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_avg FROM emp; EMPNO DEPTNO SAL RANGE_AVG ROWS_AVG ---------- ---------- ---------- ---------- ---------- 7934 10 1300 1300 1300 7782 10 2450 1875 1875 7839 10 5000 2916.66667 2916.66667 7369 20 800 800 800 7876 20 1100 950 950 7566 20 2975 1625 1625 7788 20 3000 2175 1968.75 7902 20 3000 2175 2175 7900 30 950 950 950 7654 30 1250 1150 1100 7521 30 1250 1150 1150 7844 30 1500 1237.5 1237.5 7499 30 1600 1310 1310 7698 30 2850 1566.66667 1566.66667 SQL>
In my opinion, the default windowing_clause
should have been ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
. This would make the accidental inclusion of the windowing_clause
much less confusing.
The following query shows one method for accessing data from previous and following rows within the current row using the windowing_clause
. This can also be accomplished with LAG and LEAD.
CLEAR BREAKS SELECT empno, deptno, sal, FIRST_VALUE(sal) OVER (ORDER BY sal ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS previous_sal, LAST_VALUE(sal) OVER (ORDER BY sal ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) AS next_sal FROM emp; EMPNO DEPTNO SAL PREVIOUS_SAL NEXT_SAL ---------- ---------- ---------- ------------ ---------- 7369 20 800 800 950 7900 30 950 800 1100 7876 20 1100 950 1250 7521 30 1250 1100 1250 7654 30 1250 1250 1300 7934 10 1300 1250 1500 7844 30 1500 1300 1600 7499 30 1600 1500 2450 7782 10 2450 1600 2850 7698 30 2850 2450 2975 7566 20 2975 2850 3000 7788 20 3000 2975 3000 7902 20 3000 3000 5000 7839 10 5000 3000 5000 SQL>
The EXCLUDE
clause was added in Oracle 21c. You can read about the syntax enhancements here.
The best way to understand what analytic functions are capable of is to play around with them. This article contains links to other articles I've written about specific analytic functions or links to the documentation.
The "*" indicates the function supports the full analytic syntax, including the windowing clause.
For more information see:
Hope this helps. Regards Tim...
Back to normal view: https://oracle-base.com/articles/misc/analytic-functions