8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
KURTOSIS_POP and KURTOSIS_SAMP Analytic Functions
This article gives an overview of the KURTOSIS_POP
and KURTOSIS_SAMP
analytic functions. If you are new to analytic functions you should probably read this introduction to analytic functions first.
- Basics
- Setup
- KURTOSIS_POP and KURTOSIS_SAMP as Aggregate Functions
- KURTOSIS_POP Analytic Function
- KURTOSIS_SAMP Analytic Function
- Quick Links
Related articles.
Basics
You can read a basic introduction to kurtosis here.
The KURTOSIS_POP
and KURTOSIS_SAMP
aggregate and analytic functions were added in Oracle 21c describe the "tailedness" or shape of a probability distribution. In both cases they return a numeric value with the following meaning.
- Negative : The data has a flatter peak than a normal distribution, and shorter tails. It's more dome shaped.
- Zero : Data has a normal distribution. Most of the values cluster around the mean, with fewer at the tails. Zero is also returned if the data set has less than 3 rows.
- Positive : Fewer values are located near the mean, and more are located near the tails.
- Null : Null values in the expression are ignored, and the function will return null if all rows have a null value for the expression.
The KURTOSIS_POP
and KURTOSIS_SAMP
functions will not return the same results, but both should be representative of the distribution of data. The larger the data set, the more similar their results will be.
Setup
We need some data with various distribution patterns to measure using the KURTOSIS_POP
and KURTOSIS_SAMP
aggregate functions. The following table contains columns that represent skewed data and a normal distribution.
-- drop table t1 purge; create table t1 ( id number generated always as identity, skew number, normal number ); insert into t1 (skew, normal) select case when level > 9800 then 1 else dbms_random.value(400, 500) end, dbms_random.normal from dual connect by level <= 10000; commit;
We will use the following table to demonstrate the use of the KURTOSIS_POP
and KURTOSIS_SAMP
analytic functions. The sample size is too small, but the EMP
table will be familiar to people practicing analytic function using the articles on this site.
create table emp ( empno number(4) constraint pk_emp primary key, ename varchar2(10), job varchar2(9), mgr number(4), hiredate date, sal number(7,2), comm number(7,2), deptno number(2) ); insert into emp values (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20); insert into emp values (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30); insert into emp values (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30); insert into emp values (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20); insert into emp values (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30); insert into emp values (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30); insert into emp values (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10); insert into emp values (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20); insert into emp values (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10); insert into emp values (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30); insert into emp values (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20); insert into emp values (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30); insert into emp values (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20); insert into emp values (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10); commit;
KURTOSIS_POP and KURTOSIS_SAMP as Aggregate Functions
The following query uses the KURTOSIS_POP
aggregate function to display the "tailedness" of the data in the SKEW
and NORMAL
columns.
select kurtosis_pop(skew) as skew, kurtosis_pop(normal) as normal from t1; SKEW NORMAL ---------- ---------- 30.0819301 .0453792598 SQL>
As expected the SKEW
column returns a non-zero value, and the NORMAL
column returns a near-zero value
The KURTOSIS_POP
function uses a sample size of 100% of the rows, which can represent an overhead for large data sets. In contrast the KURTOSIS_SAMP
function uses a smaller sample size, making it more efficient for large data sets, whilst still returning representative results.
select kurtosis_samp(skew) as skew, kurtosis_samp(normal) as normal from t1; SKEW NORMAL ---------- ---------- 30.0975767 .0460021977 SQL>
Using DISTINCT
or UNIQUE
keywords mean only unique values in the expression are used for the calculation. The ALL
keyword is that same as the default action.
select kurtosis_samp(distinct skew) as skew_distinct, kurtosis_samp(unique skew) as skew_unique, kurtosis_samp(all skew) as skew_all, kurtosis_samp(skew) as skew from t1; SKEW_DISTINCT SKEW_UNIQUE SKEW_ALL SKEW ------------- ----------- ---------- ---------- 4.44000461 4.44000461 30.0975767 30.0975767 SQL> SQL>
The DISTINCT
, UNIQUE
and ALL
keywords are also available for the analytic functions.
KURTOSIS_POP Analytic Function
Using an empty OVER
clause turns the KURTOSIS_POP
function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition. The following query uses the KURTOSIS_POP
analytic function to display the "tailedness" of the data in the SAL
column, as well as all the original data.
select empno, ename, deptno, sal, round(kurtosis_pop(sal) over (),2) as sal_kurtosis from emp; EMPNO ENAME DEPTNO SAL SAL_KURTOSIS ---------- ---------- ---------- ---------- ------------ 7369 SMITH 20 800 .49 7499 ALLEN 30 1600 .49 7521 WARD 30 1250 .49 7566 JONES 20 2975 .49 7654 MARTIN 30 1250 .49 7698 BLAKE 30 2850 .49 7782 CLARK 10 2450 .49 7788 SCOTT 20 3000 .49 7839 KING 10 5000 .49 7844 TURNER 30 1500 .49 7876 ADAMS 20 1100 .49 7900 JAMES 30 950 .49 7902 FORD 20 3000 .49 7934 MILLER 10 1300 .49 SQL>
Adding the partitioning clause allows us to display the salary kurtosis per department, along with the employee data for each department.
select empno, ename, deptno, sal, round(kurtosis_pop(sal) over (partition by deptno),2) as sal_kurtosis_by_dept from emp; EMPNO ENAME DEPTNO SAL SAL_KURTOSIS_BY_DEPT ---------- ---------- ---------- ---------- -------------------- 7782 CLARK 10 2450 0 7839 KING 10 5000 0 7934 MILLER 10 1300 0 7566 JONES 20 2975 -1.77 7902 FORD 20 3000 -1.77 7876 ADAMS 20 1100 -1.77 7369 SMITH 20 800 -1.77 7788 SCOTT 20 3000 -1.77 7521 WARD 30 1250 .46 7844 TURNER 30 1500 .46 7499 ALLEN 30 1600 .46 7900 JAMES 30 950 .46 7698 BLAKE 30 2850 .46 7654 MARTIN 30 1250 .46 SQL>
KURTOSIS_SAMP Analytic Function
Using an empty OVER
clause turns the KURTOSIS_SAMP
function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition. The following query uses the KURTOSIS_SAMP
analytic function to display the "tailedness" of the data in the SAL
column, as well as all the original data.
select empno, ename, deptno, sal, round(kurtosis_samp(sal) over (),2) as sal_kurtosis from emp; EMPNO ENAME DEPTNO SAL SAL_KURTOSIS ---------- ---------- ---------- ---------- ------------ 7369 SMITH 20 800 1.32 7499 ALLEN 30 1600 1.32 7521 WARD 30 1250 1.32 7566 JONES 20 2975 1.32 7654 MARTIN 30 1250 1.32 7698 BLAKE 30 2850 1.32 7782 CLARK 10 2450 1.32 7788 SCOTT 20 3000 1.32 7839 KING 10 5000 1.32 7844 TURNER 30 1500 1.32 7876 ADAMS 20 1100 1.32 7900 JAMES 30 950 1.32 7902 FORD 20 3000 1.32 7934 MILLER 10 1300 1.32 SQL>
Adding the partitioning clause allows us to display the salary kurtosis per department, along with the employee data for each department.
select empno, ename, deptno, sal, round(kurtosis_samp(sal) over (partition by deptno),2) as sal_kurtosis_by_dept from emp; EMPNO ENAME DEPTNO SAL SAL_KURTOSIS_BY_DEPT ---------- ---------- ---------- ---------- -------------------- 7782 CLARK 10 2450 0 7839 KING 10 5000 0 7934 MILLER 10 1300 0 7566 JONES 20 2975 -3.1 7902 FORD 20 3000 -3.1 7876 ADAMS 20 1100 -3.1 7369 SMITH 20 800 -3.1 7788 SCOTT 20 3000 -3.1 7521 WARD 30 1250 3.84 7844 TURNER 30 1500 3.84 7499 ALLEN 30 1600 3.84 7900 JAMES 30 950 3.84 7698 BLAKE 30 2850 3.84 7654 MARTIN 30 1250 3.84 SQL>
Quick Links
The "*" indicates the function supports the full analytic syntax, including the windowing clause.
For more information see:
Hope this helps. Regards Tim...