8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23ai | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
CORR Analytic Function
This article gives an overview of the CORR
analytic function. If you are new to analytic functions you should probably read this introduction to analytic functions first.
Related articles.
- CORR Analytic Function
- Analytic Functions : All Articles
- COVAR_POP and COVAR_SAMP Analytic Functions
- STDDEV, STDDEV_POP and STDDEV_SAMP Analytic Functions
Setup
The examples in this article require the following table.
--DROP TABLE emp PURGE; CREATE TABLE emp ( empno NUMBER(4) CONSTRAINT pk_emp PRIMARY KEY, ename VARCHAR2(10), job VARCHAR2(9), mgr NUMBER(4), hiredate DATE, sal NUMBER(7,2), comm NUMBER(7,2), deptno NUMBER(2) ); INSERT INTO emp VALUES (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20); INSERT INTO emp VALUES (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30); INSERT INTO emp VALUES (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30); INSERT INTO emp VALUES (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20); INSERT INTO emp VALUES (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30); INSERT INTO emp VALUES (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30); INSERT INTO emp VALUES (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10); INSERT INTO emp VALUES (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20); INSERT INTO emp VALUES (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10); INSERT INTO emp VALUES (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30); INSERT INTO emp VALUES (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20); INSERT INTO emp VALUES (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30); INSERT INTO emp VALUES (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20); INSERT INTO emp VALUES (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10); COMMIT;
CORR as an Aggregate Function
The CORR
aggregate function is used to calculate the coefficient of correlation, or Pearson correlation coefficient. As an aggregate function it reduces the number of rows, hence the term "aggregate". If the data isn't grouped we turn the 14 rows in the EMP
table to a single row with the aggregated values. The following example includes the CORR_S
(Spearman rho correlation coefficient) and CORR_K
(Kendall tau-b correlation coefficient) functions, which are only available as aggregate functions. In the following example we are correlating the length of employment with the salary.
SELECT CORR(SYSDATE - hiredate, sal) AS corr_val, CORR_S(SYSDATE - hiredate, sal) AS corr_s_val, CORR_K(SYSDATE - hiredate, sal) AS corr_k_val FROM emp; CORR_VAL CORR_S_VAL CORR_K_VAL ---------- ---------- ---------- -.02116564 .082690238 .111733587 SQL>
We can get more granularity of information by including a GROUP BY
clause. In the following example we see the values on a per-job basis.
SELECT job, CORR(SYSDATE - hiredate, sal) AS corr_val_by_job, CORR_S(SYSDATE - hiredate, sal) AS corr_s_val_by_job, CORR_K(SYSDATE - hiredate, sal) AS corr_k_val_by_job FROM emp GROUP BY job ORDER BY job; JOB CORR_VAL_BY_JOB CORR_S_VAL_BY_JOB CORR_K_VAL_BY_JOB --------- --------------- ----------------- ----------------- ANALYST CLERK -.3250778 .4 .333333333 MANAGER .978243943 1 1 PRESIDENT SALESMAN .205987887 .210818511 .182574186 SQL>
In both cases we have aggregated the data to get the values, returning less rows than we started with. Analytic functions allow us to return these aggregate values while retaining the original row data.
CORR Analytic Function
The CORR
analytic function is used to calculate the coefficient of correlation, or Pearson correlation coefficient. The function computes the following on rows in the set with no null values for either expression.
COVAR_POP(expr1, expr2) / (STDDEV_POP(expr1) * STDDEV_POP(expr2))
The function returns a value between +1 and -1, which represent the following, along with an example interpretation.
- +1 : Total Positive linear correlation. The longer you've been employed the more you get paid proportionally.
- 0 : No linear correlation. There is no relationship between time served and pay.
- -1 : Total Negative linear correlation. The longer you've been employed the less you get paid proportionally.
The basic description for the CORR
analytic function is shown below. The analytic clause is described in more detail here.
CORR(expr1, expr2) [ OVER (analytic_clause) ]
Using an empty OVER
clause turns the CORR
function into an analytic function. The lack of a partitioning clause means the whole result set is treated as a single partition, so we get the correlation of the length of employment with the salary for all employees, as well as all the original data.
SELECT empno, ename, deptno, sal, job, CORR(SYSDATE - hiredate, sal) OVER () AS corr_val FROM emp; EMPNO ENAME DEPTNO SAL JOB CORR_VAL ---------- ---------- ---------- ---------- --------- ---------- 7369 SMITH 20 800 CLERK -.02116564 7499 ALLEN 30 1600 SALESMAN -.02116564 7521 WARD 30 1250 SALESMAN -.02116564 7566 JONES 20 2975 MANAGER -.02116564 7654 MARTIN 30 1250 SALESMAN -.02116564 7698 BLAKE 30 2850 MANAGER -.02116564 7782 CLARK 10 2450 MANAGER -.02116564 7788 SCOTT 20 3000 ANALYST -.02116564 7839 KING 10 5000 PRESIDENT -.02116564 7844 TURNER 30 1500 SALESMAN -.02116564 7876 ADAMS 20 1100 CLERK -.02116564 7900 JAMES 30 950 CLERK -.02116564 7902 FORD 20 3000 ANALYST -.02116564 7934 MILLER 10 1300 CLERK -.02116564 SQL>
Adding the partitioning clause allows us to display the correlation per department, along with the employee data.
SELECT empno, ename, deptno, sal, job, CORR(SYSDATE - hiredate, sal) OVER (PARTITION BY deptno) AS corr_val FROM emp; EMPNO ENAME DEPTNO SAL JOB CORR_VAL ---------- ---------- ---------- ---------- --------- ---------- 7782 CLARK 10 2450 MANAGER .074769211 7839 KING 10 5000 PRESIDENT .074769211 7934 MILLER 10 1300 CLERK .074769211 7566 JONES 20 2975 MANAGER .044630566 7902 FORD 20 3000 ANALYST .044630566 7876 ADAMS 20 1100 CLERK .044630566 7369 SMITH 20 800 CLERK .044630566 7788 SCOTT 20 3000 ANALYST .044630566 7521 WARD 30 1250 SALESMAN .417964304 7844 TURNER 30 1500 SALESMAN .417964304 7499 ALLEN 30 1600 SALESMAN .417964304 7900 JAMES 30 950 CLERK .417964304 7698 BLAKE 30 2850 MANAGER .417964304 7654 MARTIN 30 1250 SALESMAN .417964304 SQL>
Alternatively we can display the correlation per job type, along with the employee data.
SELECT empno, ename, deptno, sal, job, CORR(SYSDATE - hiredate, sal) OVER (PARTITION BY job) AS corr_val FROM emp ORDER BY JOB, deptno; EMPNO ENAME DEPTNO SAL JOB CORR_VAL ---------- ---------- ---------- ---------- --------- ---------- 7902 FORD 20 3000 ANALYST 7788 SCOTT 20 3000 ANALYST 7934 MILLER 10 1300 CLERK -.3250778 7369 SMITH 20 800 CLERK -.3250778 7876 ADAMS 20 1100 CLERK -.3250778 7900 JAMES 30 950 CLERK -.3250778 7782 CLARK 10 2450 MANAGER .978243943 7566 JONES 20 2975 MANAGER .978243943 7698 BLAKE 30 2850 MANAGER .978243943 7839 KING 10 5000 PRESIDENT 7654 MARTIN 30 1250 SALESMAN .205987887 7844 TURNER 30 1500 SALESMAN .205987887 7521 WARD 30 1250 SALESMAN .205987887 7499 ALLEN 30 1600 SALESMAN .205987887 SQL>
Quick Links
The "*" indicates the function supports the full analytic syntax, including the windowing clause.
For more information see:
- CORR
- CORR_*
- CORR Analytic Function
- Analytic Functions : All Articles
- COVAR_POP and COVAR_SAMP Analytic Functions
- STDDEV, STDDEV_POP and STDDEV_SAMP Analytic Functions
Hope this helps. Regards Tim...