Wednesday, December 18, 2024

SQL Aggregate Functions

 SQL aggregate functions are used to perform calculations on a set of values and return a single value. They are often used in conjunction with the GROUP BY clause to group rows based on some column(s), allowing for summaries or analysis of data within each group. These functions work with numeric data types and return a result like sum, average, count, etc.

Here are the most commonly used SQL aggregate functions:

1. COUNT()

  • Purpose: Returns the number of rows that match a specified condition.
  • Syntax:
    SELECT COUNT(column_name) FROM table_name WHERE condition;
  • Example: Count the number of employees in a department.
    SELECT COUNT(*) FROM employees WHERE department_id = 1;

2. SUM()

  • Purpose: Returns the total sum of a numeric column.
  • Syntax:
    SELECT SUM(column_name) FROM table_name WHERE condition;
  • Example: Get the total salary of employees in a specific department.
    SELECT SUM(salary) FROM employees WHERE department_id = 2;

3. AVG()

  • Purpose: Returns the average value of a numeric column.
  • Syntax:
    SELECT AVG(column_name) FROM table_name WHERE condition;
  • Example: Get the average salary of employees in a department.
    SELECT AVG(salary) FROM employees WHERE department_id = 3;

4. MIN()

  • Purpose: Returns the smallest value in a column.
  • Syntax:
    SELECT MIN(column_name) FROM table_name WHERE condition;
  • Example: Get the employee with the lowest salary.
    SELECT MIN(salary) FROM employees;

5. MAX()

  • Purpose: Returns the largest value in a column.
  • Syntax:
    SELECT MAX(column_name) FROM table_name WHERE condition;
  • Example: Get the employee with the highest salary.
    SELECT MAX(salary) FROM employees;

6. GROUP_CONCAT() (MySQL, SQLite)

  • Purpose: Returns a concatenated string of non-null values from a group.
  • Syntax:
    SELECT GROUP_CONCAT(column_name) FROM table_name GROUP BY group_column;
  • Example: Get a list of employees' names for each department.
    SELECT department_id, GROUP_CONCAT(employee_name) FROM employees GROUP BY department_id;

7. VARIANCE() / VAR_POP() / VAR_SAMP()

  • Purpose: Returns the variance of a numeric column (how far the numbers are from the mean). VAR_POP() is for population variance, and VAR_SAMP() is for sample variance.
  • Syntax:
    SELECT VARIANCE(column_name) FROM table_name;
  • Example: Calculate the variance in employee salaries.
    SELECT VAR_SAMP(salary) FROM employees;

8. STDDEV() / STDDEV_POP() / STDDEV_SAMP()

  • Purpose: Returns the standard deviation of a numeric column. STDDEV_POP() is for population standard deviation, and STDDEV_SAMP() is for sample standard deviation.
  • Syntax:
    SELECT STDDEV(column_name) FROM table_name;
  • Example: Calculate the standard deviation in salaries.
    SELECT STDDEV(salary) FROM employees;

9. FIRST() / LAST() (varies by database)

  • Purpose: Returns the first and last values in a set. Behavior can vary by database and is typically used in ORDER BY queries.
  • Example: Get the first and last employee's hire date in a department.
    SELECT FIRST(hire_date), LAST(hire_date) FROM employees WHERE department_id = 5 ORDER BY hire_date;

Usage with GROUP BY

SQL aggregate functions are often used with the GROUP BY clause to perform calculations over each group of rows. Here is an example of how to use aggregate functions with GROUP BY:

Example: Calculate the total salary and average salary for each department.

SELECT department_id, SUM(salary) AS total_salary, AVG(salary) AS avg_salary FROM employees GROUP BY department_id;

In this example:

  • The SUM(salary) function calculates the total salary for each department.
  • The AVG(salary) function calculates the average salary for each department.
  • The GROUP BY department_id groups the results by department.

HAVING Clause

  • Purpose: The HAVING clause is used to filter records after aggregation has been performed (whereas WHERE is used before aggregation).
  • Example: Find departments where the total salary exceeds a certain amount.
    SELECT department_id, SUM(salary) AS total_salary FROM employees GROUP BY department_id HAVING SUM(salary) > 100000;

In this query:

  • The HAVING clause filters out the departments where the sum of the salaries is less than or equal to 100,000.

Important Notes:

  • Aggregate functions ignore NULL values, except for COUNT() when counting rows.
  • Some databases might have variations or additional aggregate functions, such as STRING_AGG() in PostgreSQL or SQL Server.

These functions are essential for reporting and analyzing data in databases, enabling users to summarize and aggregate data efficiently.

No comments:

Post a Comment