SQL – How to use GROUP BY

Here’s an example of how to use GROUP BY in a SQL query:

SELECT UserName, COUNT(*) as StreamCount FROM Streaming GROUP BY UserName
Code language: SQL (Structured Query Language) (sql)

This produces the following query results:

Grid showing GROUP BY COUNT(*) query results: UserName, StreamCount Jesse, 3 Mike, 4, Walter, 3

GROUP BY allows you to group rows based on one or more columns and calculate aggregate values for the group. In the example above, the COUNT(*) aggregate function was used to show the number of rows in each group. The other main aggregate functions are SUM(), AVG(), MIN(), and MAX().

In this article, I’ll show examples of how to use GROUP BY with the aggregate functions. In addition, I’ll show how to do filtering and sorting when using GROUP BY. First, I’ll show the sample data that was used to create these examples (including the one above).

Sample data

Here’s the sample data. It’s transactional records about which movies were streamed by users (for how long and when).

Grid showing 10 records in Streaming table: StreamedDate,UserName,Title,WatchedMins 2022-04-26,Walter,Inception,148 2022-04-26,Mike,Goodfellas,146 2022-04-26,Jesse,El Camino,122 2022-04-28,Walter,Jurassic World,30 2022-04-28,Jesse,The Matrix,136 2022-04-28,Jesse,The Matrix Reloaded,17 2022-04-29,Mike,The Godfather,175 2022-04-29,Mike,The Godfather: Part II,202 2022-05-01,Walter,Interstellar,169 2022-05-02,Mike,John Wick,101

GROUP BY multiple columns

You can group by one or more columns. Here’s an example of using GROUP BY with multiple columns:

SELECT UserName, StreamedDate, COUNT(*) as StreamCount FROM Streaming GROUP BY UserName, StreamedDate
Code language: SQL (Structured Query Language) (sql)

This produces the following results, showing how many movies users watched per date:

Grid showing results of GROUP BY with multiple columns: UserName,StreamedDate,StreamCount Jesse,2022-04-26,1 Mike,2022-04-26,1 Walter,2022-04-26,1 Jesse,2022-04-28,2 Walter,2022-04-28,1 Mike,2022-04-29,2 Walter,2022-05-01,1 Mike,2022-05-02,1

Aggregate functions with GROUP BY

GROUP BY is often used with aggregate functions to calculate summary values for each group. There are five main aggregate methods: COUNT(*), SUM(), AVG(), MIN(), and MAX().

I’ll show a few examples.

Name the aggregate result

It’s a good idea to name the aggregate result (aka aliasing). For example, use COUNT(*) as Count instead of just COUNT(*). Otherwise the column name will be blank / default in the results.

Of course, this isn’t a problem if you’re just executing ad hoc queries and don’t really need to map in code or save the results.

SUM() with GROUP BY

Here’s an example of using SUM() with GROUP BY:

SELECT UserName, SUM(WatchedMins) as TotalWatchedMins FROM Streaming GROUP BY UserName
Code language: SQL (Structured Query Language) (sql)

This produces the following results, showing the total amount of time each user spent watching movies:

Grid showing GROUP BY SUM() results: UserName,TotalWatchedMins Jesse,275 Mike,624 Walter,347

MIN() and MAX() with GROUP BY

Here’s an example of using MIN() and MAX() with GROUP BY:

SELECT UserName, MIN(WatchedMins) as MinTimeWatched, MAX(WatchedMins) as MaxTimeWatched FROM Streaming GROUP BY UserName
Code language: SQL (Structured Query Language) (sql)

This produces the following results, showing the range (min and max) of time the user spent watching a movie:

Grid showing GROUP BY MIN() and MAX() results: UserName,MinTimeWatched,MaxTimeWatched Jesse,17,136 Mike,101,202 Walter,30,169

Note: This also shows how you can use multiple aggregate functions.

Aggregate function without grouping

You can use the aggregate functions without GROUP BY when you want to calculate summary values for the whole table (instead of based on groups):

SELECT COUNT(*) as StreamCount, SUM(WatchedMins) as TotalWatchedTime FROM Streaming
Code language: SQL (Structured Query Language) (sql)

This produces the following summary values for the whole table:

Grid showing query results for COUNT(*) and SUM() without GROUP BY: StreamCount,TotalWatchedTime 10,1246

GROUP BY without an aggregate function

While it’s very common to use GROUP BY with an aggregate function, you can use it without one too. Here’s an example:

SELECT UserName, StreamedDate FROM Streaming GROUP BY UserName, StreamedDate

This produces the following results, essentially showing the unique user / date combinations:

Grid showing results for GROUP BY without an aggregate function: UserName,StreamedDate Jesse,2022-04-26 Jesse,2022-04-28 Mike,2022-04-26 Mike,2022-04-29 Mike,2022-05-02 Walter,2022-04-26 Walter,2022-04-28 Walter,2022-05-01

GROUP BY filtering

WHERE filters individual rows, then GROUP BY groups the filtered rows, and finally HAVING filters the groups based on an aggregate value.

I’ll show examples.

GROUP BY WHERE – Filters before grouping

Here’s an example of using WHERE with GROUP BY to only produce groups from rows with a specific value:

SELECT StreamedDate, SUM(WatchedMins) as TotalWatchedMins FROM Streaming WHERE UserName = 'Walter' GROUP BY StreamedDate
Code language: SQL (Structured Query Language) (sql)

This produces the following results, showing a specific user’s (Walter) total time watched per date:

Grid showing GROUP BY WHERE results: StreamedDate,TotalWatchedMins 2022-04-26,148 2022-04-28,30 2022-05-01,169

GROUP BY HAVING – Filters groups

Here’s an example of using HAVING to filter out groups based on an aggregate value:

SELECT StreamedDate, COUNT(*) as StreamCount FROM Streaming GROUP BY StreamedDate HAVING COUNT(*) > 2
Code language: SQL (Structured Query Language) (sql)

This produces the following results, only showing groups with more than two rows:

Grid showing GROUP BY HAVING results: StreamedDate,StreamCount 2022-04-26,3 2022-04-28,3

GROUP BY with ORDER BY

You can sort the GROUP BY output with ORDER BY.

Here are examples.

ORDER BY aggregate result

Here’s an example of sorting groups by an aggregate result:

SELECT StreamedDate, SUM(WatchedMins) as TotalWatchedMins FROM Streaming GROUP BY StreamedDate ORDER BY TotalWatchedMins DESC
Code language: SQL (Structured Query Language) (sql)

This produces the following results, showing the dates sorted by total minutes watched:

Grid showing GROUP BY with ORDER BY SUM results: StreamedDate,TotalWatchedMins 2022-04-26,416 2022-04-29,377 2022-04-28,183 2022-05-01,169 2022-05-02,101

Note: This is using the aggregate result alias in ORDER BY. If you don’t use the alias (i.e. ORDER BY TotalWatchedMins), then you have to type the aggregate function call again (i.e. ORDER BY SUM(WatchedMins).

ORDER BY grouping column

Here’s an example of sorting groups by based on the column that is being grouped on (StreamedDate):

SELECT StreamedDate, SUM(WatchedMins) as TotalWatchedMins FROM Streaming GROUP BY StreamedDate ORDER BY StreamedDate DESC
Code language: SQL (Structured Query Language) (sql)

This produces the following, showing that it sorted the groups by date in descending order:

Grid showing GROUP BY with ORDER BY grouping column results: StreamedDate,TotalWatchedMins 2022-05-02,101 2022-05-01,169 2022-04-29,377 2022-04-28,183 2022-04-26,416

GROUP BY and nulls

When you GROUP BY a column that has nulls, it’ll produce a NULL group:

Grid showing GROUP BY with a NULL group: UserName,StreamCount NULL,55 Jesse,275 Mike,624 Walter,347

If this is undesirable, you may want to filter out those rows with a WHERE IS NOT NULL:

SELECT UserName, SUM(WatchedMins) as StreamCount FROM Streaming WHERE UserName IS NOT NULL GROUP BY UserName
Code language: SQL (Structured Query Language) (sql)

Leave a Comment