SQL – How to use GROUP BY

You can use GROUP BY to group rows based on one or more columns. This produces one row per group in the query results. Each group contains the grouping columns and the result(s) of any aggregate function(s) (COUNT/MIN/MAX/AVG/SUM) you want to use on the group.

Let’s say you have the following table and you want to see how many movies each user streamed:

Table showing all rows

Here’s how to use GROUP BY to get the results you want:

SELECT UserName, COUNT(*) as StreamCount
FROM Streaming
GROUP BY UserName
Code language: SQL (Structured Query Language) (sql)

This groups the rows by user and shows the row count per group (which in this example is the number of movies they streamed):

Query results showing groups with row counts

I’ll explain a few common scenarios and problems to watch out for (such as dealing with NULLs).

GROUP BY multiple columns

You can group by one or more columns. Here’s an example of using GROUP BY with multiple columns:

SELECT UserName, StreamedDate, COUNT(*) as StreamCount
FROM Streaming
GROUP BY UserName, StreamedDate
Code language: SQL (Structured Query Language) (sql)

This produces the following results, showing how many movies each user watched per date:

Query results showing groups with multiple grouping columns and row count

GROUP BY and NULLs

When the grouping column contains a NULL, it’ll produce a NULL group (along with all the other groups). Here’s an example of what this looks like:

Query results showing a NULL group

If this is undesirable, you can use WHERE IS NOT NULL to filter out rows containing nulls, therefore eliminating the NULL group:

SELECT UserName, SUM(WatchedMins) as StreamCount
FROM Streaming
WHERE UserName IS NOT NULL
GROUP BY UserName
Code language: SQL (Structured Query Language) (sql)

Error: Column is invalid in the select list

There are constraints about which columns you can use in the select list when you use GROUP BY. Here’s an example of attempting to use an invalid column (WatchedMins) in the select list:

SELECT UserName, WatchedMins, COUNT(*)
FROM Streaming
GROUP BY UserName
Code language: SQL (Structured Query Language) (sql)

This results in the following error:

Msg 8120, Level 16, State 1, Line 1
Column ‘Streaming.WatchedMins’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

The error message is clear. You can only use a column in the select list if it’s a grouping column (used in the GROUP BY list) or if it’s used in an aggregate function.

Here’s an example of putting WatchedMins in the GROUP BY list so it can be used in the select list:

SELECT UserName, WatchedMins, COUNT(*)
FROM Streaming
GROUP BY UserName, WatchedMins
Code language: SQL (Structured Query Language) (sql)

You can also use the column with aggregate function, such as SUM():

SELECT UserName, SUM(WatchedMins), COUNT(*)
FROM Streaming
GROUP BY UserName
Code language: SQL (Structured Query Language) (sql)

Leave a Comment