SQL: How to find duplicates?

Quick Example: Find Duplicates

A simple SQL query example to find duplicates:

sql

SELECT email, COUNT() FROM customers
GROUP BY email HAVING COUNT() > 1;

On this page

SQL: How to find duplicates?
1.Use GROUP BY
2.Use ROW_NUMBER() and a Common Table Expression (CTE)
3.Use LAG() and a CASE statement

SQL: How to find duplicates?

In SQL Server, there are 3 main ways to find duplicates:

1.Use GROUP BY

To find duplicates using the GROUP BY method in SQL:

Select the columns that you want to check for duplicates.
Use the GROUP BY clause to group the data by those columns.
Use the HAVING clause to filter the results to show only the groups that have more than one entry.

Full example:

sql

SELECT email, COUNT() FROM customers
GROUP BY email HAVING COUNT() > 1;

Example

Let's say you have a table called "customers" with columns "name" and "email":

id	name	email
1	John Smith	`john.smith@example.com`
2	Jane Doe	`jane.doe@example.com`
3	J Doe	`jane.doe@example.com`
4	Bob Johnson	`bob.johnson@example.com`

To find duplicate email addresses, you could use the following SQL query:

sql

SELECT email, COUNT() FROM customers GROUP BY email HAVING COUNT() > 1;

This query will group the customers by email address and count the number of entries in each group. It will then filter the results and show only the groups with more than one entry, which are duplicate email addresses.

Here's the result:

email	COUNT()
`jane.doe@example.com`	2

2.Use ROW_NUMBER() and a Common Table Expression (CTE)

Partition the data based on the uniqueness of a search field.
Use the ROW_NUMBER function. Duplicates will have a row number greater than one.
Use a CTE (common table expression) block to select only the duplicated records.

sql

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY name DESC) AS rn
    FROM customers
)
SELECT * FROM cte
WHERE rn > 1;

3.Use LAG() and a CASE statement

The third method uses an analytical function:

Order records by the search field.
Use the LAG function to get the previous value of a search field.
Use the OVER keyword to apply lag on the search field.
Checks if the current value matches the previous value. If it does, it is considered a duplicate.

LAG is an analytical function that accesses the value stored in a row before the current row.

With LAG, you must specify ORDER BY in the OVER clause.

Example

Query:

sql

SELECT email,
    CASE
        WHEN email = LAG(email) OVER (ORDER BY email) THEN 'Yes'
        ELSE 'No'
    END AS duplicate_value
FROM customers
ORDER BY email;

Result:

email	duplicate_value
`bob.johnson@example.com`	No
`jane.doe@example.com`	No
`jane.doe@example.com`	Yes
`john.smith@example.com`	No

Published on: Mar 16, 2023

SQL: How to find duplicates? | 3 Simple Ways

Quick Example: Find Duplicates

SQL: How to find duplicates?

1.Use GROUP BY

Example

2.Use ROW_NUMBER() and a Common Table Expression (CTE)

3.Use LAG() and a CASE statement

Example

Related Posts