SQL: How to find duplicates? | 3 Simple Ways

By Josip Miskovic
A thumbnail showing how to find duplicates in SQL.

Quick Example: Find Duplicates

A simple SQL query example to find duplicates:

sql
SELECT email, COUNT() FROM customers
GROUP BY email HAVING COUNT() > 1;

SQL: How to find duplicates?

In SQL Server, there are 3 main ways to find duplicates:

1.Use GROUP BY

To find duplicates using the GROUP BY method in SQL:

  1. Select the columns that you want to check for duplicates.
  2. Use the GROUP BY clause to group the data by those columns.
  3. Use the HAVING clause to filter the results to show only the groups that have more than one entry.
  4. Full example:
    sql
    SELECT email, COUNT() FROM customers
    GROUP BY email HAVING COUNT() > 1;

Example

Let's say you have a table called "customers" with columns "name" and "email":

id name email
1 John Smith john.smith@example.com
2 Jane Doe jane.doe@example.com
3 J Doe jane.doe@example.com
4 Bob Johnson bob.johnson@example.com

To find duplicate email addresses, you could use the following SQL query:

sql
SELECT email, COUNT() FROM customers GROUP BY email HAVING COUNT() > 1;

This query will group the customers by email address and count the number of entries in each group. It will then filter the results and show only the groups with more than one entry, which are duplicate email addresses.

Here's the result:

email COUNT()

jane.doe@example.com

2

2.Use ROW_NUMBER() and a Common Table Expression (CTE)

  1. Partition the data based on the uniqueness of a search field.
  2. Use the ROW_NUMBER function. Duplicates will have a row number greater than one.
  3. Use a CTE (common table expression) block to select only the duplicated records.
sql
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY email ORDER BY name DESC) AS rn
    FROM customers
)
SELECT * FROM cte
WHERE rn > 1;

3.Use LAG() and a CASE statement

The third method uses an analytical function:

  1. Order records by the search field.
  2. Use the LAG function to get the previous value of a search field.
  3. Use the OVER keyword to apply lag on the search field.
  4. Checks if the current value matches the previous value. If it does, it is considered a duplicate.

LAG is an analytical function that accesses the value stored in a row before the current row.

  • With LAG, you must specify ORDER BY in the OVER clause.

Example

Query:

sql
SELECT email,
    CASE
        WHEN email = LAG(email) OVER (ORDER BY email) THEN 'Yes'
        ELSE 'No'
    END AS duplicate_value
FROM customers
ORDER BY email;

Result:

email duplicate_value
bob.johnson@example.com No
jane.doe@example.com No
jane.doe@example.com Yes
john.smith@example.com No
Josip Miskovic
About Josip

Josip Miskovic is a software developer at Americaneagle.com. Josip has 10+ years in experience in developing web applications, mobile apps, and games.

Read more posts →
Published on:
Download Free Software Developer Career Guide

I've used these principles to increase my earnings by 63% in two years. So can you.

Dive into my 7 actionable steps to elevate your career.