Delete Duplicate Row - SQL

To delete duplicate rows from a table in SQL, you can use a combination of the DELETE statement and subqueries to identify and remove the duplicate rows. One common approach is to use a temporary table or a common table expression (CTE) to first identify the duplicate rows and then delete them from the original table.

Here's a general approach using a CTE

Syntax
WITH DuplicateRows AS (
    SELECT column1, column2, ..., ROW_NUMBER() OVER (PARTITION BY column1, column2, ... ORDER BY column1) AS row_num
    FROM your_table_name
)
DELETE FROM DuplicateRows
WHERE row_num > 1;
  • your_table_name: The name of the table from which you want to remove duplicate rows.
  • column1, column2, ...: The columns based on which you want to identify duplicate rows.
  • ROW_NUMBER() OVER (PARTITION BY column1, column2, ... ORDER BY column1) AS row_num: This assigns a unique row number to each row within each group of duplicates, based on the specified columns.
  • row_num > 1: This condition filters out all but one row from each group of duplicates, leaving only one instance of each unique row.

Example

Let's say you have a table named employees and you want to remove duplicate rows based on the email column:

Syntax
WITH DuplicateRows AS (
    SELECT email, ROW_NUMBER() OVER (PARTITION BY email ORDER BY email) AS row_num
    FROM employees
)
DELETE FROM DuplicateRows
WHERE row_num > 1;

This SQL statement will delete all but one row from each group of duplicates based on the email column in the employees table.

Always remember to backup your data before performing delete operations, especially when dealing with data manipulation that may affect multiple rows.