In Oracle, removing duplicates from a column (or a table) typically involves two common approaches: using a SELECT DISTINCT
query to filter out duplicates, or by using a DELETE
statement to remove the duplicates from a table. Below are the two main methods depending on whether you just want to display the data without duplicates, or actually remove the duplicate rows from the table.
1. Displaying Unique Values (Removing Duplicates in Query Result)
If you only need to retrieve unique values from a column without modifying the data in the table, you can use the SELECT DISTINCT
statement. This will return only distinct values in the result set.
SELECT DISTINCT column_name
FROM table_name;
2. Removing Duplicates from a Table (Physically Deleting Duplicates)
If you want to actually remove duplicate rows from the table, keeping only the first occurrence of each unique value, you can use a DELETE
statement in combination with ROWID
or ROW_NUMBER()
.
Using ROWID
One way to remove duplicates is by leveraging the ROWID
, which is a unique identifier for each row in a table. This approach works if you have a simple table structure and want to delete exact duplicates of a specific column.
For example, if you have a table with the following columns: id
, column_name
, and you want to remove duplicates from column_name
:
DELETE FROM table_name
WHERE ROWID NOT IN (
SELECT MIN(ROWID)
FROM table_name
GROUP BY column_name
);
In this query:
- The
SELECT MIN(ROWID)
gets the first row for each distinct value ofcolumn_name
. - The
DELETE
statement removes all rows that don't have the minimumROWID
for eachcolumn_name
, effectively keeping only one occurrence of each duplicate.
Using ROW_NUMBER()
(Recommended for Complex Scenarios)
For more complex scenarios, you can use the ROW_NUMBER()
window function to assign a unique number to each row within a partition of duplicate values, and then delete rows where the ROW_NUMBER()
is greater than 1.
Here's an example that removes duplicates based on a specific column (e.g., column_name
):
WITH cte AS (
SELECT ROWID, column_name,
ROW_NUMBER() OVER (PARTITION BY column_name ORDER BY ROWID) AS rn
FROM table_name
)
DELETE FROM table_name
WHERE ROWID IN (
SELECT ROWID
FROM cte
WHERE rn > 1
);
In this query:
- The
ROW_NUMBER()
function assigns a unique number (starting from 1) to each row within the same value ofcolumn_name
. - The
PARTITION BY column_name
means that the row numbers are reset for each distinct value ofcolumn_name
. - The
DELETE
statement removes all rows where theROW_NUMBER
is greater than 1, effectively keeping only the first occurrence of each duplicate value.
Handling Tables with Primary Keys
If your table has a primary key, and you're only concerned with duplicates in one or more non-primary key columns, you can adjust your approach accordingly. If you have constraints or indexes, it’s important to consider these when performing deletion to ensure no data integrity issues arise.
Summary:
- To display unique values: Use
SELECT DISTINCT
. - To remove duplicates:
- Use
ROWID
for simpler cases. - Use
ROW_NUMBER()
for more complex cases where you want more control over which duplicates are kept or deleted.
- Use
Make sure to back up your data before performing any delete operations to prevent accidental loss of data.
No comments:
Post a Comment