This page describes how to remove duplicate rows in Excel, using three different methods.
If you want to remove duplicate cells (rather than entire rows of data), you may find the Remove Duplicate Cells page more straightforward.
In order to illustrate how to remove duplicate rows in an Excel spreadsheet, we will use the example spreadsheet below, which has data spanning three columns.
We first show how to use Excel's Remove Duplicates Command to remove duplicate rows and then we show how to use use Excel's Advanced Filter to perform the same task. Finally, we show how to remove duplicate rows using Excel Formulas.
Note that the methods described keep the first occurrence of each row, but delete any subsequent duplicate rows.
The Remove Duplicates command is located in the 'Data Tools' group, within the Data tab of the Excel ribbon.
To remove duplicate rows using this command:
You will be presented with the Remove Duplicates dialog box, as shown below:
This dialog box allows you to select which columns of your data set you want to be included in the comparison for duplicate information. In the example spreadsheet above, we only want a record to be removed if the contents of all three columns contain duplicate information. Therefore we leave all three fields selected within the dialog box.
Once you have ensured that the required fields are checked in the dialog box, click OK.
Excel will then delete the duplicate rows, as required and will present you with a message, informing you of the number of records removed and the number of unique records remaining (see below).
The resulting example spreadsheet is shown on the rightabove. As required, the duplicate row 7 (for Laura CARTER, id: #31032) has been removed.
The Excel advanced filter has an option that allows you to filter unique records (rows of data) in a spreadsheet and copy the resulting filtered list to a new location.
This gives you a data set that contains the first occurrence of a duplicated row, but does not contain any further occurrences.
To remove duplicate rows using the Advanced Filter:
Select the data that you want to remove the duplicates from (columns A-C in the example spreadsheet above);
(Alternatively, if you select any cell within the data set, Excel will automatically select the entire range of data when you activate the advanced filter).Select the Excel Advanced Filter option from the Data tab at the top of your Excel workbook
(or in Excel 2003, this option is located in Data→Filter menu).You will be presented with a dialog box showing you the options for the Excel advanced filter (see below).
Within this dialog box:
In the Copy to field, enter the location that you want to copy the new list to.
Note that this location must be in the current worksheet. In this example, cell E1 of the current Worksheet "Sheet1" has been selected as the 'copy to' location;The resulting spreadsheet, with the new data list in column E, is shown below:
It can be seen that the duplicate row 7 (for Laura CARTER, id: #31032) has been removed from the data in columns E-G.
If you want to, you can now delete the columns to the left of your new data list (columns A-D in the example spreadsheet) to return to the original spreadsheet format.
Warning: This method will only work if the contents of your cells are less than 256 characters in length, as Excel functions cannot handle text strings that are longer than this.
In order to illustrate how to use Excel formulas to remove duplicate rows in an Excel spreadsheet, we will again use the simple example spreadsheet (repeated on the rightabove), that contains the personal data (forename, surname and ID Number) of nine individuals.
The first step of removing the duplicate rows is to combine the contents of the columns A-C into a single column. We will then highlight the rows corresponding to the duplicate values, before deleting these rows.
We first combine the data from columns A-C of the example spreadsheet, using the concatenation & operator in column D. The formula to be entered into cell D2 is:
Copying this formula down all rows gives the following spreadsheet:
Once the contents of columns A-C have been concatenated into column D, we need to find the duplicates in the combined column D.
This can be done using the Countif function, as shown in column E of the spreadsheet below. This function shows the number of occurrences of each value in column D, up to the current row only.
As shown in the formula bar of the above spreadsheet, the format of the Countif function in cell E2 is:
Note that this function uses a combination of Absolute and Relative Cell References. Due to this combination of reference styles, as the formula is copied down column E, it becomes,
=COUNTIF( D$2:D3, D3 ) =COUNTIF( D$2:D4, D4 ) =COUNTIF( D$2:D5, D5 ) etc. |
Therefore, the formula in cell E4 returns the value 1 for the first occurrence of the text string "LauraCARTER#31032", but the formula in cell E7 returns the value 2 for the second occurence of this text string.
Once we have used the Countif function to highlight the duplicates in column D of the example spreadsheet, we need to delete the rows for which the count is greater than 1.
In the simple example spreadsheet, it is easy to see, and to delete, the single duplicate row. However, if you have several duplicates, you might find it faster to delete all duplicates at once, using the Excel Autofilter.
The following steps show how to remove several duplicate rows at once, (after they have been highlighted using the Countif function):
Select the column containing the Countif function (column E in the example spreadsheet above);
(Alternatively, if you select any cell within the current data set, Excel will automatically select the entire range of data when you activate the autofilter).Use the filter at the top of column E to select rows that are not equal to 1.
I.e. click on the filter and, from the list of values, uncheck the value 1;You will be left with a spreadsheet in which the first occurrence of every row is hidden. I.e. only the duplicate rows are displayed.
You can delete these rows by highlighting them, then right clicking with the mouse and selecting Delete Rows.
Remove the filter and you will be left with the spreadsheet shown below, in which the duplicate row 7 has been removed.
You can now delete the columns containing your formulas (columns D and E in the example spreadsheet), to return to the original spreadsheet format.