When working with lists, and with data in general, you will sooner or later have to deal with duplicate values, whether in Python or in any other programming language. If you handle that data with Python, though, you’ll have plenty of functions and methods to help you out with this coding chore.
Using set – If the List Has Only Duplicate Values
You can find the repeated values in a list using a set. Sets are unordered, unindexed collections that cannot hold the same value twice, and they are commonly used for removing duplicates because of these exact characteristics.
Sets also have numerous methods that can help you with this task, so let’s see an example:
## using sets to find duplicates in a list with only duplicate values
# list containing data
lst1 = [3, 1, 5, 1, 10, 3, 5, 10]
# create an empty set
duplicates = set()
# loop through elements and find matches
for i in lst1:
    if i not in duplicates:
        duplicates.add(i)
# show data
print(lst1)
print(duplicates)
As can be seen from the code above, we create a list called lst1 containing various numbers and an empty set called duplicates, which we’ll use to store the values. Because a set stores each value only once, every number from lst1 ends up in duplicates exactly one time. The not in check makes that explicit: we add a value only if it is not already in the set. Finally, we show the results in the console:
[3, 1, 5, 1, 10, 3, 5, 10]
{1, 3, 5, 10}
After running the code above, the console shows the list on the first line and, on the second line, the set with all the duplicate values inside. Because sets are unordered, the values in the set may be printed in a different order on your machine.
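As a side note, the loop above ends up with the same contents as passing the list straight to the set() constructor. The following is a minimal sketch that reuses lst1 from the example; the name unique_values is introduced here for illustration only.

```python
# list containing data (same values as lst1 in the example above)
lst1 = [3, 1, 5, 1, 10, 3, 5, 10]

# set() keeps one copy of each value, just like the explicit loop
unique_values = set(lst1)

# the loop-based version, for comparison
duplicates = set()
for i in lst1:
    if i not in duplicates:
        duplicates.add(i)

print(unique_values == duplicates)  # True
```

Both versions share the same limitation: any value that appears in the list, repeated or not, ends up in the set.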
Using the count() Method to Print Duplicates
The previous example works well when every value in the list is repeated, but if the list also contains unique values, those values will appear in the set as well, which is wrong. The console below shows that case:
[3, 1, 5, 2, 1, 10, 3, 5, 10, 11, 12]
{1, 2, 3, 5, 10, 11, 12}
To counter that, we have to check each value in the list and add it to the final set only if it occurs more than once. The code below accomplishes precisely that:
## finding duplicate values in a list
lst2 = [3, 1, 5, 2, 1, 10, 3, 5, 10, 11, 12]
dupl = set() # create empty set
# loop through the elements inside the list
for i in lst2:
    if lst2.count(i) > 1:
        dupl.add(i)
# show final data
print(lst2)
print(dupl)
The code above creates another list called lst2 and another set called dupl, which will hold the duplicate entries. So far, so good: the code is very similar to the previous example. The major difference is in the loop that cycles through each element of lst2.
While we loop through the elements, we check each one for duplicates using the count() method. The count() method takes a value (in our case, the current element of the list) and returns the number of times that value occurs in the list:
- 0 if no matches were found;
- 1 if exactly one match was found;
- N if the value occurs N times in the list.
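The return values listed above can be checked directly. This is a small sketch that reuses lst2 from the example; the variable names absent, single and repeated are illustrative only.

```python
# same list as lst2 in the example above
lst2 = [3, 1, 5, 2, 1, 10, 3, 5, 10, 11, 12]

absent = lst2.count(7)    # 7 is not in the list
single = lst2.count(2)    # 2 appears once
repeated = lst2.count(3)  # 3 appears twice

print(absent, single, repeated)  # 0 1 2
```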
In our case, we just need to know whether a value is repeated, so any element with a count greater than 1 is stored in the final set. The output can be seen below:
[3, 1, 5, 2, 1, 10, 3, 5, 10, 11, 12]
{1, 10, 3, 5}
As can be seen from the console above, our program returns the duplicate values from the list and only the duplicate values. That doesn’t mean the code presented in the previous section is wrong. It works perfectly for a list with only duplicate values, and it will even be faster there, because count() scans the whole list again for every element while the first example makes a single pass. This is the trick with programming: you have to choose the best solution for each problem, and that’s why we love it so much.
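For longer lists that mix unique and repeated values, one possible alternative is to track the values already seen in a second set, so that each element is examined only once. This is a sketch rather than one of the original examples: the function name find_duplicates and the variable seen are hypothetical, and it assumes the list elements are hashable.

```python
def find_duplicates(items):
    """Return the set of values that occur more than once in items."""
    seen = set()  # values encountered at least once
    dupl = set()  # values encountered at least twice
    for item in items:
        if item in seen:
            dupl.add(item)
        else:
            seen.add(item)
    return dupl


lst2 = [3, 1, 5, 2, 1, 10, 3, 5, 10, 11, 12]
result = find_duplicates(lst2)
print(sorted(result))  # [1, 3, 5, 10]
```

The result is sorted before printing only to get a predictable display order, since sets themselves are unordered.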
Using the Counter Class to Check Repeated Elements
The Counter class from the collections module was designed specifically for these types of tasks. Using it can reduce the code needed for this task to a minimum, as the code below shows:
## using the counter class from the collections library
from collections import Counter
lst3 = [2, 1, 3, 5, 1, 2, 3]
ldupl = [i for i, cnt in Counter(lst3).items() if cnt > 1]
# show data in the console
print(lst3)
print(ldupl)
The only things that stand out in this snippet are the line that imports the Counter class from the collections module and the declaration of the list called ldupl, which contains the duplicate values from lst3.
In Python, you can filter data while building a new list by using a list comprehension, which is exactly what we’re doing here. We pass lst3 to Counter() to count how many times each item occurs, then test whether each count is greater than one. If it is, we add the item to the final list called ldupl.
After printing the contents of the initial list and the list of duplicates, we get the output shown below:
[2, 1, 3, 5, 1, 2, 3]
[2, 1, 3]
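To see what the list comprehension is iterating over, it can help to look at the Counter object on its own. This is a short sketch that reuses lst3; the names counts and pairs are illustrative, and the pair order shown assumes Python 3.7 or newer, where Counter keeps the order in which values were first seen.

```python
from collections import Counter

lst3 = [2, 1, 3, 5, 1, 2, 3]
counts = Counter(lst3)

# items() yields (value, count) pairs
pairs = list(counts.items())
print(pairs)  # [(2, 2), (1, 2), (3, 2), (5, 1)]

# a single count can be looked up like a dictionary entry
print(counts[5])  # 1
```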
Using sort() to Detect Duplicate Elements
I’ve saved the simplest method for last. Sorting the list and then comparing neighboring elements is probably the most intuitive way to identify the duplicates. The code below demonstrates just that:
## finding duplicates using the sort method
lst4 = [2, 1, 3, 5, 1, 2, 3, 10, 5, 12]
lstdupl = []
lst4.sort() # sorting the list
# iterate through all elements inside the list
for i, j in enumerate(lst4):
    # skip position 0, then compare each element with the previous one
    if i > 0 and lst4[i] == lst4[i-1]:
        lstdupl.append(j)
print(lst4)
print(lstdupl)
As before, we’ve created two lists, lst4 and lstdupl: the first holds the data we will be sorting and the second holds the duplicates found by the loop. We sort the list in ascending order using the sort() method and then iterate through all of its elements.
Notice the use of the enumerate() function in the for loop. I have used enumerate() because it returns both the position of each value in the list and the value itself, which is useful in the next step.
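As a quick illustration of what enumerate() produces, here is a tiny sketch. The list letters is made-up sample data, not part of the example above.

```python
letters = ["a", "b", "c"]  # illustrative data only

# enumerate() pairs each value with its position, starting at 0
pairs = list(enumerate(letters))
print(pairs)  # [(0, 'a'), (1, 'b'), (2, 'c')]
```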
We proceed to check each element in the list for duplicates and, because the list is sorted, the elements will be next to each other, so we check the current element against the previous one.
That check needs care at the first position in the list, position 0. There, lst4[i-1] becomes lst4[-1], which in Python does not raise an error but refers to the last element, so the comparison would be made against the wrong value. To avoid that, we add a condition so the check starts from position 1, the second element in the list. If we find a match, we add that element to the final list called lstdupl. We then show the results in the console:
[1, 1, 2, 2, 3, 3, 5, 5, 10, 12]
[1, 2, 3, 5]
As you can see in the console above, the program outputs the contents of the sorted list lst4 followed by the duplicate elements, stored in lstdupl. Job done.
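One caveat: if a value appears three or more times, the comparison with the previous element succeeds more than once, so that value is appended to the result repeatedly. A possible variation, sketched here with a hypothetical list lst5 and a hypothetical helper named sorted_duplicates, skips a value that has just been recorded.

```python
def sorted_duplicates(items):
    """Return each repeated value once, in ascending order."""
    ordered = sorted(items)  # sorted() leaves the original list untouched
    dupl = []
    # start at position 1 so there is always a previous element
    for i in range(1, len(ordered)):
        same_as_previous = ordered[i] == ordered[i - 1]
        already_recorded = bool(dupl) and dupl[-1] == ordered[i]
        if same_as_previous and not already_recorded:
            dupl.append(ordered[i])
    return dupl


lst5 = [2, 1, 3, 1, 2, 1, 10]  # 1 appears three times
result = sorted_duplicates(lst5)
print(result)  # [1, 2]
```

Because the list is sorted, all copies of a value sit next to each other, so comparing against the last recorded duplicate is enough to avoid repeats.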