Recently in an interview I was asked to write Python code to remove all duplicate entries from a list.

For example:
Input list: [1, 2, 4, 5, 2, 3, 1]
Expected output: [4, 5, 3]

In the example above, 1 and 2 each appear more than once, so every occurrence of them should be removed. The interviewer also made clear that the original order of the surviving elements must be preserved. That was the question as asked.
He also said at the start that he did not want me to use built-in functions like set() or unique(); I guess he was testing my algorithm and data-structure skills.
I thought of two approaches on the spot: 1) sorting (O(n log n)) and 2) a hash table (faster than sorting, O(n) on average).
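For completeness, here is a sketch of the sorting approach I had in mind (my reconstruction after the fact, not code I wrote in the interview): sort a copy so equal values become adjacent, scan the runs to find which values repeat, then filter the original list so order is preserved.

```python
def remove_duplicates_by_sorting(arr):
    # Sort a copy so equal values become adjacent; O(n log n).
    ordered = sorted(arr)

    # Scan runs of equal values; any run longer than 1 is a duplicate value.
    dupes = []
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        if j - i > 1:
            dupes.append(ordered[i])
        i = j

    # Filter the original list, preserving the order of the survivors.
    return [x for x in arr if x not in dupes]

print(remove_duplicates_by_sorting([1, 2, 4, 5, 2, 3, 1]))  # [4, 5, 3]
```

Note the final filter uses a list membership test (`x not in dupes`) to respect the no-set() constraint, which costs O(n·d) for d duplicate values; with set() allowed it would be O(n).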
Hash table approach:

    arr = [1, 2, 4, 5, 2, 3, 1]

    # Build a hash table with key = element and value = occurrence count.
    def dataCountTable(arr):
        countTable = {}
        for x in arr:
            if x in countTable:
                countTable[x] += 1
            else:
                countTable[x] = 1
        return countTable

    # Keep only the elements that occur exactly once, preserving order.
    def rmvAllDuplicate(arr, countTable):
        outList = []
        for x in arr:
            if countTable[x] == 1:
                outList.append(x)
        return outList

    print(rmvAllDuplicate(arr, dataCountTable(arr)))  # [4, 5, 3]
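For comparison only (this would not satisfy the interviewer's no-built-ins constraint), the same two-pass idea collapses to a few lines with the standard library's collections.Counter, which builds the occurrence-count table in one pass:

```python
from collections import Counter

def remove_all_duplicates(arr):
    # Counter builds the same element -> occurrence-count table.
    counts = Counter(arr)
    # Keep elements whose count is exactly 1, in original order.
    return [x for x in arr if counts[x] == 1]

print(remove_all_duplicates([1, 2, 4, 5, 2, 3, 1]))  # [4, 5, 3]
```

Algorithmically it is identical to the hand-rolled version above: one counting pass and one filtering pass, O(n) time and O(n) extra space.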
The interviewer did not seem impressed with this answer, and that has kept me searching for a better approach. I couldn't find one.

If someone could help me improve my solution or suggest a new and better one, that would be great!

Thanks!