How can I apply hash algorithm instead of for loops to reduce time complexity in python?

Question

This code compare two files and when the find the same lines, it'll write the line another text file as output.

I guess its time complexity is O(n^2). It takes too much time when the increase the lines.

I think that using Hash could be more effective.

How can I apply for the following code?

Thanks.

fin = open('x.csv')
file1 = open("y.txt","r")
file_output = open("z.txt","w")

lines = file1.readlines()
a = []
for line in lines:
     a.append(line.split("\n")[0])




for line in fin:
    id=line.split(',')[0]
    for w in a:
        if w==id:
           file_output.write(line)

Would you mind explaining what your code is supposed to do, apart from "This code find the lines"? — timgeb, Jan 01 '18 at 22:36
Explanation was added and "found" part was unnecessary and deleted. — Mr.Hyde, Jan 01 '18 at 22:49

score 1 · Answer 1 · answered Jan 01 '18 at 22:46

1

Make a set out of a and then to check for presence of id in a you won't need a loop, you'll need just id in set_a.

answered Jan 01 '18 at 22:46

algrid

5,600
3
34
37

score 0 · Accepted Answer · answered Jan 02 '18 at 06:23

A dict/mapping object maps hashable values(keys) to arbitrary objects. By default all the primitive immutable objects in python have this hash function implemented. For custom objects you have to implement them for correct functionality. Refer - Object of custom type as dictionary key for more details.

Internals of 'hashtable/hashmap/dictionary/dict' DataStructure:

The important aspect of 'dict' datastructure is - it provides complexity of 'dictionary lookups' in O(1). While others need atleast O(log n) or O(n) for lookup.

For providing this O(1) complexity, it requires the key objects to provide a "hash" function.

This hash function takes the information in a key object and uses it to produce an integer, called a "hash value".
This hash value is then used to determine which "bucket" this (key, value) pair should be placed. Then, go to the bucket and get the value directly or by traversing the list to get correct value for given key.

Following code should run in O(n), where n is the no.of lines in your 2nd File.

fin = open('x.csv')
file1 = open("y.txt","r")
file_output = open("z.txt","w")

lines = file1.readlines()
a = {}
for line in lines:
    key = line.split("\n")[0]
    a[key] = 1

for line in fin.readlines():
    id=line.split(',')[0]
    if id in a:
       file_output.write(line)

How can I apply hash algorithm instead of for loops to reduce time complexity in python?

2 Answers2