0

This code compare two files and when the find the same lines, it'll write the line another text file as output.

I guess its time complexity is O(n^2). It takes too much time when the increase the lines.

I think that using Hash could be more effective.

How can I apply for the following code?

Thanks.

fin = open('x.csv')
file1 = open("y.txt","r")
file_output = open("z.txt","w")

lines = file1.readlines()
a = []
for line in lines:
     a.append(line.split("\n")[0])




for line in fin:
    id=line.split(',')[0]
    for w in a:
        if w==id:
           file_output.write(line)
Mr.Hyde
  • 3
  • 2

2 Answers2

1

Make a set out of a and then to check for presence of id in a you won't need a loop, you'll need just id in set_a.

algrid
  • 5,600
  • 3
  • 34
  • 37
0

A dict/mapping object maps hashable values(keys) to arbitrary objects. By default all the primitive immutable objects in python have this hash function implemented. For custom objects you have to implement them for correct functionality. Refer - Object of custom type as dictionary key for more details.

Internals of 'hashtable/hashmap/dictionary/dict' DataStructure:

The important aspect of 'dict' datastructure is - it provides complexity of 'dictionary lookups' in O(1). While others need atleast O(log n) or O(n) for lookup.

For providing this O(1) complexity, it requires the key objects to provide a "hash" function.

  1. This hash function takes the information in a key object and uses it to produce an integer, called a "hash value".
  2. This hash value is then used to determine which "bucket" this (key, value) pair should be placed. Then, go to the bucket and get the value directly or by traversing the list to get correct value for given key.

Following code should run in O(n), where n is the no.of lines in your 2nd File.

fin = open('x.csv')
file1 = open("y.txt","r")
file_output = open("z.txt","w")

lines = file1.readlines()
a = {}
for line in lines:
    key = line.split("\n")[0]
    a[key] = 1

for line in fin.readlines():
    id=line.split(',')[0]
    if id in a:
       file_output.write(line)
arunk2
  • 2,246
  • 3
  • 23
  • 35