
I am working with a very large file in Python. I need to check whether a particular bigram is present in that file. I have written the code below; it gives the correct output but is too slow. Is there a faster alternative?

def check(word1, word2):
    phrase = word1 + " " + word2  # build the search string once, not once per line
    with open(r"D:\bigram.txt", 'r') as file:  # raw string: "\b" would otherwise be a backspace
        for line in file:
            if phrase in line:
                return 1
    return -1
N Agarwal

1 Answer


Load the entire file into memory with mmap (if you have enough RAM):

import mmap

def check(word1, word2):
    # mmap returns bytes, so encode the bigram once up front
    phrase = (word1 + " " + word2).encode("utf-8")
    with open(r'D:\bigram.txt', 'rb') as f:
        # Length 0 maps the ENTIRE file; ACCESS_READ makes the mapping read-only
        # (unlike prot=mmap.PROT_READ, this also works on Windows)
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # The OS pages the file in on demand; readline() returns
            # one line as bytes, and b"" at the end of the file
            line = m.readline()
            while line:
                if phrase in line:
                    return 1
                line = m.readline()
    return -1
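Because the mapping behaves like one large bytes buffer, the line loop may not be needed at all. A minimal sketch, assuming the file is UTF-8 (or plain ASCII) text: mmap.find() searches the whole mapping without a Python-level loop and returns -1 when the phrase is absent. Since the bigram contains no newline, any match necessarily lies within a single line, so the result matches the per-line check. The extra path parameter is an addition for testing, not part of the original code.

```python
import mmap
import os

def check(word1, word2, path=r"D:\bigram.txt"):
    """Return 1 if "word1 word2" occurs anywhere in the file, else -1."""
    # Assumes UTF-8 (or ASCII) text so the encoded phrase matches the file's bytes
    phrase = f"{word1} {word2}".encode("utf-8")
    with open(path, "rb") as f:
        # mmap cannot map an empty file, so handle that case separately
        if os.fstat(f.fileno()).st_size == 0:
            return -1
        # Length 0 maps the whole file read-only (works on Windows and Unix)
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # find() returns the first offset of the phrase, or -1 if absent;
            # the phrase has no newline, so any hit lies within a single line
            return 1 if m.find(phrase) != -1 else -1
```

Usage is the same as before, e.g. check("new", "york"); the whole search is a single find() call instead of one readline() and one substring test per line.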

For more information, check out:

Fastest Text search method in a large text file

Python load 2GB of text file to memory
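If check() is called many times, rescanning the file on every call is what dominates the run time. A sketch under an assumption the question does not state: bigram.txt holds one bigram per line, as its first two whitespace-separated words, with anything after them (such as a count) ignored. The file is read once into a set of (word1, word2) tuples, after which each lookup is an average O(1) hash probe. Note that this tests for an exact bigram on a line rather than the substring match used above. The names load_bigrams and check_bigram are hypothetical.

```python
def load_bigrams(path=r"D:\bigram.txt"):
    """Read the file once and return a set of (first, second) word pairs.

    Assumes one bigram per line: the first two whitespace-separated tokens
    are the words, and anything after them (e.g. a count) is ignored.
    """
    bigrams = set()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                bigrams.add((parts[0], parts[1]))
    return bigrams


def check_bigram(bigrams, word1, word2):
    # Set membership replaces a full file scan for every query
    return 1 if (word1, word2) in bigrams else -1
```

Call bigrams = load_bigrams() once at startup, then check_bigram(bigrams, word1, word2) per query. The trade-off is memory: the whole set stays in RAM, the same "if you have enough" caveat that applies to loading the file.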

Hope this helps!

George K