I am trying to use the Freebase data dump, but it seems that I have some problems reading the files with Python. It looks like my program can't read all the lines.
def test2():
    count = 0
    for line in open(FREEBASE_TOPIC):
        count += 1
    return count

def test3():
    count = 0
    for line in open(FREEBASE_QUAD):
        count += 1
    return count

if __name__ == "__main__":
    print "FREEBASE TOPIC - NR LINES:", test2()
    print "FREEBASE QUAD - NR LINES:", test3()
Results in this:
FREEBASE TOPIC - ITR TIME: 1.21000003815
FREEBASE TOPIC - NR LINES: 1643010
FREEBASE QUAD - ITER TIME: 0.797000169754
FREEBASE QUAD - NR LINES: 3155131
This can't be all. That looks like too few lines to contain the whole of Freebase, and I can't see how it is possible to iterate over one 33GB file and another 5GB file in 2 seconds.
What is wrong? I am downloading the files again in case something went wrong during the download, but that takes ages with my connection, so I am asking here in the meantime. The file size is correct, and I have printed some of the lines and they look correct.
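One way to check whether the loops above really reach the end of each file is to read in binary mode and compare the number of bytes consumed with the size on disk. The following is only a sketch: `count_lines_and_bytes` and `report` are hypothetical helper names, not part of the original script, and the check assumes that binary mode bypasses any platform-specific text-mode handling that might end the read early.

```python
import os

def count_lines_and_bytes(path, chunk_size=1024 * 1024):
    """Count newline bytes and total bytes read from path, in binary mode.

    Hypothetical helper for diagnosis; reads in fixed-size chunks so that
    a multi-gigabyte file never has to fit in memory.
    """
    lines = 0
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty read means the real end of the file
                break
            lines += chunk.count(b"\n")
            total += len(chunk)
    return lines, total

def report(path):
    """Return (lines, bytes_read, size_on_disk, read_everything)."""
    lines, total = count_lines_and_bytes(path)
    size = os.path.getsize(path)
    return lines, total, size, total == size
```

Calling `report(FREEBASE_QUAD)` and `report(FREEBASE_TOPIC)` would then show whether the bytes read match the file size. If they match but the line counts are far larger than the ones printed above, the text-mode loops are probably stopping early rather than the download being truncated.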