
So, I have this database with thousands of rows and columns. At the start of the program I load the data and assign it to a variable:

import numpy as np

data = np.loadtxt('database1.txt', delimiter=',')

Since the database contains many elements, the program takes minutes to start. Is there a way in Python (similar to .mat files in MATLAB) to load the data only once, even after I stop the program and run it again? Currently, whenever I change a small thing for testing, my time is wasted waiting for the program to reload the data.

Forenkazan1

2 Answers


Firstly, NumPy's text readers are not well suited to large files; Pandas is much stronger here. So just stop using np.loadtxt and start using pd.read_csv instead.
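A minimal sketch of the pd.read_csv route (header=None is an assumption, needed if your file has no header row; drop it if yours does):

import pandas as pd

# pandas' C parser is much faster than np.loadtxt at splitting text;
# header=None keeps the first data row from being read as column names.
data = pd.read_csv('database1.txt', header=None).to_numpy()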
But if you want to stick with NumPy, I think the np.fromfile() function is more efficient and faster than np.loadtxt(). So, my advice: try

data = np.fromfile('database1.txt', sep=',')

instead of:

data = np.loadtxt('database1.txt', delimiter=',')
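One caveat: with a text sep, np.fromfile returns a flat 1-D array (row boundaries are lost), so you have to restore the shape yourself. A sketch, where the column count 20 is a placeholder for your file's actual width:

import numpy as np

# np.fromfile with a text separator flattens the table into 1-D,
# so reshape manually; replace 20 with your real column count.
n_cols = 20
data = np.fromfile('database1.txt', sep=',').reshape(-1, n_cols)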
Mahrez BenHamad

You could use pickle to cache your data.

import pickle
import os
import numpy as np

if os.path.isfile("cache.p"):
    # Cache hit: load the already-parsed array.
    with open("cache.p", "rb") as f:
        data = pickle.load(f)
else:
    # First run: parse the text file, then cache the result.
    data = np.loadtxt('database1.txt', delimiter=',')
    with open("cache.p", "wb") as f:
        pickle.dump(data, f)

The first run will still be slow, but later executions will be pretty fast.

I just tested this with a file containing 1 million rows and 20 columns of random floats: it took ~30 s the first time and ~0.4 s on subsequent runs.
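As a side note, since data is a NumPy array here, the same caching pattern works with NumPy's own binary .npy format via np.save/np.load, with no pickle involved. A minimal sketch under the same assumptions (the cache file name is arbitrary):

import os
import numpy as np

# Same cache-on-first-run pattern, using NumPy's binary format.
if os.path.isfile("cache.npy"):
    data = np.load("cache.npy")
else:
    data = np.loadtxt('database1.txt', delimiter=',')
    np.save("cache.npy", data)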