
I have created a CSV file of key-value pairs to hold the curves that may be used in a model I am building. It has the following structure:

    Curve Name  |   Time Step   |   Value   
--------------------------------------------
    RPI         |   0           |   1
    RPI         |   1           |   1.012
    RPI         |   2           |   1.019
    RPI         |   .           |   .
    RPI         |   .           |   .
    RPI         |   .           |   .
    RPI         |   720         |   1.341
    LIBOR       |   0           |   1
    LIBOR       |   1           |   1.012
    LIBOR       |   2           |   1.019
    LIBOR       |   .           |   .
    LIBOR       |   .           |   .
    LIBOR       |   .           |   .
    LIBOR       |   720         |   1.341
    .           |   .           |   .
    .           |   .           |   .
    .           |   .           |   .

It should be easy to see how this table could grow to a huge number of rows. Since each curve is defined at 721 time points, the CSV would contain 721,000 rows of data if it held 1,000 curves.

Furthermore, I may only need a small number of the curves in this CSV for my model. That being the case, is there a way to read part of the file into an array or DataFrame, filtering on the 'Curve Name' field, without reading the entire contents into an array first?
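For concreteness, the naive version I am trying to avoid looks like this (assuming pandas, a file named curves.csv, and the column headers shown above):

    import pandas as pd

    # Reads the whole file into memory, then filters -- exactly what I
    # would like to avoid as the file grows.
    df = pd.read_csv("curves.csv")
    rpi = df[df["Curve Name"] == "RPI"]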

I ask because I assume that, as this file becomes very large, reading it all into memory will become expensive. Correct me if I am mistaken in thinking this.

  • You are not showing a CSV -- just to be sure, your data is like: `"RPI",1,1.012` – Lou Franco May 28 '20 at 19:19
  • Is this what your file actually looks like? This doesn't look like any variant of CSV. – user2357112 May 28 '20 at 19:22
  • You don't have to read it all in at once, but you probably have enough memory to do so. There is no convenient filtering for CSV -- if the data is sorted by curve name, you could stream it in and then stop when you find your curve. If you process it a lot, you could build an index of file offsets marking where each curve starts (see the streaming sketch after these comments). – Lou Franco May 28 '20 at 19:24
  • @J R Chapman: Try reading line by line and keep only the rows that fit your needs. – Maurice Meyer May 28 '20 at 19:24
  • Sorry, I reopened your question; I noticed just now that you didn't specify pandas. Don't know why I assumed that. This is a solution for pandas (a chunked-read sketch also follows below): https://stackoverflow.com/questions/13651117/how-can-i-filter-lines-on-load-in-pandas-read-csv-function – orlp May 28 '20 at 19:28
  • Apologies for any confusion, the data shown above was just intended to demonstrate the structure of my data. – J R Chapman May 28 '20 at 21:11
  • @orlp Pandas could be used if it were to offer an efficient solution, but I'd be interested in any approaches that perform well. – J R Chapman May 28 '20 at 21:14
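To make the suggestions above concrete, here is a minimal sketch of the line-by-line streaming approach Lou Franco and Maurice Meyer describe. The file name curves.csv and the column headers are assumptions taken from the structure shown in the question:

    import csv

    def read_curves(path, wanted, sorted_by_curve=False):
        # Stream the file row by row, keeping only rows whose 'Curve Name'
        # is in `wanted`. If the file is sorted by curve name, stop as soon
        # as every wanted curve has been collected.
        rows, seen = [], set()
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                name = row["Curve Name"]
                if name in wanted:
                    seen.add(name)
                    rows.append((name, int(row["Time Step"]), float(row["Value"])))
                elif sorted_by_curve and seen == set(wanted):
                    break  # moved past the last wanted curve; stop reading
        return rows

    curves = read_curves("curves.csv", {"RPI", "LIBOR"})

And here is the chunked pandas variant from the answer orlp links, which keeps only one bounded chunk in memory at a time (the chunk size is arbitrary):

    import pandas as pd

    wanted = {"RPI", "LIBOR"}
    chunks = pd.read_csv("curves.csv", chunksize=100_000)
    df = pd.concat(chunk[chunk["Curve Name"].isin(wanted)] for chunk in chunks)

Lou Franco's offset-index idea goes one step further: scan the file once, record f.tell() at the start of each curve, and on later runs seek() straight to the curves you need.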
