
I have created a CSV file of key-value pairs to hold the curves that may be used in a model I am building. It has the following structure:

    Curve Name  |   Time Step   |   Value   
--------------------------------------------
    RPI         |   0           |   1
    RPI         |   1           |   1.012
    RPI         |   2           |   1.019
    RPI         |   .           |   .
    RPI         |   .           |   .
    RPI         |   .           |   .
    RPI         |   720         |   1.341
    LIBOR       |   0           |   1
    LIBOR       |   1           |   1.012
    LIBOR       |   2           |   1.019
    LIBOR       |   .           |   .
    LIBOR       |   .           |   .
    LIBOR       |   .           |   .
    LIBOR       |   720         |   1.341
    .           |   .           |   .
    .           |   .           |   .
    .           |   .           |   .

It should be easy to see how this table could grow to a huge number of rows. Since each curve is defined at 721 time points, the CSV would contain 721,000 rows of data if it held 1,000 curves.

Furthermore, I may only need a small number of the curves in this CSV for my model. That being the case, is there a way to read part of the file into an array or DataFrame, filtering on the 'Curve Name' field, without reading the entire contents into an array first?
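For concreteness, the naive version I am trying to avoid looks like this (assuming pandas, a file named curves.csv, and the column headers shown above):

    import pandas as pd

    # Reads the whole file into memory, then filters -- exactly what I
    # would like to avoid as the file grows.
    df = pd.read_csv("curves.csv")
    rpi = df[df["Curve Name"] == "RPI"]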

I ask because I assume that, as this file becomes very large, reading it all into memory will become expensive. Correct me if I am mistaken in thinking this.

  • You are not showing a CSV -- just to be sure, your data is like: `"RPI",1,1.012` – Lou Franco May 28 '20 at 19:19
  • Is this what your file actually looks like? This doesn't look like any variant of CSV. – user2357112 May 28 '20 at 19:22
  • You don't have to read it all in at once, but you probably have enough memory to do so. There is no convenient filtering for CSV -- if the data is sorted by curve name, you could stream it in and then stop when you find your curve. If you process it a lot, you could build an index of file offsets marking where each curve starts (see the streaming sketch after these comments). – Lou Franco May 28 '20 at 19:24
  • @J R Chapman: Try reading line by line and keep only the rows that fit your needs. – Maurice Meyer May 28 '20 at 19:24
  • Sorry, I reopened your question; I noticed just now that you didn't specify pandas. Don't know why I assumed that. This is a solution for pandas (a chunked-read sketch also follows below): https://stackoverflow.com/questions/13651117/how-can-i-filter-lines-on-load-in-pandas-read-csv-function – orlp May 28 '20 at 19:28
  • Apologies for any confusion, the data shown above was just intended to demonstrate the structure of my data. – J R Chapman May 28 '20 at 21:11
  • @orlp Pandas could be used if it were to offer an efficient solution, but I'd be interested in any approaches that perform well. – J R Chapman May 28 '20 at 21:14
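To make the suggestions above concrete, here is a minimal sketch of the line-by-line streaming approach Lou Franco and Maurice Meyer describe. The file name curves.csv and the column headers are assumptions taken from the structure shown in the question:

    import csv

    def read_curves(path, wanted, sorted_by_curve=False):
        # Stream the file row by row, keeping only rows whose 'Curve Name'
        # is in `wanted`. If the file is sorted by curve name, stop as soon
        # as every wanted curve has been collected.
        rows, seen = [], set()
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                name = row["Curve Name"]
                if name in wanted:
                    seen.add(name)
                    rows.append((name, int(row["Time Step"]), float(row["Value"])))
                elif sorted_by_curve and seen == set(wanted):
                    break  # moved past the last wanted curve; stop reading
        return rows

    curves = read_curves("curves.csv", {"RPI", "LIBOR"})

And here is the chunked pandas variant from the answer orlp links, which keeps only one bounded chunk in memory at a time (the chunk size is arbitrary):

    import pandas as pd

    wanted = {"RPI", "LIBOR"}
    chunks = pd.read_csv("curves.csv", chunksize=100_000)
    df = pd.concat(chunk[chunk["Curve Name"].isin(wanted)] for chunk in chunks)

Lou Franco's offset-index idea goes one step further: scan the file once, record f.tell() at the start of each curve, and on later runs seek() straight to the curves you need.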
