I have created a CSV file of key-value pairs to hold curves that may be used in a model I am building. It uses the following structure:
Curve Name | Time Step | Value
--------------------------------------------
RPI | 0 | 1
RPI | 1 | 1.012
RPI | 2 | 1.019
RPI | . | .
RPI | . | .
RPI | . | .
RPI | 720 | 1.341
LIBOR | 0 | 1
LIBOR | 1 | 1.012
LIBOR | 2 | 1.019
LIBOR | . | .
LIBOR | . | .
LIBOR | . | .
LIBOR | 720 | 1.341
. | . | .
. | . | .
. | . | .
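For concreteness, the raw file itself would look something like this (assuming a comma delimiter and a header row):

```
Curve Name,Time Step,Value
RPI,0,1
RPI,1,1.012
RPI,2,1.019
...
RPI,720,1.341
LIBOR,0,1
...
```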
It should be easy to see how this table could grow to a huge number of rows. Since my curves are defined at 721 time points, the CSV would contain 721,000 rows if it held 1,000 curves.
Furthermore, my model may only need a small number of the curves in this CSV. Given that, is there a way to read part of the file into an array or DataFrame by filtering on the 'Curve Name' field, without first reading the entire contents into memory?
I ask because I assume that, as this file grows very large, reading it all into memory will become expensive. Correct me if I am mistaken in thinking this.
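To illustrate the kind of selective read I have in mind, here is a minimal sketch using pandas; the file name curves.csv, the chunk size, and the set of wanted curves are placeholders:

```python
import pandas as pd

WANTED = {"RPI", "LIBOR"}  # placeholder: the subset of curves the model needs

# Read the file in manageable chunks so only one chunk is in memory at a
# time, keeping just the rows whose 'Curve Name' is in the wanted set.
reader = pd.read_csv("curves.csv", chunksize=100_000)
filtered = pd.concat(chunk[chunk["Curve Name"].isin(WANTED)] for chunk in reader)

# Pivot so each curve becomes a column indexed by time step.
curves = filtered.pivot(index="Time Step", columns="Curve Name", values="Value")
print(curves.head())
```

This still scans the whole file, but only the matching rows accumulate in memory, so I am unsure whether it actually addresses the cost I am worried about.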