I am trying to filter a section of a huge dataset. The portion of the dataset that I have is a 3 GB gzip file. The file has some number of columns and millions of rows. The columns are separated by commas or another common delimiter, so I can read the file.
What I want to do is check the values of two columns in each row against two ranges (e.g. value a < col1 < value b and value c < col2 < value d). If both values fall within their respective ranges, I want to write the entire row to a new file (I am not sure exactly what to store it in) and then return that new subset.
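To make the condition concrete, here is a minimal sketch of the two-range check on a small in-memory DataFrame. The column names `col1` and `col2`, the helper name `filter_rows`, and the sample values are all placeholders I made up for illustration; the real file will have its own column names and bounds.

```python
import pandas as pd


def filter_rows(df, a, b, c, d, col1="col1", col2="col2"):
    """Keep rows where a < col1 < b and c < col2 < d (strict bounds).

    The column names are hypothetical defaults, not taken from the real file.
    """
    # Each comparison yields a boolean Series; & combines them row by row.
    mask = (df[col1] > a) & (df[col1] < b) & (df[col2] > c) & (df[col2] < d)
    # Indexing with the mask keeps every column of the matching rows.
    return df[mask]


# Made-up sample data standing in for a few rows of the real dataset.
sample = pd.DataFrame(
    {"col1": [1, 5, 9], "col2": [10, 20, 30], "other": ["x", "y", "z"]}
)
subset = filter_rows(sample, a=2, b=8, c=15, d=25)
```

The parentheses around each comparison matter, because `&` binds more tightly than `<` and `>` in Python.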
What I am missing is a fundamental understanding of how to handle iteration like this. I am struggling with what to do with the result after I have called the pandas read_csv function so that I can filter the dataset. I think I should be using DataFrames to access the data I am looking for, but I am not sure.
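One possible approach to the iteration, sketched under assumptions: `read_csv` accepts a `chunksize` argument, in which case it returns an iterator of DataFrames instead of loading the whole file, so each chunk can be filtered and appended to an output CSV. The function name `filter_csv_in_chunks`, the column names, the file names, and the tiny sample file are all hypothetical; the sample file exists only so the sketch runs on its own.

```python
import gzip
import os
import tempfile

import pandas as pd


def filter_csv_in_chunks(src, dst, a, b, c, d,
                         col1="col1", col2="col2", chunksize=100_000):
    """Stream a (possibly gzipped) CSV and write matching rows to dst.

    Keeps rows where a < col1 < b and c < col2 < d. Column names are
    hypothetical defaults. Returns the number of rows written.
    """
    written = 0
    first = True
    # compression="infer" picks gzip from a ".gz" file extension.
    for chunk in pd.read_csv(src, compression="infer", chunksize=chunksize):
        mask = (
            (chunk[col1] > a) & (chunk[col1] < b)
            & (chunk[col2] > c) & (chunk[col2] < d)
        )
        matched = chunk[mask]
        # First chunk creates the file with a header; later chunks append.
        matched.to_csv(dst, mode="w" if first else "a",
                       header=first, index=False)
        first = False
        written += len(matched)
    return written


# Build a tiny gzipped CSV so the sketch is self-contained (made-up data).
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "sample.csv.gz")
dst = os.path.join(tmpdir, "subset.csv")
with gzip.open(src, "wt", newline="") as fh:
    fh.write("col1,col2,other\n1,10,x\n5,20,y\n9,30,z\n6,18,w\n")

n_written = filter_csv_in_chunks(src, dst, a=2, b=8, c=15, d=25, chunksize=2)
result = pd.read_csv(dst)
```

Only one chunk is held in memory at a time, so the chunk size can be tuned to the available RAM. If the filtered subset is small enough, reading `dst` back with `read_csv` gives an ordinary DataFrame to return.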