I am trying to read a very large text file (a .dat file) - too large to open in full - but I am only interested in a small part of its content.
The real file has 32 columns and an unknown number of rows. I only want the rows where the value in column 15 is less than 40 and the value in column 21 is between 10 and 25. Is there a way to specify these constraints while opening and loading the file in Python, so that I don't waste memory on the content I don't care about?
Here is an example:
Let's say we have a file of 100 values (25 rows and 4 columns) and we only want to read the rows (the full rows!) where the value in column 2 is less than 40 and the value in column 4 is between 10 and 25. How can we do this without first loading the full file?
import numpy as np
# Create some example fake data for the text file:
textfile_content = np.random.randint(100, size=100).reshape(25,4)
print(textfile_content)
# Save text file:
np.savetxt('file.dat', textfile_content, fmt='%10.5e')
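For the example above, my current idea is to stream the file line by line and keep only the rows that match, something like this (a minimal sketch, assuming whitespace-separated columns as written by np.savetxt above):

rows = []
with open('file.dat') as f:
    for line in f:
        values = [float(x) for x in line.split()]
        # Column numbers above are 1-based; indices here are 0-based:
        # column 2 -> index 1, column 4 -> index 3.
        if values[1] < 40 and 10 <= values[3] <= 25:
            rows.append(values)
filtered = np.array(rows)
print(filtered)

This keeps only the matching rows in memory, but I am not sure it is the idiomatic way to do it with numpy.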
The closest approach I have been able to find by googling is this: Reading specific lines only (Python)
But it doesn't quite solve my problem, since that person wants to extract specific lines, predefined by their line numbers, rather than rows selected by a specific data value.
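For the real 32-column file, one idea I had is to feed a filtering generator to np.loadtxt, since np.loadtxt also accepts a generator of lines (again just a sketch, assuming the real file is whitespace-separated like the example):

def filtered_lines(path):
    # Yield only the lines where column 15 (index 14) is < 40 and
    # column 21 (index 20) is between 10 and 25.
    with open(path) as f:
        for line in f:
            v = line.split()
            if float(v[14]) < 40 and 10 <= float(v[20]) <= 25:
                yield line

data = np.loadtxt(filtered_lines('file.dat'))

Is this the right approach, or is there a better built-in way to apply such constraints while loading?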