I have a 6.6 GB (43 million row) .txt file containing about 20 columns of data.
I have the same data stored in a DB table, and I want to do simple spot-check comparisons between the two, like row count, null count, distinct count, etc. I have done this kind of thing before in Pandas, but never with a dataset this large. I am trying to figure out how to read in that .txt file, or whether I even need to read it in entirely to do the above analysis.
Clearly this won't work as-is; it will run out of memory or grind along for far too long:
import pandas as pd
data = pd.read_csv('huge_file.txt', sep=" ", header=0)
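One direction I've been considering is streaming the file with read_csv's chunksize and accumulating the stats chunk by chunk, so the whole 6.6 GB never has to sit in memory at once. Rough, untested sketch ('some_column' is just a placeholder for one of the real columns):

import pandas as pd

total_rows = 0
null_count = 0
distinct_values = set()

# Stream the 43M rows in 1M-row chunks instead of loading everything at once
for chunk in pd.read_csv('huge_file.txt', sep=" ", header=0, chunksize=1_000_000):
    total_rows += len(chunk)
    null_count += chunk['some_column'].isna().sum()
    distinct_values.update(chunk['some_column'].dropna().unique())

print(total_rows, null_count, len(distinct_values))

I'm not sure how well the distinct-value set would hold up if a column has very high cardinality, though.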
Any suggestions?