I'm trying to read only the first 100 rows of a csv.gz file that has over 4 million rows in Python. I'd also like to get the number of columns and the header of each column. How can I do this?
I looked at "python: read lines from compressed text files" to figure out how to open the file, but I'm struggling to print the first 100 rows and get some metadata about the columns.
I also found "Read first N lines of a file in python", but I'm not sure how to combine it with opening the csv.gz file and reading it without saving an uncompressed CSV file.
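For reference, combining those two ideas with only the standard library might look roughly like the sketch below. The function name `preview_csv_gz` and the default comma delimiter are assumptions made for illustration; the real delimiter of the file is not known here.

```python
import csv
import gzip
from itertools import islice


def preview_csv_gz(path, n_rows=100, delimiter=","):
    """Return (header, rows) from a gzipped CSV without writing an uncompressed copy."""
    # "rt" reads the compressed stream as text; newline="" is what the csv module expects.
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])            # first line is assumed to hold the column names
        rows = list(islice(reader, n_rows))  # stop after n_rows; the rest of the file is not read
    return header, rows


# Hypothetical usage with the file from the question:
# header, rows = preview_csv_gz("google-us-data.csv.gz")
# print(len(header), header)   # number of columns and their names
# for row in rows:
#     print(row)
```

Because `islice` stops pulling lines after `n_rows`, only the start of the compressed stream needs to be decompressed, which should keep this fast even for a file with millions of rows.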
I have written this code:
import gzip
import csv
import json
import pandas as pd
df = pd.read_csv('google-us-data.csv.gz', compression='gzip', header=0, sep=' ', quotechar='"', error_bad_lines=False)
for i in range(100):
    print df.next()
I'm new to Python and I don't understand the results. I'm sure my code is wrong and I've been trying to debug it but I don't know which documentation to look at.
I get these results (and it keeps going down the console - this is an excerpt):
Skipping line 63: expected 3 fields, saw 7
Skipping line 64: expected 3 fields, saw 7
Skipping line 65: expected 3 fields, saw 7
Skipping line 66: expected 3 fields, saw 7
Skipping line 67: expected 3 fields, saw 7
Skipping line 68: expected 3 fields, saw 7
Skipping line 69: expected 3 fields, saw 7
Skipping line 70: expected 3 fields, saw 7
Skipping line 71: expected 3 fields, saw 7
Skipping line 72: expected 3 fields, saw 7
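My guess is that the "expected 3 fields, saw 7" messages mean the separator I passed (sep=' ') does not match the file, so different lines split into different numbers of fields. One way to check this, sketched below, is to count how many fields the first lines produce for a few candidate delimiters. The helper name `field_count_profile` and the candidate list are assumptions for illustration, and the naive `str.split` ignores quoting, so this is only a rough diagnostic.

```python
import gzip
from collections import Counter
from itertools import islice


def field_count_profile(path, candidates=(",", "\t", ";", "|", " "), n_lines=100):
    """Map each candidate delimiter to a Counter of field counts over the first n_lines."""
    with gzip.open(path, "rt", newline="") as handle:
        # Only the first n_lines are decompressed and kept in memory.
        lines = [line.rstrip("\r\n") for line in islice(handle, n_lines)]
    # Naive split: quoted fields containing the delimiter are not handled.
    return {sep: Counter(len(line.split(sep)) for line in lines) for sep in candidates}


# Hypothetical usage with the file from the question:
# for sep, counts in field_count_profile("google-us-data.csv.gz").items():
#     print(repr(sep), dict(counts))
```

A delimiter that gives the same field count (greater than one) on every line is a likely candidate. If that turns out to be a comma, for example, passing `sep=','` together with `nrows=100` to `pd.read_csv` should load just the first 100 rows, after which `df.columns` and `df.shape` give the headers and the number of columns.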