I'm trying to save disk space by loading a CSV file into R directly from a zip archive using fread(). Is there a way to get something akin to nrow() or dim() for the CSV (inside the zip) before loading it, to get an idea of how large the object will be and avoid running out of available RAM? If there's a better way to determine whether the CSV will be too large once uncompressed and loaded into R, that would also be good to know. Thanks (p.s. using Windows 10).
- https://www.r-bloggers.com/easy-way-of-determining-number-of-linesrecords-in-a-given-large-file-using-r/ – LocoGris Feb 14 '19 at 16:55
- You could also run `unzip -l` in CMD, which lists the contained files along with the total uncompressed size. – Mako212 Feb 14 '19 at 16:56
- Essentially `shell(shQuote(sprintf("unzip -l %s", file.choose())))` – Mako212 Feb 14 '19 at 16:58
- Possible duplicate of [Extract bz2 file in R](https://stackoverflow.com/questions/25948777/extract-bz2-file-in-r) – krads Mar 16 '19 at 11:21
- This isn't a duplicate of that question, because macsmith is asking how to efficiently just do a size/row count. That question only explains how to directly read & interact with the data. – Barett Mar 16 '19 at 22:59
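Building on the comments above (the chunked line-count approach from the r-bloggers link, and Mako212's `unzip -l` suggestion), here is a minimal base-R sketch; `data.zip` and `data.csv` are placeholder names, and it assumes the CSV is a single entry inside the zip:

```r
# 1. Uncompressed size without extracting: unzip(list = TRUE) only reads the
#    zip's table of contents, so nothing is decompressed.
info <- unzip("data.zip", list = TRUE)
info                       # data frame with Name, Length (uncompressed bytes), Date
sum(info$Length) / 1024^2  # total uncompressed size in MB

# 2. Row count without loading the table: unz() opens a connection to one
#    entry inside the zip, and readLines() counts records in chunks, so memory
#    use stays at roughly one chunk of raw lines.
con <- unz("data.zip", "data.csv")
open(con, "r")
n <- 0L
repeat {
  chunk <- readLines(con, n = 100000L)
  if (length(chunk) == 0L) break
  n <- n + length(chunk)
}
close(con)
n - 1L                     # rows, minus 1 for the header line
```

The size listing is essentially `unzip -l` done from within R, so no external unzip binary or `shell()` call is needed.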
1 Answer
A very good alternative, especially for reading zipped files quickly, is vroom (https://vroom.r-lib.org): "... it simply indexes where each record is located so it can be read later." So it should be safe to open very big datasets without the session locking up or running out of RAM.
```r
require(vroom)
vroom("./data.csv.gz")
# indexed 1.00TB in 0s, 1.25PB/s
# Rows: 200
# Columns: 6
# Delimiter: ","
# chr [6]: Column1, Date, Column2, Subtable_Column1, Subtable_Column2, Subtable_Column3
#
# Use `spec()` to retrieve the guessed column specification
# Pass a specification to the `col_types` argument to quiet this message
# A tibble: 200 x 6
# ... <data> ...
```
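If the only goal is to check the dimensions before committing to the data, it should be enough to keep the result and call dim() or nrow() on it, since vroom only indexes the records on read; a small usage sketch, with the same placeholder file name as above:

```r
library(vroom)
df <- vroom("./data.csv.gz")
dim(df)   # rows and columns, known from vroom's index before the cell values are needed
nrow(df)  # the nrow() the question asked about
```

As far as I can tell from the vroom documentation, files ending in .gz, .bz2, .xz, or .zip are uncompressed automatically, so the same call should work on the zipped CSV directly.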

Andre Wildberg