In R, I have switched to using vroom because of its speed at reading large delimited files, but I cannot find a simple way to pre-filter large datasets the way I could with, say, the sqldf package, or by using SQLite and dplyr as described here.
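
To make the comparison concrete, this is roughly the kind of pre-filtering I have in mind with those two approaches; the file, table and column names (`flights.csv`, `flights`, `dep_delay`) are just placeholders:

```
library(sqldf)

# sqldf route: the WHERE clause is applied while the file is read, so only
# matching rows ever reach R ("file" is how read.csv.sql names the input).
small <- read.csv.sql(
  "flights.csv",
  sql = "select * from file where dep_delay > 60"
)

# SQLite + dplyr route: with the csv loaded into a SQLite database beforehand,
# dplyr verbs are translated to SQL and only the filtered rows are collected.
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), "flights.sqlite")
small <- tbl(con, "flights") %>%
  filter(dep_delay > 60) %>%
  collect()
DBI::dbDisconnect(con)
```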

The vroom documentation suggests using awk to pre-filter CSVs, but I am wondering if there is an easier way to do this, ideally one that lets you write the filter in the dplyr language.
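
For reference, my understanding of the awk route is something like the sketch below (the file name, delimiter and column position are placeholders); it works, but the filter condition has to be written in awk rather than dplyr:

```
library(vroom)

# awk keeps the header row plus rows whose third field exceeds 100; vroom then
# reads the already-filtered stream from the pipe connection.
filtered <- vroom(pipe("awk -F, 'NR == 1 || $3 > 100' bigfile.csv"))
```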

kam
  • I don't understand; what's wrong with [reading from pipe connections](https://www.tidyverse.org/blog/2019/05/vroom-1-0-0/#reading-and-writing-from-pipe-connections)? Is there a specific task that you want to perform (i.e. select rows where column 3 > column 4) that can't be done with gnu utils? – jared_mamrot Mar 11 '22 at 02:43
  • @jared_mamrot There's nothing wrong with it per se, I just wanted to make sure there were no easier ways to do this, something better integrated with the language of dplyr, as opposed to "outsourcing" the filtering to a different language. If I'm using dplyr filtering post-import in other places in the code, then it's nice to also use it pre-import. dplyr also just seems easier for more complicated filtering. – kam Mar 11 '22 at 03:27
  • That makes sense - thanks for clarifying. I don't know of any other way, but hopefully another user can help you. – jared_mamrot Mar 11 '22 at 03:29
  • There are quite a few command line utilities that specifically understand csv files and don't require programming a full language to use. csvfix, xsv, csvkit, miller and csvtk are a few. – G. Grothendieck Mar 12 '22 at 17:09
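
To illustrate that last suggestion, one of the csv-aware tools mentioned (miller) could be dropped into the same pipe pattern as awk; this is only a sketch, with a placeholder file and column name:

```
library(vroom)

# miller filters by column name and respects csv quoting, unlike a plain
# awk field split; vroom reads the filtered stream from the pipe.
filtered <- vroom(pipe("mlr --csv filter '$dep_delay > 60' bigfile.csv"))
```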

0 Answers