
I have a folder /data containing around 50,000 R datasets. Each file name starts with File_, like this:

   File_1323.rds
   File_3223.rds
   File_5122.rds
   File_8273.rds
   .
   .
   .

I need help counting the total number of rows across all these files, not one file at a time, but the combined total from all of them. Any suggestions much appreciated. Thanks in advance.

Sundown Brownbear
    `cat File_*.rds | wc -l` should work, assuming each line of a file is one "row". Edit: Scratch that. I'm pretty sure 50,000 files would exceed the maximum number of args. So you may need to use `find -name "File_*.rds" -exec cat {} \;` instead of `cat File_*`. And if you don't want to recurse into subdirs, then add `-maxdepth 1` after `find` – Mike Holt Mar 21 '19 at 20:04
    Possible duplicate of [Total number of lines in a directory](https://stackoverflow.com/questions/47026533/total-number-of-lines-in-a-directory), [How to count all the lines of code in a directory recursively?](https://stackoverflow.com/questions/1358540/how-to-count-all-the-lines-of-code-in-a-directory-recursively) – chickity china chinese chicken Mar 21 '19 at 20:07
  • It is hard to say what is wrong with your code because you did not provide it or the errors you encountered. Also see [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve). – jww Mar 22 '19 at 02:33

1 Answer


Use wc(1):

wc -l File_*.rds

Or, to get just the number (for usage in scripts):

wc -l File_*.rds | awk '/ total/{ print $1 }'
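For illustration, here is a minimal, self-contained sketch of how the awk filter pulls out the grand total. The file names and contents are invented for the demo; only the final pipeline matters:

```shell
# Scratch directory with two small sample files (hypothetical names/contents)
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/File_1.rds"   # 3 lines
printf 'd\ne\n'    > "$dir/File_2.rds"   # 2 lines

# With multiple files, wc -l prints one line per file plus a final
# "total" line; the awk filter keeps only the number from that line
total=$(wc -l "$dir"/File_*.rds | awk '/ total/{ print $1 }')
echo "$total"   # prints 5

rm -r "$dir"
```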

However, 50,000 files might exceed the maximum argument-list length, so use find ... -exec ... instead:

find . -name 'File_*.rds' -exec cat "{}" + | wc -l
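A runnable sketch of the find-based variant, again with throwaway files in a temp directory (names and contents invented). Note the assumption running through this whole answer: the files must actually be line-oriented text for line counts to equal row counts. The -exec ... {} + form batches file names into as few cat invocations as fit, so the argument-list limit never applies:

```shell
# Scratch directory with sample files, including one in a subdirectory
dir=$(mktemp -d)
printf '1\n2\n3\n' > "$dir/File_a.rds"
printf '4\n5\n'    > "$dir/File_b.rds"
mkdir "$dir/sub"
printf '6\n'       > "$dir/sub/File_c.rds"

# Recursive count: 3 + 2 + 1 = 6 lines
all=$(find "$dir" -name 'File_*.rds' -exec cat {} + | wc -l)

# With -maxdepth 1 the subdirectory is skipped: 3 + 2 = 5 lines
top=$(find "$dir" -maxdepth 1 -name 'File_*.rds' -exec cat {} + | wc -l)

rm -r "$dir"
```

(Some wc implementations pad the count with leading spaces, so compare the result numerically rather than as a string.)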
mhutter