23

I have a form (Rails) which allows me to load a .csv file using the file_field. In the view:

    <% form_for(:upcsv, :html => {:multipart => true}) do |f| %>
    <table>
        <tr>
            <td><%= f.label("File:") %></td>
            <td><%= f.file_field(:filename) %></td>
        </tr>
    </table>
        <%= f.submit("Submit") %>
    <% end %>

Clicking Submit redirects me to another page (create.html.erb). The file was loaded fine, and I was able to read the contents just fine in this second page. I am trying to show the number of lines in the .csv file in this second page.

My controller (semi-pseudocode):

class UpcsvController < ApplicationController
    def index
    end

    def create
        file = params[:upcsv][:filename]
        ...
        #params[:upcsv][:file_length] = file.length # Show number of lines in the file
        #params[:upcsv][:file_length] = file.size
        ...
    end
end

Both file.length and file.size return '91' when my file only contains 7 lines. From the Rails documentation that I read, once the Submit button is clicked, Rails creates a temp file for the upload, and params[:upcsv][:filename] holds the contents of that temp file rather than the path to it. I don't know how to extract the number of lines in my original file. What is the correct way to get the number of lines in the file?

My create.html.erb:

<table>
    <tr>
        <td>File length:</td>
        <td><%= params[:upcsv][:file_length] %></td>
    </tr>
</table>

I'm really new at Rails (just started last week), so please bear with my stupid questions.

Thank you!

Update: apparently that number '91' is the number of individual characters (including carriage return) in my file. Each line in my file has 12 digits + 1 newline = 13. 91/13 = 7.
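
The arithmetic in the update can be reproduced with a small sketch. The content string below is a made-up stand-in for the uploaded file (7 lines of 12 digits each), not the real data, and StringIO is used only so the example runs without Rails:

```ruby
require "stringio"

# Hypothetical stand-in for the upload: 7 lines, 12 digits plus "\n" each.
content = "123456789012\n" * 7
io = StringIO.new(content)

byte_size  = io.size             # size of the content in bytes: 7 * 13
line_count = io.each_line.count  # number of lines

puts byte_size
puts line_count
```

So size and length describe how much data there is, not how many lines it holds.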

Mathias
  • Be real careful allowing a file to be uploaded without some tests on filesize. Imagine the problems if the file uses all the diskspace on your drive. Or, if the file is many gigabytes of carriage-returns, and your code in Rails is spinning trying to read and count the lines, DOSing your host. If you are on Linux you might want to have the OS's `wc` command do the lifting for you as it can return the line-count and number of characters in the file very quickly, without Rails having to open and read it. – the Tin Man Jan 14 '11 at 01:37

7 Answers

24

All of the solutions listed here actually load the entire file into memory in order to get the number of lines. If you're on a Unix-based system a much faster, easier and memory-efficient solution is:

`wc -l #{your_file_path}`.to_i
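
A hedged variant of the same idea, assuming `wc` is available on the host: passing the arguments separately through Open3 avoids handing a user-influenced path to a shell. The helper name `wc_line_count` is hypothetical. Note that `wc -l` counts newline characters, so a final line without a trailing newline is not counted.

```ruby
require "open3"
require "tempfile"

# Hypothetical helper: asks wc for the newline count without going through a shell.
def wc_line_count(path)
  out, status = Open3.capture2("wc", "-l", path)
  raise "wc failed for #{path}" unless status.success?
  out.to_i # output looks like "3 /path/to/file"; to_i reads the leading number
end

# Usage with a throwaway file standing in for the upload.
Tempfile.create(["sample", ".csv"]) do |f|
  f.write("a,b\n1,2\n3,4\n")
  f.flush
  puts wc_line_count(f.path)
end
```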
ghoppe
Jaco Pretorius
23

.length and .size are synonyms. To get the row count of a CSV file you have to actually parse it. Simply counting the newlines in the file won't work, because string fields in a CSV can contain line breaks. A simple way to get the row count would be:

CSV.read(params[:upcsv][:filename]).length
roman
  • 11,143
  • 1
  • 31
  • 42
  • Thanks, guys! Alas, now I'm getting "can't convert Tempfile into String". This is the Request parameter: {"commit"=>"Submit","authenticity_token"=>"<-removed->","upcsv"=>{"filename"=>#}} Is there any way that I can evaluate the actual .csv file rather than this Tempfile? – Mathias Jan 11 '11 at 21:05
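
One way around the "can't convert Tempfile into String" error, sketched under the assumption that the upload object responds to `path` (a Tempfile does): pass the path to CSV.read rather than the object itself. The Tempfile below is only a stand-in for params[:upcsv][:filename] so the example runs outside a controller.

```ruby
require "csv"
require "tempfile"

# Stand-in for params[:upcsv][:filename]; assumed to expose #path like a Tempfile.
upload = Tempfile.new(["upload", ".csv"])
upload.write("111111111111\n222222222222\n")
upload.flush

# CSV.read wants a file path, so hand it the temp file's path.
row_count = CSV.read(upload.path).length
puts row_count

upload.close
upload.unlink
```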
19

Another way to get the number of lines is:

file.readlines.size
gicappa
  • Hey, that actually works! However, Rails deleted the Tempfile after I run that line so I can't process the contents of the file...weird behavior. Thank you! – Mathias Jan 11 '11 at 23:36
  • @Mathias, are you sure that the Tempfile is deleted? I suspect you just need to rewind (`file.seek(0)`) – cam Jan 14 '11 at 00:31
  • @cam, I suspect the file was deleted after I do any read on it, since if I added some kind of line-counting codes before my main code (to process the data), my main code fails (although now I forgot what the error was). I'm pretty sure my line-counting code was not destructive by itself. So I suspect it's just the way rails work with tempfiles. But I do find that somewhat strange... – Mathias Jan 21 '11 at 05:50
  • I actually had the same problem as @Mathias just added `file.seek(0)` between the `file.readlines.size` and the main code. That did the trick :D – Zero Dragon Aug 10 '12 at 00:05
  • just in case anyone else need this, you can get the file object by using: file = open("/yourpath/file.csv") – user1051849 Oct 25 '16 at 10:47
  • A row in a CSV can contain newlines, you need to actually parse it. – Joshua Cheek Dec 13 '17 at 19:34
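
The rewind behaviour discussed in these comments can be sketched as follows. A plain Tempfile stands in for the upload here (an assumption for illustration); the point is that readlines leaves the read position at the end of the file, so a later read sees nothing until the position is reset.

```ruby
require "tempfile"

file = Tempfile.new("upload") # hypothetical stand-in for the uploaded file
file.write("a\nb\nc\n")
file.rewind

line_count  = file.readlines.size # reads through to end of file
after_count = file.read           # empty string: the position is at EOF
file.rewind                       # equivalent to file.seek(0)
contents    = file.read           # the full contents are readable again

puts line_count
puts after_count.empty?
puts contents

file.close! # close and delete the temp file
```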
12
CSV.foreach(file_path, headers: true).count

The above excludes the header row from the count. To count every row, including the header:

CSV.read(file_path).count
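
A small sketch comparing the two calls on the same file. The helper name `row_counts` and the sample data are made up for illustration:

```ruby
require "csv"
require "tempfile"

# Hypothetical helper returning both counts for one file.
def row_counts(path)
  {
    data_rows: CSV.foreach(path, headers: true).count, # header row not counted
    all_rows:  CSV.read(path).count                    # header row counted
  }
end

Tempfile.create(["people", ".csv"]) do |f|
  f.write("name,age\nAda,36\nAlan,41\n")
  f.flush
  puts row_counts(f.path).inspect
end
```

CSV.foreach without a block returns an enumerator, so it counts rows one at a time, while CSV.read builds the whole array first.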
Taimoor Changaiz
4

your_csv.count should do the trick.

jamesdlivesinatree
2

If your CSV file doesn't fit in memory (so you can't use readlines), you can do:

def self.line_count(f)
  i = 0
  CSV.foreach(f) {|_| i += 1}
  i
end

Unlike wc -l, this counts actual records, not physical lines. The two can differ if field values contain newlines.
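
To illustrate the difference, here is a minimal sketch with a quoted field that spans two lines. The helper names and the sample data are assumptions for the example:

```ruby
require "csv"
require "tempfile"

# Counts parsed CSV records, streaming one row at a time.
def record_count(path)
  count = 0
  CSV.foreach(path) { |_row| count += 1 }
  count
end

# Counts physical lines, which is roughly what wc -l reports.
def physical_line_count(path)
  File.foreach(path).count
end

Tempfile.create(["notes", ".csv"]) do |f|
  # The second record has a quoted field containing a line break.
  f.write("id,note\n1,\"first line\nsecond line\"\n2,plain\n")
  f.flush
  puts record_count(f.path)
  puts physical_line_count(f.path)
end
```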

pcv
0

Just to demonstrate what IO#readlines does:

If you had a file like this: "asdflkjasdlkfjsdakf\n asdfjljdaslkdfjlsadjfasdflkj\n asldfjksdjfa\n"

In Rails you'd do, say:

file = File.open(File.join(Rails.root, 'lib', 'file.json'))
lines_ary = file.readlines
lines_ary.count #=> 3

IO#readlines converts a file into an array of strings using the \n (newlines) as separators, much like commas so often do, so it's basically like

str.split(/\n/)

In fact, if you did

 x = file.read

this

 x.split(/\n/)

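would give almost the same result as file.readlines; the difference is that readlines keeps the trailing \n on each element, while split drops it.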
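
A quick check of how the two compare, using an in-memory StringIO as a stand-in for the file (an assumption so the example runs on its own) and made-up content:

```ruby
require "stringio"

text = "asdf\nqwer\nzxcv\n" # hypothetical three-line content
io = StringIO.new(text)

from_readlines = io.readlines     # each element keeps its trailing "\n"
from_split     = text.split(/\n/) # separators are dropped

puts from_readlines.inspect
puts from_split.inspect
puts from_readlines.map(&:chomp) == from_split
```

Both arrays have the same number of elements, so either works for counting lines.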

** IO#readlines can be really handy when dealing with files which have a repeating line structure ("child_id", "parent_ary", "child_id", "parent_ary",...) etc

boulder_ruby
  • ** to do the above in rails, something like this (" config.autoload_paths += Dir["#{config.root}/lib/**/"]") must be added to config/application.rb – boulder_ruby Aug 10 '12 at 03:48