I used the following questions as sources, listed along with the code I am asking about:


  1. How to get line count cheaply in Python?

    def file_len(fname):
        with open(fname) as f:
            for i, l in enumerate(f):
                pass
        return i + 1
    
    fname = '/path/file.xls'
    file_len(fname)
    

This returned 40, when the actual number of lines is 56.


  2. How to count rows in multiple csv file

    import glob
    import pandas as pd
    
    files = glob.glob('files/*.xls')
    
    d = {f: sum(1 for line in open(f)) for f in files}
    

This returned 40 for the same file as well, and did not return the correct counts for other files in that path either.


Neither method raises an error, and both return the same count consistently; however, that count is not the correct number of lines.

My question is: what is actually being counted in the .xls files, since it is not the number of lines?

jhurst5
  • Define "number of lines". Do you only have 40 lines that have data, with blank lines in between? Can you show what your file looks like? – MooingRawr Feb 23 '18 at 20:36
  • How do you figure the "actual number of lines" is 56? – juanpa.arrivillaga Feb 23 '18 at 20:38
  • Related: https://stackoverflow.com/questions/15541641/how-to-get-line-number-in-excel-sheet-using-python – Arnav Borborah Feb 23 '18 at 20:39
  • There are 56 rows in the spreadsheet, 9 columns, and no blank cells. None of the other files have blank space either. – jhurst5 Feb 23 '18 at 20:41
  • @juanpa.arrivillaga What I mean by 'actual number' is when I open the file in open office, there are 56 rows. My question is where does the 40 come from? – jhurst5 Feb 23 '18 at 20:47

1 Answer

xls is not a text format; it's binary. There are apparently a number of bytes in the file that happen to equal the ASCII newline character (0x0A), and those are what your code is counting, but that has no relation to how many rows there are in the spreadsheet.
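
To make that concrete, here is a minimal sketch using a made-up binary layout (this is not the real .xls format; `RECORD`, `build_toy_sheet`, `naive_line_count` and `record_count` are hypothetical names used only for illustration). It packs 56 fixed-size records and then counts "lines" the way the snippets in the question do, by splitting on newline bytes:

```python
import io
import struct

# Hypothetical toy layout: each record is (row index, value) as two
# little-endian unsigned 16-bit integers. NOT the real .xls layout.
RECORD = struct.Struct("<HH")


def build_toy_sheet(n_rows):
    """Pack n_rows fixed-size binary records into one bytes object."""
    return b"".join(RECORD.pack(row, row * 10) for row in range(n_rows))


def naive_line_count(data):
    """Count 'lines' by iterating, which splits on 0x0A bytes."""
    return sum(1 for _ in io.BytesIO(data))


def record_count(data):
    """Count records using knowledge of the (toy) binary layout."""
    return len(data) // RECORD.size


data = build_toy_sheet(56)
print(naive_line_count(data))  # 3: only two bytes happen to equal 0x0A
print(record_count(data))      # 56
```

The "line" count depends only on where the byte 0x0A happens to occur in the encoded numbers, so it says nothing about the number of records. For a real .xls file you would presumably need a library that understands the format (for example `xlrd`, or `pandas.read_excel`) and count the rows it parses.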

Daniel Roseman