I'm trying to write a very basic script that takes an input file name, counts the number of lines in the file, and prints the count to CMD. When I run it, though, I get double the number of lines that are actually in the file.

import sys


filename = sys.argv[-1]
with open(filename) as f:
    LineCount = len(f.readlines())
print(LineCount)
input("Press Enter to close...")

The text file is 208 lines long, but I am getting 417 back. Here is what the file looks like; it just repeats from here on out.

Asset Name              In Point            Description 
Zach And Jenv4          00:00:13:11                         
Zach And Jenv4          00:00:14:54                         
Zach And Jenv4          00:00:16:37                         
Zach And Jenv4          00:00:18:20                         
Zach And Jenv4          00:00:20:03                         
Zach And Jenv4          00:00:21:45                         
Zach And Jenv4          00:00:23:28                         
Zach And Jenv4          00:00:25:11                         
Zach And Jenv4          00:00:26:54                         
Zach And Jenv4          00:00:28:36                         
Zach And Jenv4          00:00:30:20                         
Zach And Jenv4          00:00:32:03                         
Zach And Jenv4          00:00:33:45                         
Zach And Jenv4          00:00:35:28                         
Zach And Jenv4          00:00:37:11                         
Zach And Jenv4          00:00:38:54                         
Zach And Jenv4          00:00:40:37                         
Zach And Jenv4          00:00:42:20                         
Zach And Jenv4          00:00:44:03                         
Zach And Jenv4          00:00:45:44                         
Zach And Jenv4          00:00:47:28                         
Zach And Jenv4          00:00:49:11                         
Zach And Jenv4          00:00:50:54                         
  • Tested the script and it works fine. – Khalil Aug 12 '22 at 18:02
  • Try testing it with a smaller file. – Khalil Aug 12 '22 at 18:03
  • @Khalil That's really strange I wonder if it's something embedded into the .txt file itself because it works fine with other .txt files. – Billathekilla Aug 12 '22 at 18:06
  • Can you please show us the code that is used to count the lines? Have you been reading the lines into a list and then getting the length of the list? The lines in the sample are terminated with Carriage Return and Line Feed (hex 0D 0A). – lroth Aug 12 '22 at 18:25
  • @Iroth That is the entire script. I didn't leave anything out. – Billathekilla Aug 12 '22 at 18:32
  • @Billathekilla the code works fine for me. what platform and operating system are you using? what python version are you using? – lroth Aug 12 '22 at 18:39
  • @Iroth I'm running Windows 11 with Python 3.9. I'm not really versed in Unicode, but I'm wondering if there may be something hidden in the actual .txt file. Is something like this possible? – Billathekilla Aug 12 '22 at 18:42
  • I know this isn't the right way to do it, but for now, I'm just going to divide by 2 and floor because all the text files will be generated in the same way. If anyone has any suggestions please let me know though. Thanks! – Billathekilla Aug 12 '22 at 18:55
  • Just look at the first few entries in the `f.readlines()` list and post them if you can't figure it out. You'll be able to see how the lines were parsed. – Mark Tolonen Aug 12 '22 at 19:08
  • os agnostic way of splitting lines is `f.read().splitlines()` – Edo Akse Aug 12 '22 at 19:38

1 Answer

Here's a likely explanation, but OP should look at f.readlines() content to be sure.
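
A quick way to run that check is sketched below. The helper name peek_lines is hypothetical. It reads the first few lines twice: once with open's default newline handling, and once with newline='', which still splits on \r, \n and \r\n but leaves the terminators untranslated so they show up in the output.

```python
def peek_lines(path, count=5):
    """Return the first `count` lines as (translated, untranslated) lists."""
    # Default text mode: \r, \n and \r\n are each translated to \n on read.
    with open(path) as f:
        translated = f.readlines()[:count]
    # newline='' keeps the original terminators, so stray \r characters are visible.
    with open(path, newline="") as f:
        untranslated = f.readlines()[:count]
    return translated, untranslated

# Example use (hypothetical file name):
#     for lines in peek_lines("export.txt"):
#         print(lines)
```

If the second list contains entries that end in a lone \r followed by entries that are just \r\n, the file uses \r\r\n terminators.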

The file has \r\r\n line termination. By default, open translates \r, \n, and \r\n each to a newline when reading, so \r\r\n gets translated to \n\n and every line appears to be followed by an empty one. One way to generate a file with these line terminations is to use Python's csv.writer on Windows without the documented newline='' parameter when opening the file for writing. csv.writer ends each row with \r\n, and text mode on Windows then expands the \n to \r\n, giving \r\r\n:

import csv

# Create "bad" file
with open('test.csv','w') as f:  # should have newline='' as a parameter as well
    r = csv.writer(f)
    r.writerow(['a','b','c'])
    r.writerow([1,2,3])
    r.writerow([4,5,6])

# Read file as OP did
with open('test.csv') as f:
    data = f.readlines()

print(len(data))
print(data)

Output:

6
['a,b,c\n', '\n', '1,2,3\n', '\n', '4,5,6\n', '\n']

With newline='' parameter added to the open:

3
['a,b,c\n', '1,2,3\n', '4,5,6\n']
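
If the files cannot be regenerated, a reader-side workaround is to collapse the doubled terminator before counting. This is a minimal sketch, assuming the stray terminator is exactly \r\r\n as described above; count_lines is a hypothetical helper name, not part of OP's script.

```python
def count_lines(path):
    """Count lines, treating CR CR LF as a single line terminator."""
    with open(path, "rb") as f:
        data = f.read()
    # Collapse the doubled terminator first; ordinary \r\n and \n are left alone,
    # so genuinely blank lines are still counted.
    data = data.replace(b"\r\r\n", b"\n")
    # bytes.splitlines() splits on \n, \r and \r\n.
    return len(data.splitlines())
```

Unlike dividing the result by two, this also gives the right count for files that were written with normal line endings.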

Open the CSV file in Excel, Notepad or Notepad++ and you'll see the same double-newline issue, but dumping the file from the command line doesn't show it:

C:\>type test.csv
a,b,c
1,2,3
4,5,6

A hex editor will show the \r\r\n (0D 0D 0A in hexadecimal).
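
Without a hex editor, a similar check can be done from Python by reading the file in binary mode. The sketch below tallies each kind of line terminator in the raw bytes; terminator_counts is a hypothetical helper name.

```python
import re
from collections import Counter

# Longest alternative first, so \r\r\n is not split into \r plus \r\n.
_TERMINATOR = re.compile(rb"\r\r\n|\r\n|\r|\n")


def terminator_counts(path):
    """Return a Counter mapping each raw terminator to its number of occurrences."""
    with open(path, "rb") as f:
        data = f.read()
    return Counter(m.group() for m in _TERMINATOR.finditer(data))
```

For a file like the one described here, the result should be dominated by b'\r\r\n'. On Python 3.8 or later, open(path, 'rb').read(32).hex(' ') prints the leading bytes as space-separated hex pairs, which is close to what a hex editor displays.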

Mark Tolonen