
I have a .csv file that I want to read with Python (3.x) using the csv module. However, the beginning of the file (the first 44797 rows) is missing from the output.

The .csv file in question can be downloaded from this link: https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data/downloads/new-york-city-airbnb-open-data.zip/3

import csv

file = "C:\\Users\\Owner\\Pictures\\Camera Roll\\new-york-city-airbnb-open-data\\AB_NYC_2019.csv"
rowsn = []
coln = []
with open(file, encoding="utf8") as csvfile:
    csvreader = csv.reader(csvfile)
    coln.append(0)
    for row in csvreader:
        rowsn.append(row)
        print("Appending" + str(row))
    for q in rowsn:
        for r in q:
            print(r, end="        ")
        print("\n")

I expected the entire file to be printed on the terminal row by row. However, the first 44797 rows do not appear on the screen. Please help. Thanks.

Antoine Claval
KingPat
  • Scroll up - the first lines are out of sight on your console or not buffered, because it only shows so many lines.... – Patrick Artner Oct 12 '19 at 14:32
  • @PatrickArtner I thought so too....but the first row after scrolling up is the 44,798th row. – KingPat Oct 12 '19 at 15:12
  • add `break` after `print("\n")` .. should only show the first line – Patrick Artner Oct 12 '19 at 15:16
  • @PatrickArtner I tried adding break.....It did show the first line....Then I tried limiting the number of repetitions of the for loop. The program worked as expected for smaller repetition counts but started truncating the beginning of the output for more than 4098 repetitions. If I limit the loop to 4099 repetitions, it truncates half of the first line. Why the 4098 threshold? – KingPat Oct 13 '19 at 16:23
  • Thanks @PatrickArtner. However, 2^12 is actually 4096 (I did think of that before, but couldn't really explain why the threshold was 2^12+2). Thanks a lot. However, is there any way I can overcome this problem? – KingPat Oct 14 '19 at 10:32

1 Answer


The fact that you can add `break` and see the first line means the file is read correctly; your console output simply scrolls past too fast.

The shell that holds your output has a buffer of about 4098 lines. If you print 50k lines, the first (50,000 - 4098) lines scroll by so fast that you do not see them. The buffer only holds the last 4098 lines, so you can only scroll back that far.
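To make the arithmetic concrete, here is a rough simulation of a fixed-size scrollback. It is only a sketch under assumptions: the row count of 48,895 is inferred from the numbers in the question (44,797 hidden rows plus 4,098 visible ones), each row is treated as exactly one buffer line, and `visible_rows` is a made-up helper name, not anything your console actually runs.

```python
from collections import deque

TOTAL_ROWS = 48_895   # assumption: 44,797 hidden rows + 4,098 visible ones
SCROLLBACK = 4_098    # assumption: the threshold reported in the comments


def visible_rows(total_rows, scrollback):
    """Return the row numbers still held by a scrollback of the given size."""
    buffer = deque(maxlen=scrollback)   # silently drops the oldest entry when full
    for row_number in range(1, total_rows + 1):
        buffer.append(row_number)       # stands in for print(row)
    return list(buffer)


rows_left = visible_rows(TOTAL_ROWS, SCROLLBACK)
print("first row you can scroll back to:", rows_left[0])
print("rows lost off the top:", rows_left[0] - 1)
```

Under those assumptions the first row still in the buffer is row 44,798, which matches what you see after scrolling up.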

If you really want to scroll through all 50k lines, give yourself time to read:

# rowsn is the list of rows collected by the reading loop in the question
for linecount, q in enumerate(rowsn, 1):
    for r in q:
        print(r, end="        ")
    print("\n")
    if not linecount % 4000:         # every 4000 lines, ask for return press
        input("Hit return...")

Now you have to press return about 12 times or so (roughly 50,000 / 4000). Alternatively, you can research how to enlarge the buffer of your console. For a default Windows console, you get further tips here: How to change Screen buffer size in Windows Command Prompt from batch script
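If neither pausing nor a bigger buffer appeals, another option is to skip the console entirely and write the rows to a text file that you open in an editor. This is a minimal sketch, not the only way to do it: `dump_rows` and `dump_csv` are hypothetical helper names, and the tab separator and the file paths in the usage comment are assumptions.

```python
import csv


def dump_rows(rows, out_path, sep="\t"):
    """Write each row on one line of a text file and return the row count."""
    count = 0
    with open(out_path, "w", encoding="utf8") as out:
        for row in rows:
            out.write(sep.join(row) + "\n")
            count += 1
    return count


def dump_csv(csv_path, out_path):
    """Stream a CSV file into a text file without holding all rows in memory."""
    with open(csv_path, encoding="utf8", newline="") as csvfile:
        return dump_rows(csv.reader(csvfile), out_path)


# Usage (hypothetical paths):
# dump_csv("AB_NYC_2019.csv", "rows.txt")
```

The file has no scrollback limit, so every row is there to search or page through.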

Patrick Artner