
I'm really confused.

My Python script keeps writing only ~70,000 of what should be ~325,000 rows.

The script runs without errors. I've tested it on multiple files, and it only fails to write the complete output when the source is this large (325,000 rows); smaller files of 5,000 rows or so work fine. I'm wondering if I'm doing something wrong.

import csv,time,string,os, requests
dw = "\\\\network\\folder\\btc.csv"

inv_fields = ["id", "rsl", "clr_five"]

with open(dw) as infile, open("c:\\upload\\log.csv", "wb") as outfile:
    r = csv.DictReader(infile)
    w = csv.DictWriter(outfile, inv_fields, extrasaction="ignore")

    # Write our custom header to match Solr; also include the new "id" column
    wtr = csv.writer(outfile)
    wtr.writerow(["id", "resale", "favorite_color"])
    for i, row in enumerate(r, start=1):
        row['id'] = i
        w.writerow(row)

The script reads the source file, which has about 42 columns and 325,000 rows. It finds the two columns named "rsl" and "clr_five", then writes those, along with a new "id" column, to a new file.

Is there something inherent in this code that just... stops it after it reaches a certain number of rows?

Brian Powell
  • If this is the *actual* code, and you aren't skipping some lines on exceptions, it's probably time to pull out some more wizardry. [This](http://jvns.ca/blog/2014/04/20/debug-your-programs-like-theyre-closed-source/) is a really good start – Wayne Werner Jul 05 '16 at 19:20
  • What value prints if you put `print i` after your for-loop? – Steven Rumbalski Jul 05 '16 at 19:27
  • @StevenRumbalski it gets to `70179` then it stops - no error, just keeps chugging along with the rest of the script. – Brian Powell Jul 05 '16 at 19:31
  • That would indicate the problem is with the reading of the file, not the writing. – Steven Rumbalski Jul 05 '16 at 19:33
  • @BrianPowell - This might be silly, but do you have sufficient disk space where you are storing the output CSV? If there is a problem with the read limit, as @StevenRumbalski said, you might need to call `csv.field_size_limit()` with a larger value. – HEADLESS_0NE Jul 05 '16 at 19:33
  • @WayneWerner it's the actual code - minus the true names of the file paths.... There's plenty of code before and after this - but it has no effect on this section. – Brian Powell Jul 05 '16 at 19:33
  • Is there a quoting error in the source data around line 70,179? In other words, might an unclosed quote cause the remainder of the file to be read as a single row? – Steven Rumbalski Jul 05 '16 at 19:34
  • @StevenRumbalski Once the 400 MB CSV file opens in Excel so I can look at it, I'll let you know :) – Brian Powell Jul 05 '16 at 19:36
  • You don't need to wait for Excel if your row lengths are supposed to be the same. Just spin through it with Python and print out the lines with the wrong length ;) – Wayne Werner Jul 05 '16 at 19:38
  • `for i, line in enumerate(inputfile): if 70170 < i < 70190: print line` This will give you some lines before and after the row you're looking for. Don't look in Excel, as Excel has its own parsing rules that may differ from the `csv` module defaults. – Steven Rumbalski Jul 05 '16 at 19:38
  • It looks like that row has one text field with a → character in it. When I look at 5 rows back and 5 rows forward, this is the only textual anomaly, or really the only thing that looks different from the rows around it. I'm guessing this is what's causing it to screw up. I'll remove it, save the file, and retest, but I'm not sure how to code something in Python that would do this automatically (e.g. convert everything to UTF-8 ahead of time). – Brian Powell Jul 05 '16 at 19:40
  • See [this answer](http://stackoverflow.com/questions/904041/reading-a-utf8-csv-file-with-python/14786752#14786752) to [Reading a UTF8 CSV file with Python](http://stackoverflow.com/questions/904041/reading-a-utf8-csv-file-with-python). If this solves your problem, let us know and we can close your question as a duplicate. (Don't delete the question. It has value as a gateway to the duplicate question.) – Steven Rumbalski Jul 05 '16 at 19:44
  • @StevenRumbalski Yeah - I think ultimately this was simply a question about Python running into a character that was not encoded in a way it could read. I'm not sure this particular answer solves my problem, but I'll read more into it and test out the solution for UTF-8 encoding my file during read. Thanks so much for all your help! – Brian Powell Jul 05 '16 at 19:53
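
The line-by-line check suggested in the comments can be sketched roughly as below. This is a minimal sketch, not a confirmed fix: `find_suspect_lines` and its parameters are hypothetical names, it is written for Python 3 (the question's code is Python 2), and it parses each physical line on its own, so a quoted field that contains a newline would be misreported. It opens the file in binary mode and flags byte 0x1A, on the unconfirmed assumption that the "→" seen near row 70,179 is that control byte: code page 437 displays 0x1A as "→", and a text-mode read on Windows with Python 2 treats it as end of file.

```python
import csv

# Assumption: the "→" in the suspicious row is really this control byte.
CTRL_Z = b"\x1a"


def find_suspect_lines(path, expected_fields=None, encoding="latin-1"):
    """Scan a CSV in binary mode and report lines that look anomalous.

    Returns (total_lines, suspects), where suspects is a list of
    (line_number, reason) tuples. Line numbers are 1-based and count
    the header row as line 1.
    """
    suspects = []
    total = 0
    # Binary mode: no end-of-file or newline translation by the platform.
    with open(path, "rb") as raw:
        for total, line in enumerate(raw, start=1):
            if CTRL_Z in line:
                suspects.append((total, "contains byte 0x1A"))
            if expected_fields is not None:
                # latin-1 maps every byte to a character, so decoding
                # cannot fail regardless of the file's real encoding.
                text = line.decode(encoding)
                fields = next(csv.reader([text]))
                if len(fields) != expected_fields:
                    suspects.append((total, "has %d fields" % len(fields)))
    return total, suspects
```

Calling something like `find_suspect_lines(dw, expected_fields=42)` would return the raw line count alongside any flagged lines; the 42 comes from the "about 42 columns" in the question, so check it against the header row first. If the scan does report a 0x1A byte near line 70,179, opening the input with `open(dw, "rb")`, which the Python 2 `csv` documentation asks for anyway, would be the first thing to try.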
