SmarterCSV end of file reached, but Ruby's CSV library can process the file

Question

I have two CSV files which are basically the same, but for some reason SmarterCSV cannot read the one named bad_file. Here is a Gist of both files. Ruby's native CSV library can read bad_file with no problem.

Before processing each file, I strip everything above the header row using the code below:

  def self.clean(file)
    if (csv = File.read(file).gsub!(/\A.+?(?=^Date,)/m, ''))
      tempfile = Tempfile.new('file_name')
      tempfile.write(csv)
      tempfile
    else
      file
    end
  end

I then pass that file into SmarterCSV like this:

    File.open(file, encoding: 'bom|utf-8') do |f|
      chunk = SmarterCSV.process(f, {
                                   verbose: true,
                                   remove_empty_hashes: true,
                                   col_sep: :auto,
                                   force_utf8: true,
                                   force_simple_split: true,
                                   strip_chars_from_headers: /[\-"\xEF\xBB\xBF]/,
                                   duplicate_header_suffix: ''
                                 })
    end

I cannot figure out what is even different about the CSV files, let alone why SmarterCSV can't process the bad one. Also, if anyone has a better method for stripping the unneeded info from the top of the spreadsheet, that could solve this problem right there.
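
One possible explanation, not confirmed anywhere in this thread, is that `clean` returns the Tempfile while the written data is still sitting in Ruby's write buffer and the read position is at the end of the file. A second handle opened on the same path would then see an empty file and could raise `EOFError`. It would also fit the fact that only one file fails, since `gsub!` returns `nil` when nothing is replaced and the original file is returned untouched. The sketch below is a hypothetical variant of `clean` that flushes and rewinds before returning; the `cleaned` tempfile name is made up for illustration.

```ruby
require 'tempfile'

# Hypothetical variant of the clean method from the question.
# Assumption: the header row starts with "Date," as in the sample data.
def clean(file)
  content  = File.read(file)
  stripped = content.sub(/\A.+?(?=^Date,)/m, '')
  # Nothing above the header row: hand back the original path unchanged.
  return file if stripped == content

  tempfile = Tempfile.new(['cleaned', '.csv'])
  tempfile.write(stripped)
  tempfile.flush   # push buffered bytes to disk so other handles on this path see them
  tempfile.rewind  # move the read position back to the start of the file
  tempfile
end
```

Keep a reference to the returned Tempfile for as long as the file is needed, since Ruby may delete the underlying file once the object is garbage collected.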

Can you add the code you are using for the native CSV? Also is your error `gems/smarter_csv-1.7.3/lib/smarter_csv.rb:243:in 'readline'`? I am getting that error for both files. — Beartech, Jan 05 '23 at 04:03
I was having trouble getting your `clean()` method to work as written, so I just ran `File.read('bad_file.csv').gsub!(/\A.+?(?=^Date,)/m, '')` and then copied the result into a new text file. I had to manually replace each `\n` with a real line break, and the quotes came out like `\"\"\"`, so I had to replace them with a single `"`, but after that both files process with no problem in SmarterCSV. — Beartech, Jan 05 '23 at 04:36
Have you actually opened and looked at the temp files you are creating? Mine were always empty. — Beartech, Jan 05 '23 at 04:37
To get it to parse with the native CSV library I just passed the tempfile into CSV.parse and it would return an array of arrays of the data in the file, exactly what I expected it to do. — Arel, Jan 05 '23 at 04:42
I just ran `b_file = File.read('badfile.csv').gsub!(/\A.+?(?=^Date,)/m, '')` then I did `File.write('b_file.csv', b_file)` and then ran your exact SmarterCSV code with that file name and it worked perfectly. Can you try that? — Beartech, Jan 05 '23 at 04:45
@Beartech yep, I get a string of the file content like this: `"Date,AircraftID,From,To,Route\r\n2022-11-29,N17AV,21N,21N,KHWV,,,,,,,2.0,2.0,0.0,0.0,0.0,0.0,0.0,0,0.00,0,0,0,0,0,0.0,0.0,0.00,0.00,0.00,0.00,0,,,,,,,2.0,0.0,0.0,0.0,,,,,,,,,false,false,false,false,false,\"\"\"Practiced eights on pylons, chandelles, and steep spirals. \"\"\",0.000000\r\n ... "` — Arel, Jan 05 '23 at 04:48
And if you write that to a file using the one liner I gave and run it through your code? — Beartech, Jan 05 '23 at 04:49
That's giving me Errno::EBADF Bad file descriptor and the file isn't getting returned. I get something like `5331` as the return value of that method. What should I be returning? — Arel, Jan 05 '23 at 04:58
And the original error I get is `EOFError end of file reached` — Arel, Jan 05 '23 at 04:59
I think 5331 is the number of bytes written? I notice your string output has `\r\n`. What OS are you on? I'm on a Mac. I have run into issues before with file encodings and `\r\n` vs `\n`. — Beartech, Jan 05 '23 at 05:06
`p Encoding.find("filesystem") ` will confirm that you are working in UTF-8. — Beartech, Jan 05 '23 at 05:11
Strange. Like I said, I can run the cleanup code and save it to a file in the pwd, and it works for both files. But when I use your clean method to save to a temp file it doesn't work. — Beartech, Jan 05 '23 at 16:51
@Beartech. Interesting. I just looked it up, and that method returns the length written, which is that number. — Arel, Jan 05 '23 at 16:53
So I just tried running the bad file through your clean method, but line by line instead of calling the method. I ran `csv = File.read(file).gsub!(/\A.+?(?=^Date,)/m, '')`, then `tempfile = Tempfile.new('file_name')`, then `tempfile.write(csv)`, and then used the SmarterCSV code and it worked perfectly. It's something in your clean method. — Beartech, Jan 05 '23 at 17:03
And your clean method is a Class method. What class is it a method of? That could be your problem? — Beartech, Jan 05 '23 at 17:05
@Beartech I just tried exactly the same thing and ran into the same problem, but I threw a byebug in there, tried running SmarterCSV.process, which failed as expected, but when I typed continue it processed the file fine, so I really have no idea. — Arel, Jan 05 '23 at 17:47
To get around this issue, I just stopped editing the file, and now I find the row number that the headers are on and skip the previous rows. — Arel, Jan 05 '23 at 20:18
Can you submit a sample CSV file in a GitHub issue for https://github.com/tilo/smarter_csv ? — Tilo, Apr 15 '23 at 14:18
