I have two CSV files that are basically the same, but for some reason SmarterCSV cannot read the one named bad_file. Here is a Gist of both files. Ruby's native CSV library can read bad_file with no problem.

Before processing each file, I strip everything above the header row using the code below:

  def self.clean(file)
    if (csv = File.read(file).gsub!(/\A.+?(?=^Date,)/m, ''))
      tempfile = Tempfile.new('file_name')
      tempfile.write(csv)
      tempfile
    else
      file
    end
  end

I then pass that file into SmarterCSV like this:

    File.open(file, encoding: 'bom|utf-8') do |f|
      chunk = SmarterCSV.process(f, {
                                   verbose: true,
                                   remove_empty_hashes: true,
                                   col_sep: :auto,
                                   force_utf8: true,
                                   force_simple_split: true,
                                   strip_chars_from_headers: /[\-"\xEF\xBB\xBF]/,
                                   duplicate_header_suffix: ''
                                 })
    end

I cannot figure out what is even different about the CSV files, let alone why SmarterCSV can't process the bad one. Also, if anyone has a better method for stripping the unneeded info from the top of the spreadsheet, that could solve this problem right there.
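One possible explanation, not confirmed anywhere in this thread, is that the data written to the Tempfile is still sitting in Ruby's write buffer when the file is reopened by path, so the reader sees an empty file. The sketch below is a hypothetical variant of `clean` (the name `clean_to_tempfile` and the `.csv` suffix are illustrative assumptions) that flushes and rewinds the Tempfile before returning it. It also uses `sub` rather than `gsub!`, since the pattern is anchored at `\A` and can match at most once.

```ruby
require 'tempfile'

# Hypothetical variant of the `clean` method from the question.
# Assumption: the header row starts with "Date,", as in the question's regex.
def clean_to_tempfile(path)
  content = File.read(path)
  # sub (non-bang) always returns a string; gsub! returns nil when nothing matched.
  cleaned = content.sub(/\A.+?(?=^Date,)/m, '')
  return path if cleaned == content # nothing to strip, so use the original file

  tempfile = Tempfile.new(['cleaned', '.csv'])
  tempfile.write(cleaned)
  tempfile.flush  # push buffered bytes to disk so a second handle can read them
  tempfile.rewind # reset the position in case this handle itself is read
  tempfile
end
```

The return value is either the original path or a Tempfile, mirroring the method in the question; the Tempfile's `path` can then be opened the same way as before.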

Beartech
Arel
  • Can you add the code you are using for the native CSV? Also is your error `gems/smarter_csv-1.7.3/lib/smarter_csv.rb:243:in 'readline'`? I am getting that error for both files. – Beartech Jan 05 '23 at 04:03
  • I was having trouble getting your `clean()` method to work as written, so I just ran `File.read('bad_file.csv').gsub!(/\A.+?(?=^Date,)/m, '')` and then copied the result into a new text file. I had to manually replace each `\n` with a real line break, and the quotes came out like `\"\"\"`, so I had to replace them with a single `"`. After that, both files process with no problem in SmarterCSV. – Beartech Jan 05 '23 at 04:36
  • Have you actually opened and looked at the temp files you are creating? Mine were always empty. – Beartech Jan 05 '23 at 04:37
  • To get it to parse with the native CSV library I just passed the tempfile into CSV.parse and it would return an array of arrays of the data in the file, exactly what I expected it to do. – Arel Jan 05 '23 at 04:42
  • I just ran `b_file = File.read('badfile.csv').gsub!(/\A.+?(?=^Date,)/m, '')`, then I did `File.write('b_file.csv', b_file)`, and then ran your exact SmarterCSV code with that file name and it worked perfectly. Can you try that? – Beartech Jan 05 '23 at 04:45
  • @Beartech yep, I get a string of the file content like this: `"Date,AircraftID,From,To,Route\r\n2022-11-29,N17AV,21N,21N,KHWV,,,,,,,2.0,2.0,0.0,0.0,0.0,0.0,0.0,0,0.00,0,0,0,0,0,0.0,0.0,0.00,0.00,0.00,0.00,0,,,,,,,2.0,0.0,0.0,0.0,,,,,,,,,false,false,false,false,false,\"\"\"Practiced eights on pylons, chandelles, and steep spirals. \"\"\",0.000000\r\n ... "` – Arel Jan 05 '23 at 04:48
  • And if you write that to a file using the one liner I gave and run it through your code? – Beartech Jan 05 '23 at 04:49
  • That's giving me Errno::EBADF Bad file descriptor and the file isn't getting returned. I get something like `5331` as the return value of that method. What should I be returning? – Arel Jan 05 '23 at 04:58
  • And the original error I get is `EOFError end of file reached` – Arel Jan 05 '23 at 04:59
  • I think 5331 is the bytes written? I notice your string output has `\r\n`. What OS are you on? I'm on a Mac. I have run into issues before with file encodings and `\r\n` vs `\n`. – Beartech Jan 05 '23 at 05:06
  • `p Encoding.find("filesystem") ` will confirm that you are working in UTF-8. – Beartech Jan 05 '23 at 05:11
  • @Beartech I'm on a mac. and – Arel Jan 05 '23 at 16:40
  • Strange. Like I said, I can run the cleanup code and save it to a file in the pwd, and it works for both files. But when I use your clean method to save to a temp file it doesn't work. – Beartech Jan 05 '23 at 16:51
  • @Beartech. Interesting. I just looked it up, and that method returns the length written, which is that number. – Arel Jan 05 '23 at 16:53
  • So I just tried running the bad file through your clean method, but line by line instead of calling the method. I ran: `csv = File.read(file).gsub!(/\A.+?(?=^Date,)/m, ''); tempfile = Tempfile.new('file_name'); tempfile.write(csv)` and then used the SmarterCSV code and it worked perfectly. It's something in your clean method. – Beartech Jan 05 '23 at 17:03
  • And your clean method is a Class method. What class is it a method of? That could be your problem? – Beartech Jan 05 '23 at 17:05
  • @Beartech I just tried exactly the same thing and ran into the same problem, but I threw a byebug in there, tried running SmarterCSV.process, which failed as expected, but when I typed continue it processed the file fine, so I really have no idea. – Arel Jan 05 '23 at 17:47
  • To get around this issue, I just stopped editing the file, and now I find the row number that the headers are on and skip the previous rows. – Arel Jan 05 '23 at 20:18
  • can you submit a sample CSV file in a GitHub issue for https://github.com/tilo/smarter_csv ? – Tilo Apr 15 '23 at 14:18
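The workaround mentioned in the comments (locating the header row and skipping everything above it, rather than rewriting the file) could look roughly like the sketch below. The helper names are hypothetical, and the `'Date,'` prefix is taken from the regex in the question. The resulting index could then be passed to whatever "skip leading lines" option the parser offers; SmarterCSV documents a `skip_lines` option, but check the installed version before relying on it.

```ruby
# Hypothetical helper: returns the zero-based index of the first line that
# starts with the header prefix, or nil if no such line exists.
# Assumption: the header row begins with "Date,", as in the question.
def header_row_index(path, header_prefix = 'Date,')
  File.foreach(path).with_index do |line, index|
    return index if line.start_with?(header_prefix)
  end
  nil
end

# Returns the lines from the header row onward, leaving the file untouched.
# Returns an empty array when no header row is found.
def lines_from_header(path, header_prefix = 'Date,')
  index = header_row_index(path, header_prefix)
  return [] if index.nil?

  File.readlines(path).drop(index)
end
```

Because the source file is only read, this approach avoids the Tempfile entirely.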

0 Answers