
Possible Duplicate:
ruby 1.9: invalid byte sequence in UTF-8

I'm currently building a file system crawler and receiving the following error when running my script:

wordcrawler.rb:8:in `block in <main>': invalid byte sequence in UTF-8 (ArgumentError)
    from /Users/Anconia/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:41:in `block in find'
    from /Users/Anconia/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:40:in `catch'
    from /Users/Anconia/.rvm/rubies/ruby-1.9.3-p327/lib/ruby/1.9.1/find.rb:40:in `find'
    from wordcrawler.rb:5:in `<main>'

And here is my code:

require 'find'

count = 0

Find.find('/Users/Anconia/') do |file|                   # '/' for root directory on OS X
  if file =~ /\b(\.txt|\.doc|\.docx)\b/                # check if filename ends in desired format
    contents = File.read(file)
    if contents =~ /regex/
      puts file
      count += 1
    end
  end
end

puts "#{count} files were found"

My development environment uses Ruby 1.9.3. When I switch to Ruby 1.8.7, the script runs without errors, but I'd like to keep using 1.9.3 if possible. I've tried every solution in the linked post (ruby 1.9: invalid byte sequence in UTF-8), but the problem persists. Any suggestions?
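A minimal reproduction of the difference between the two versions follows. It assumes, as the trace suggests, that some crawled files contain bytes that are not valid UTF-8. In Ruby 1.9+, File.read tags the string it returns with Encoding.default_external (typically UTF-8 on OS X), and a regex match against a string with broken bytes raises ArgumentError. Ruby 1.8.7 treats strings as plain bytes, so the same match succeeds there. The sample string is hypothetical and stands in for a file's contents.

```ruby
# Hypothetical file contents: "caf" + a lone Latin-1 0xE9 byte ("é"), which is
# not a valid UTF-8 sequence. force_encoding mimics File.read tagging the
# bytes with Encoding.default_external (UTF-8 here).
contents = "caf\xE9 au lait".dup.force_encoding(Encoding::UTF_8)

puts contents.encoding         # UTF-8
puts contents.valid_encoding?  # false

# Matching a regex against broken bytes raises the same ArgumentError that
# the question's trace reports at wordcrawler.rb:8 (the `contents =~` line).
error = nil
begin
  contents =~ /lait/
rescue ArgumentError => e
  error = e
  puts "ArgumentError: #{e.message}"  # ArgumentError: invalid byte sequence in UTF-8
end
```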

Anconia

1 Answer


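I wasn't properly understanding the linked post. At the very least, the code below can serve as a worked example of it: the file contents are re-encoded with :invalid => :replace so that invalid byte sequences are replaced rather than triggering the ArgumentError during the regex match.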

require 'find'

count = 0

Find.find('/Users/Anconia/') do |file|                                        # '/' for root directory on OS X
  if File.file?(file) && file =~ /\.(txt|doc|docx)\z/                         # skip directories; check that the filename ends in a desired extension
    contents = File.read(file).encode!('UTF-8', 'UTF-8', :invalid => :replace)  # resolves encoding errors by replacing invalid bytes - must use 1.9.3, else use Iconv
    if contents =~ /regex/
      puts file
      count += 1
    end
  end
end

puts "#{count} files were found"
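If the encode! call still leaves bad bytes in place, one possible reason is that, in Ruby versions before 2.1, String#encode from an encoding to that same encoding is documented as a no-op. The sketch below shows three alternatives on the same hypothetical input. The first is String#scrub (Ruby 2.1 and later). The second is a round trip through UTF-16, which works on 1.9.x as well because the source and target encodings differ. The third treats the data as raw bytes, as File.binread does, which is closest to how 1.8.7 behaved. Which one fits depends on your Ruby version and whether you need valid UTF-8 text afterwards.

```ruby
# Same hypothetical input: UTF-8-tagged bytes with an invalid 0xE9 in them.
raw = "caf\xE9 au lait".dup.force_encoding(Encoding::UTF_8)

# Option 1 (Ruby 2.1+): String#scrub replaces each invalid sequence
# with the given string (U+FFFD if called with no argument).
scrubbed = raw.scrub('?')
puts scrubbed  # caf? au lait

# Option 2 (Ruby 1.9+): convert to a *different* encoding so the converter
# really inspects the bytes; :invalid => :replace swaps bad ones for '?'.
round_tripped = raw.encode('UTF-16', 'UTF-8', :invalid => :replace, :replace => '?').encode('UTF-8')

# Option 3: keep the data as raw bytes. File.binread returns an ASCII-8BIT
# string; force_encoding simulates that here. ASCII-only regexes still match.
as_bytes = raw.dup.force_encoding(Encoding::BINARY)

# None of these raise ArgumentError when matched against a regex.
[scrubbed, round_tripped, as_bytes].each do |s|
  puts s =~ /lait/
end
```

In the crawler loop, any of these would replace the File.read(file).encode!(...) call; scrub is the shortest if you are on Ruby 2.1 or later.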
Anconia
Thanks for your answer. I had an issue which has been solved by using :invalid => :replace :-) – Arkan Mar 28 '13 at 15:02