I'm trying to use gsub to replace this character: ’ (an apostrophe from Microsoft Word). Here's what I'm doing:

    row['Content'] = row['Content'].gsub(/’/, '-')

This gives me the error:

    reader.rb:18: invalid multibyte char (US-ASCII)
    reader.rb:18: invalid multibyte char (US-ASCII)
    reader.rb:18: syntax error, unexpected $end, expecting ')'
    row['Content'] = row['Content'].gsub(/’/, '-' )

I've tried all sorts of variations and looked at this question, but I'm at a loss. Thanks for any help you can give.

Alekx

1 Answer

You have a typo in `row['Content'}`: you should be closing with a square bracket rather than a brace (`row['Content']`).

Jon M
  • Good catch. I don't know why, but I didn't copy/paste it. I fixed it and it's now accurate to the code I'm using. – Alekx Mar 08 '12 at 04:01
  • Could you edit the question to include the full error backtrace? I'm not getting the same problem when I try to run it. – Jon M Mar 08 '12 at 04:03
  • What *is* that character you're trying to `gsub` out? I'm copying and pasting from your code example and having no problem. Is it a special Unicode character? – Jon M Mar 08 '12 at 04:12
  • And if it is, you may need to add an encoding specifier at the top of the file, such as `# encoding: UTF-8`. – Jon M Mar 08 '12 at 04:13
  • Agreed, this is probably an encoding issue—especially if the OP is using Ruby 1.9. – Andrew Marshall Mar 08 '12 at 04:23
  • I added `# encoding: UTF-8` and the error changed to `reader.rb:20:in 'gsub': incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string) (Encoding::CompatibilityError)`. This only happens when the `’` is in the CSV. – Alekx Mar 08 '12 at 04:24
  • Encoding issues really make my brain hurt! If it's just that the regex can't deal with differing encodings, you could try a string gsub: `gsub("’", '-')` – Jon M Mar 08 '12 at 04:29
  • Yeah, mine too. I tried the string gsub too with no luck. Oh well. I'll keep working at it. – Alekx Mar 08 '12 at 04:35
  • My only other idea would be to `.encode('UTF-8')` the string coming from Word before you try to do the gsub. – Jon M Mar 08 '12 at 04:56
  • If that doesn't help, perhaps you can indicate in the question exactly what that character is, so others can help better. – Jon M Mar 08 '12 at 04:57
  • I updated the question. It's just an apostrophe from Microsoft Word. – Alekx Mar 08 '12 at 05:03
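
To make the encoding discussion in the comments concrete, here is a minimal sketch of one way the `Encoding::CompatibilityError` can be avoided. It assumes the CSV bytes are really UTF-8 and were only tagged as ASCII-8BIT when read; that is a guess, since the thread never confirms the file's actual encoding. The helper name `replace_word_apostrophe` and the sample string are made up for illustration. `"\u2019"` is the right single quotation mark that Word typically substitutes for an apostrophe.

```ruby
# encoding: UTF-8
# The magic comment matters on Ruby 1.9 when the source contains literal
# non-ASCII characters; Ruby 2.0+ already defaults to UTF-8 source files.

# Hypothetical helper (not from the original post): swaps Word's curly
# apostrophe (U+2019) for a hyphen, mirroring the gsub in the question.
def replace_word_apostrophe(text)
  # Assumption: the bytes are valid UTF-8 that was merely labelled binary.
  # force_encoding only relabels the bytes; dup keeps the caller's string intact.
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  utf8.gsub("\u2019", '-')
end

# Simulate a CSV field that arrived tagged as ASCII-8BIT.
raw = "It\u2019s here".dup.force_encoding(Encoding::ASCII_8BIT)
cleaned = replace_word_apostrophe(raw)
puts cleaned
```

If the file turns out to be Windows-1252 rather than UTF-8 (common for text exported from Word), relabelling alone would not be enough; something like `text.dup.force_encoding('Windows-1252').encode('UTF-8')` would probably be needed before the `gsub`.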