
I'm writing a CSV file in Hindi, and when I parse it I get different results.

For example, I create the following CSV file:

1234444070;आज आप कैसे हैं???

When I read the same file using `open(csv_aws_url).read`, I get:

"1234444070;\xE0\xA4\x86\xE0\xA4\x9C \xE0\xA4\x86\xE0\xA4\xAA \xE0\xA4\x95\xE0\xA5\x88\xE0\xA4\xB8\xE0\xA5\x87 \xE0\xA4\xB9\xE0\xA5\x88\xE0\xA4\x82???\r\n"

Is it possible to read back the same contents when parsing?

bill_cosby
  • Try putting `puts` before your `open`. It looks like you're just seeing the `String#inspect` output, which is just a different visual representation of the same data. – Jordan Running Jul 08 '16 at 13:26
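A minimal sketch of the comment's point, assuming the string came back tagged ASCII-8BIT (binary), which is what the `\xE0\xA4...` escapes suggest. `String#b` is used here only to simulate such a read: the bytes are identical, and only the way `p` (which calls `inspect`) displays them differs from `puts`.

```ruby
# The escapes in the question are UTF-8 bytes: "\xE0\xA4\x86" is "आ".
text = "1234444070;आज आप कैसे हैं???"

# String#b returns a copy of the same bytes tagged ASCII-8BIT (binary),
# used here to simulate a read that did not set a text encoding.
raw = text.b

p raw                         # shows \xE0\xA4\x86... escapes, as in the question
puts raw.bytes == text.bytes  # true: the underlying data is the same

# Relabel the bytes as UTF-8 (no conversion) and print the text itself.
decoded = raw.dup.force_encoding(Encoding::UTF_8)
puts decoded                  # 1234444070;आज आप कैसे हैं???
```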

2 Answers


Try `open(csv_aws_url, encoding: "utf-8").read`.

The file is most likely being saved with a different encoding.

The question "Ruby read CSV file as UTF-8 and/or convert ASCII-8Bit encoding to UTF-8" should also be helpful.
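A minimal sketch of passing `encoding:` when opening the file. It uses a local temporary file as a stand-in for `csv_aws_url` (an assumption; the remote download itself is not shown) and compares a plain binary read with a read that declares UTF-8.

```ruby
require "tempfile"

# Stand-in for the remote CSV (assumption): a temp file holding UTF-8 bytes.
line = "1234444070;आज आप कैसे हैं???\n"

binary, text = Tempfile.create(["hindi", ".csv"]) do |f|
  f.binmode
  f.write(line)
  f.flush
  [
    # Binary read: correct bytes, but tagged ASCII-8BIT.
    File.binread(f.path),
    # Text read with an explicit external encoding: tagged UTF-8.
    File.open(f.path, encoding: "utf-8") { |io| io.read }
  ]
end

p binary.encoding == Encoding::ASCII_8BIT  # true
p text.encoding == Encoding::UTF_8         # true
puts text                                  # 1234444070;आज आप कैसे हैं???
```

On Ruby 3.0 and later, `open-uri` no longer overrides `Kernel#open`, so fetching a URL needs `URI.open(csv_aws_url)` rather than `open(csv_aws_url)`.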

Prakash Murthy
`open(csv_aws_url).read.force_encoding('utf-8')`
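This answer relabels the downloaded bytes as UTF-8. A minimal sketch of the full round trip follows, assuming the data really is valid UTF-8 and using a literal string (converted with `String#b`) as a hypothetical stand-in for `open(csv_aws_url).read`. Because the sample file separates fields with `;`, it also passes `col_sep: ";"` to `CSV.parse`.

```ruby
require "csv"

# Hypothetical stand-in for open(csv_aws_url).read: the file's UTF-8 bytes,
# tagged ASCII-8BIT the way a binary read returns them.
raw = "1234444070;आज आप कैसे हैं???\r\n".b

# force_encoding relabels the bytes as UTF-8 without converting them;
# dup first so the original string is left untouched.
utf8 = raw.dup.force_encoding(Encoding::UTF_8)
raise ArgumentError, "data is not valid UTF-8" unless utf8.valid_encoding?

# The sample file uses ';' as its separator.
rows = CSV.parse(utf8, col_sep: ";")
id, message = rows.first
puts id       # 1234444070
puts message  # आज आप कैसे हैं???
```

`force_encoding` only changes the encoding tag. If the file had actually been saved in a different encoding, the bytes would need converting with `encode` from the correct source encoding instead, which is why the sketch checks `valid_encoding?` before parsing.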
bill_cosby