
While reading a CSV file into an array, I noticed that the very first array element, which is a string, starts with an invisible character (it shows up as a leading "" below).

For example:

str = contacts[0][0]
p str

gives me...

"SalesRepName"

Then by sheer chance I happened to try:

str = contacts[0][0].split(//)
p str

and that gave me...

["", "S", "a", "l", "e", "s", "R", "e", "p", "N", "a", "m", "e"]

I've checked every other element in the array, and this is the only string that starts with that extra "" character.

holaymolay
  • 3
    I honestly don't agree with this being closed as a duplicate. The issue in the referenced article is not at all the same as this one. If I would have come across it during my research I would have disregarded it because it doesn't explain the problem i was having. By down-voting this question you're disincentivising me from posting valuable information that could potentially help other people who encounter this same problem. The way I described the issue/answer it focuses on the symptom. The least you could do is post a competing answer that explains what's going on. – holaymolay Nov 08 '15 at 19:49
  • 1
    The topic of ZERO WIDTH SPACE is one where there are not many answers to - http://www.verkltas.club/questions/tagged/zero-width-space?sort=votes&pageSize=15 I am not a fan of the Zero Width Space, because of what I deem as the non-uniform handling by email clients, web browsers and word processors ... This topic should not be closed. – Xofo Feb 25 '16 at 21:39

1 Answer


Before I could post this question, I stumbled upon the answer. Writing up the question gave me the idea of checking the code point of this "" character.

str = contacts[0][0].split(//)
p str[0].codepoints

gave me

[65279]

Searching for character 65279 (U+FEFF, which is a Unicode code point well outside the ASCII range), I found this answer: https://stackoverflow.com/a/6784805/3170942

According to SLaks:

It's a zero-width no-break space. It's more commonly used as a byte-order mark (BOM).
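To see why the character is invisible yet still counts as its own element, here is a minimal sketch. The sample string is made up for illustration; only the code point 65279 comes from the output above. The last step, removing the character from a string that is already in memory, is an alternative I am adding here, not something from the linked answers.

```ruby
# Hypothetical sample: a header cell as it looks when the file's BOM was not stripped.
header = "\uFEFFSalesRepName"

# The BOM is a real character, so it appears as the first element.
chars = header.chars
first_codepoint = chars.first.ord

# In UTF-8 the BOM is stored as three bytes.
bom_bytes = chars.first.bytes.map { |b| format("%02X", b) }

# One way to drop a leading BOM from a string that is already in memory.
cleaned = header.sub(/\A\uFEFF/, "")

puts first_codepoint      # 65279
puts bom_bytes.join(" ")  # EF BB BF
puts cleaned              # SalesRepName
```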

This, in turn, led me to the solution here: https://stackoverflow.com/a/7780559/3170942
In this response, knut provided an elegant solution, which looked like this:

File.open('file.txt', "r:bom|utf-8"){|file|
  text_without_bom = file.read
}

The "r:bom|utf-8" mode string was the key element I was looking for: it tells Ruby to skip a leading BOM when it opens the file. I adapted it to my code, which became this:

CSV.foreach($csv_path + $csv_file, "r:bom|utf-8") do |row|
  contacts << row
end
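If you want to check the behaviour on your own machine, the following sketch writes a small temporary CSV that starts with a BOM and reads it both ways. The file contents and variable names are hypothetical; the plain read should leave the BOM attached to the first cell, while the "r:bom|utf-8" read should return the clean header.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data: a UTF-8 CSV file that begins with a BOM.
file = Tempfile.new(["contacts", ".csv"])
file.write("\uFEFFSalesRepName,Region\nAlice,West\n")
file.close

# Plain UTF-8 read: nothing asks Ruby to strip the BOM here.
raw = []
CSV.foreach(file.path, "r:utf-8") { |row| raw << row }

# BOM-aware read: the BOM is dropped while the file is opened.
contacts = []
CSV.foreach(file.path, "r:bom|utf-8") { |row| contacts << row }

p raw[0][0].codepoints.first
p contacts[0][0]

file.unlink
```

Comparing the two printed values makes it easy to confirm that the mode string, and not something else in your script, is what removes the stray character.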

I spent hours on this stupid problem. Hopefully, this will save you some time!

holaymolay
  • 1
    According to this page, I am using the CSV library to parse the file: http://ruby-doc.org/stdlib-2.2.3/libdoc/csv/rdoc/CSV.html
    I'm not understanding your issue with my original question and subsequent answer
    – holaymolay Nov 09 '15 at 01:34
  • Thank you. I don't know if I would have ever found that zero-width space - converted at some point in my process to a normal space. And where did it come from? – Anita Graham Jun 25 '19 at 07:01
  • @AnitaGraham I do not know where it came from. I would like to know myself. – holaymolay May 28 '20 at 21:26