I am parsing rows in a CSV, and this is an example of how each row object looks:

 pry(Program)> row
=> #<CSV::Row "Broadcast Name":"2020 FC Cincinnati vs Toronto FC | MLS" "Description":"Major League Soccer is a men's professional soccer league sanctioned by the United States Soccer Federation which represents the sport's highest level in the United States. The league comprises 26 teams—23 in the U.S. and 3 in Canada and constitutes one of the major professional sports leagues in both countries." "Category":"Sports,Soccer" "Sizzle Reel":"https://youtu.be/fM5aHIVBTIc" "Thumbnail Image":"Screen Shot 2020-01-14 at 5.27.31 PM.png (https://dl.airtable.com/.attachments/7d9273fef3fc1cf1a44c4f2c4db395e7/05f3cd4a/ScreenShot2020-01-14at5.27.31PM.png)" "Header Image":"Screen Shot 2020-01-14 at 5.27.31 PM.png (https://dl.airtable.com/.attachments/7d9273fef3fc1cf1a44c4f2c4db395e7/05f3cd4a/ScreenShot2020-01-14at5.27.31PM.png)" "Source":"Flosports" "Date":"3/21/2020" "Marketing":"https://www.mlssoccer.com/" "Total Available Impressions":"10.0MM" "Programming Type":"Live - VOD" "Demo":"P18-49" "Recommended":"YES" "Featured":"YES" nil:"10000000">

However, whenever I try to access the first element by its key name, it returns nil:

> row["Broadcast Name"]
=> nil

When I access it by index instead, it returns the right result:

> row[0]
=> "2020 FC Cincinnati vs Toronto FC | MLS"

When I access any other element by the key name it works:

> row["Description"]
=> "Major League Soccer is a men's professional soccer league sanctioned by the United States Soccer Federation which represents the sport's highest level in the United States. The league comprises 26 teams—23 in the U.S. and 3 in Canada and constitutes one of the major professional sports leagues in both countries."
[18] pry(Program)> row["Category"]
=> "Sports,Soccer"
[19] pry(Program)> row["Sizzle Reel"]
=> "https://youtu.be/fM5aHIVBTIc"

When I convert it to a hash it seems fine:

> row.to_h
=> {"Broadcast Name"=>"2020 FC Cincinnati vs Toronto FC | MLS",
 "Description"=>
  "Major League Soccer is a men's professional soccer league sanctioned by the United States Soccer Federation which represents the sport's highest level in the United States. The league comprises 26 teams—23 in the U.S. and 3 in Canada and constitutes one of the major professional sports leagues in both countries.",
 "Category"=>"Sports,Soccer",
 "Sizzle Reel"=>"https://youtu.be/fM5aHIVBTIc",
 "Thumbnail Image"=>"Screen Shot 2020-01-14 at 5.27.31 PM.png (https://dl.airtable.com/.attachments/7d9273fef3fc1cf1a44c4f2c4db395e7/05f3cd4a/ScreenShot2020-01-14at5.27.31PM.png)",
 "Header Image"=>"Screen Shot 2020-01-14 at 5.27.31 PM.png (https://dl.airtable.com/.attachments/7d9273fef3fc1cf1a44c4f2c4db395e7/05f3cd4a/ScreenShot2020-01-14at5.27.31PM.png)",
 "Source"=>"Flosports",
 "Date"=>"3/21/2020",
 "Marketing"=>"https://www.mlssoccer.com/",
 "Total Available Impressions"=>"10.0MM",
 "Programming Type"=>"Live - VOD",
 "Demo"=>"P18-49",
 "Recommended"=>"YES",
 "Featured"=>"YES",
 nil=>"10000000"}

But the same issue appears:

[22] pry(Program)> row.to_h["Broadcast Name"]
=> nil
[23] pry(Program)> row.to_h["Category"]
=> "Sports,Soccer"
[24] pry(Program)> row.to_h["Sizzle Reel"]
=> "https://youtu.be/fM5aHIVBTIc"

The crazy thing is that when I list all the keys, they all show up correctly:

[21] pry(Program)> row.to_h.keys
=> ["Broadcast Name",
 "Description",
 "Category",
 "Sizzle Reel",
 "Thumbnail Image",
 "Header Image",
 "Source",
 "Date",
 "Marketing",
 "Total Available Impressions",
 "Programming Type",
 "Demo",
 "Recommended",
 "Featured",
 nil]

So what could be causing row["Broadcast Name"] to fail so consistently regardless of what I do?

marcamillion
  • Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. See: "[How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example)" and the linked pages. – the Tin Man Feb 04 '20 at 08:14
  • I can't reproduce on the hash you posted. Theoretically, it could be a different string that looks similar, due to lookalikes, different normalisation state, or invisible characters. For example, `"Brοadcast Name" != "Broadcast Name"`, because the first one contains 'GREEK SMALL LETTER OMICRON' (U+03BF) where the letter `o` should be. Compare `str.codepoints` rather than relying on visual confirmation of rendered fonts. – Amadan Feb 04 '20 at 08:19

1 Answer

It doesn't match because your key / header starts with an invisible character:

row.headers[0].codepoints
#=> [65279, 66, 114, 111, 97, 100, 99, 97, 115, 116, 32, 78, 97, 109, 101]
#    ^^^^^

That's U+FEFF, or "ZERO WIDTH NO-BREAK SPACE", which is used as a byte order mark (BOM).

To fix the problem, strip the BOM. See "How to avoid tripping over UTF-8 BOM when reading files".
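
As a rough sketch of what stripping the BOM can look like, the snippet below reproduces the problem and shows two possible fixes. The sample data, the temp file, and the variable names are made up for illustration; only the "Broadcast Name" and "Category" headers come from the question. It assumes the file is UTF-8 and relies on Ruby's `bom|utf-8` open mode, which skips a leading BOM when one is present.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data: UTF-8 CSV text that starts with a byte order mark.
bom = "\uFEFF"
csv_text = "#{bom}Broadcast Name,Category\n" \
           "2020 FC Cincinnati vs Toronto FC | MLS,\"Sports,Soccer\"\n"

file = Tempfile.new(["bom_sample", ".csv"])
file.close
File.binwrite(file.path, csv_text) # write the bytes exactly as they are

# Reproduce the problem: read as plain UTF-8, so the BOM stays glued to the
# first header. The encoding is passed explicitly so the result does not
# depend on the environment's default.
raw_row = CSV.read(file.path, headers: true, encoding: "UTF-8").first
p raw_row["Broadcast Name"]                  # => nil
p raw_row.headers.first.codepoints.first     # => 65279 (U+FEFF)

# Fix 1: open the file in "bom|utf-8" mode so Ruby drops the BOM while reading.
clean_row = CSV.open(file.path, "r:bom|utf-8", headers: true) { |csv| csv.first }
p clean_row["Broadcast Name"]

# Fix 2: if the CSV is already in memory as a string, remove the BOM by hand.
stripped_row = CSV.parse(csv_text.delete_prefix(bom), headers: true).first
p stripped_row["Broadcast Name"]

file.unlink
```

Fix 1 is usually the less intrusive choice when reading from disk, since files without a BOM are read the same way as before.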

Stefan