
I don't know why this is happening in my Ruby. Do you see the same behaviour?

3.1.2 :001 > ["url", "label:from", "label:type", "label:batch", "note"].index('url')
 => nil
3.1.2 :002 > ["url", "label:from", "label:type", "label:batch", "note"].index('note')
 => 4
3.1.2 :003 > ["Url", "label:from", "label:type", "label:batch", "note"].index('Url')
 => 0

It can't find 'url' when it is lowercase. Is this a reserved word?

Edit: it seems unable to find the first occurrence of the "url" string:

["note", "url", "label:from", "label:type", "label:batch", "note", "url"].index 'url'
 => 6       
Greg
den
  • I do see this behavior on 3.1.1, and it seems to be related to the "url" string itself. If you put other strings first and have more than one occurrence of "url", it finds the second one. Most bizarre. – Greg Apr 15 '22 at 08:31

1 Answer


The first entry in your array is not what you think it is. Look at the raw bytes and you'll see:

["url", "label:from", "label:type", "label:batch", "note"].first.bytes.map { |x| x.to_s(16) }
# ["ef", "bb", "bf", "75", "72", "6c"]

The 0x75 0x72 0x6c is the "url" you see; the 0xef 0xbb 0xbf is a Byte Order Mark (BOM). The BOM is invisible when printed, so the element looks like "url" but does not compare equal to it, which is why index returns nil. Byte order is meaningless in UTF-8, so a BOM serves no purpose there: it is valid but unusual and not recommended. If the string comes from a file, you can have Ruby strip the BOM while reading it.
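As a rough sketch of both the problem and two possible fixes: the snippet below writes the BOM as an explicit "\uFEFF" escape so it survives copy and paste. The header values, the variable names and the temporary file are illustrative assumptions, not the asker's actual data source.

```ruby
require "tempfile"

# The BOM is spelled as an explicit escape; a pasted literal BOM is easily lost.
headers = ["\uFEFFurl", "label:from", "label:type", "label:batch", "note"]

# The lookup misses because the first element is BOM + "url", not "url".
miss = headers.index("url")

# Option 1: remove a leading BOM from strings that are already in memory.
cleaned = headers.map { |h| h.delete_prefix("\uFEFF") }
hit = cleaned.index("url")

# Option 2: let Ruby drop the BOM while reading, via the "bom|utf-8" encoding.
# The temp file is a stand-in for wherever the headers really come from.
from_file = Tempfile.create("headers") do |f|
  f.write("\uFEFFurl,label:from,note\n")
  f.flush
  File.read(f.path, mode: "r:bom|utf-8").chomp.split(",")
end

p miss, hit, from_file.index("url")
```

Here miss is nil, while hit and from_file.index("url") are both 0. Stripping at read time is usually the tidier choice when the data comes from a file, since nothing downstream ever sees the BOM.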

mu is too short
  • To add to this: previously I could copy the first example and reproduce the problem, but when I edited the original (https://stackoverflow.com/posts/71881508/revisions) the edit seems to have changed the encoding, and it is no longer reproducible for me when I copy the first snippet. – Greg Apr 15 '22 at 08:51
  • @Grzegorz The editor seems to have stripped the BOM from my answer as well, sigh. – mu is too short Apr 15 '22 at 09:06