I have a string, definition, in which HTML can appear, and an array of words. I am trying to search for these words in definition and return the start and end positions of each match. For example, I might want to find "Hello" in:
definition = "<strong>Hel</strong>lo World!"
Getting rid of the HTML can be done using sanitize from ActionView and HTMLEntities, but that changes the index of "Hello" in the string, so:
sanitized_definition.index("Hello")
will return 0. I need the start point to be 8 and the end point to be 21. I thought about mapping the entire string to my own indices like
{"1" => '<', "2" => 's', "3" => 't', .. , "9" => 'H' ...}
so that 1 maps to the first character, 2 to the second, and so on, but I'm not sure what that accomplishes, and it seems overly complicated. Does anyone have any ideas how to accomplish this?
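A minimal sketch of one way the mapping idea could work: rather than a hash keyed by position, keep an array whose i-th entry is the index in the original string of the i-th visible (non-tag) character, search the visible text, and translate the match back. The helper names visible_index_map and find_in_html are hypothetical, and the sketch assumes tags are simple <...> runs and that entities are not decoded.

```ruby
# Sketch only: pairs each visible (non-tag) character with its index in the
# original string. Assumes tags are plain <...> runs; entities are left as-is.
def visible_index_map(html)
  map = []        # map[i] = index in html of the i-th visible character
  in_tag = false
  html.each_char.with_index do |ch, i|
    if ch == "<"
      in_tag = true
    elsif ch == ">" && in_tag
      in_tag = false
    elsif !in_tag
      map << i
    end
  end
  map
end

# Returns [start, end] (0-based, inclusive) in the original string,
# or nil when the word does not occur in the visible text.
def find_in_html(html, word)
  map = visible_index_map(html)
  text = map.map { |i| html[i] }.join
  start = text.index(word)
  return nil if start.nil?
  [map[start], map[start + word.length - 1]]
end

definition = "<strong>Hel</strong>lo World!"
p find_in_html(definition, "Hello") # => [8, 21]
```

Because the match starts at the first visible character, the opening <strong> is excluded while the </strong> inside the word is spanned, which gives the 8 and 21 described above.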
EDIT:
Good point in the comments: it doesn't make sense that I want to include the </strong> but not the <strong> at the beginning. That is partly because I haven't figured out what to do with that edge case. For the purposes of this question, a better example might be something like
definition = "Probati<strong>onary Peri</strong>od."
search_text = 'Probationary Period'
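For this second example, another possible approach is to search the original string directly with a pattern that tolerates tags between any two characters of the search text. This is only a sketch: tag_tolerant_regex is a hypothetical helper name, and it assumes tags are simple <...> runs and that no entities appear inside the matched text.

```ruby
# Sketch only: builds a pattern that matches the search text even when
# HTML tags are interleaved between its characters.
def tag_tolerant_regex(search_text)
  optional_tags = "(?:<[^>]*>)*"
  Regexp.new(search_text.chars.map { |c| Regexp.escape(c) }.join(optional_tags))
end

definition = "Probati<strong>onary Peri</strong>od."
search_text = 'Probationary Period'

match = tag_tolerant_regex(search_text).match(definition)
if match
  # MatchData#end is exclusive, so subtract 1 for an inclusive end position.
  p [match.begin(0), match.end(0) - 1] # => [0, 35]
end
```

Since the indices come from the original string, no translation step is needed, at the cost of a longer pattern for each search word.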
Also, after thinking about it a little more, I think that in my particular case the only HTML entity I need to worry about is
.