I am trying to write a gsub expression that will replace a hyphen (-) with an endash (–) where the hyphen is preceded by a number. I want to show date periods as 1978 – 1980 rather than 1978-1980, which is how they appear in my data source.

Hyphens and endashes look pretty darn similar to me, so I want to be specific and use the Unicode character for the endash, which is U+2013, while the hyphen is U+002D.

As a test I would like to convert:

"america-the-beautiful. 1760-about 1780" to "america-the-beautiful. 1760 – about 1780"

with test_string = "america-the-beautiful. 1760-about 1780"

I've confirmed that the regex is correctly identifying only the hyphens preceded by a number and that gsub replaces with a placeholder for the endash.

test_string.gsub(/(\d)-/, '\1 endash_placeholder ')

=> "america-the-beautiful. 1760 endash_placeholder about 1780"

I am struggling to remove both the hyphen and the endash_placeholder and use the actual unicode character.

I've used a number of SO questions to get further with this, e.g. "Ruby Output Unicode Character".

In irb I can return the unicode character for endash with puts "\u{2013}"

I've tried amending my gsub expression to test_string.gsub(/(\d)-/, '\1 \u{2013} ')

=> "america-the-beautiful. 1760 \\u{2013} about 1780"

I've also tried double quoting the unicode:

test_string.gsub(/(\d)-/, "\1 \u{2013} ")

=> "america-the-beautiful. 176\u0001 – about 1780"

What am I missing in order to use the specific unicode character code in the gsub expression?

whatapalaver
  • Why not just `test_string.gsub(/(\d)-/, '\1–')`? It is quite specific IMHO. Of course you may use `test_string.gsub(/(\d)-/, "\\1\u2013")` – Wiktor Stribiżew Jul 05 '19 at 13:11
  • Note `"\1"` is a char with octal code 1, you need to use `"\\1"` to actually write a backslash and `1`, which forms a backreference in a regex. – Wiktor Stribiżew Jul 05 '19 at 13:20
  • My problem relates to the syntax for using unicode character codes in the gsub expression rather than regex grouping so I don't think the Q&A you referred to is relevant. Your second example in your first comment is just what I need though - thanks. The gsub expression in its entirety is now `test_string.gsub(/(\d)\u{002D}/, "\\1 \u{2013} ")` – whatapalaver Jul 05 '19 at 13:23
  • Actually your second comment may throw light on why you are saying this is a duplicate. Maybe you are saying my problem was not one of using the unicode character but of how I was doing the backreference. I don't totally understand that but will go and have a think - thanks for solving my problem anyway. – whatapalaver Jul 05 '19 at 13:28
  • The dupe reason provides you the right form of writing backreferences in a double quoted string literal. See [the answer](https://stackoverflow.com/a/12065752/3832970). – Wiktor Stribiżew Jul 05 '19 at 13:35
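
To make the point from the comments concrete, here is a minimal sketch, assuming a UTF-8 source file and the test string from the question. Single-quoted strings do not process `\u{2013}`, while double-quoted strings process both `\u{2013}` and `\1` (which becomes the character U+0001), so the backreference needs an escaped backslash. The variable names are illustrative only, and the block and lookbehind forms are alternatives not mentioned in the thread.

```ruby
# Test string taken from the question.
test_string = "america-the-beautiful. 1760-about 1780"

# Illustrative name: the en dash written as a Unicode escape.
# The escape is only interpreted inside double quotes.
en_dash = "\u{2013}"

# Form from the comments: "\\1" is a literal backslash plus 1 (the backreference),
# and "\u{2013}" is turned into the en dash by the string literal itself.
with_escaped_backref = test_string.gsub(/(\d)-/, "\\1 \u{2013} ")

# Alternative: the block form reads the captured digit from $1,
# so no backslash escaping is involved.
with_block = test_string.gsub(/(\d)-/) { "#{$1} #{en_dash} " }

# Alternative: a lookbehind matches only the hyphen, so no backreference is needed.
with_lookbehind = test_string.gsub(/(?<=\d)-/, " #{en_dash} ")

puts with_escaped_backref
# prints: america-the-beautiful. 1760 – about 1780
```

All three calls should give the same string; which one to prefer is mostly a matter of taste, though the lookbehind form avoids the quoting question entirely.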
