1

The string in question is something like: Tomask Kassahun

How can I strip out the last emoticon/emoji (whatever it's called), so I just get Tomask Kassahun? Of course, it could also be any other emoticon like a rocket ship.

Henley
  • 1
    I guess you need to define the e-word in terms of a collection of Unicode characters. See [this ref](https://unicode.org/emoji/charts/full-emoji-list.html). – Cary Swoveland Nov 24 '20 at 21:45

2 Answers

2

Updated for Ruby 3.2.0: The Unicode Emoji Character Property

As of Ruby 3.2.0, Ruby documents a \p{Emoji} character property for Unicode emojis. Support was introduced in Onigmo 6.2.0, but the property was undocumented in Ruby core as recently as Ruby 3.1.2. However, \p{Emoji} has behavior that, while spec-conforming, is surprising: it also matches characters most people would not call emojis, such as digits, and so removes them from the string. It is therefore preferable to use the \p{Emoji_Presentation} character property (shorthand \p{EPres}), which is unfortunately still undocumented at the time of writing. If your Ruby version and engine support it, you can remove just the emojis as in the following examples.

Example 1: Remove Emojis and the Trailing Whitespace Left Behind

"Tomask Kassahun ".gsub(/\p{Emoji_Presentation}/, '').strip
#=> "Tomask Kassahun"
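The question asks about the last emoji specifically, whereas gsub removes every match in the string. A minimal sketch of an anchored variant follows, assuming the emoji always sits at the end of the string; the sample string and the `trailing` name are made up for illustration.

```ruby
# Hypothetical sample: a rocket (U+1F680) at both ends of the name.
str = "\u{1F680} Tomask Kassahun \u{1F680}"

# Match one or more emoji plus any surrounding whitespace, but only when
# they sit at the very end of the string (\z).
trailing = /\s*\p{Emoji_Presentation}+\s*\z/

# String#sub replaces only the first match, and the anchor limits that
# match to the end, so the leading rocket is kept.
result = str.sub(trailing, '')
p result
```

This leaves emojis elsewhere in the string untouched, which may or may not be what you want.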

Example 2: Demonstrate Removal Without Affecting Other Unicode Sets

"Tomask (mɑ̃ʒe) Kassahun ".gsub(/\p{Emoji_Presentation}/, '').strip
#=> "Tomask (mɑ̃ʒe) Kassahun"
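To see why \p{Emoji} can be surprising, here is a minimal sketch comparing the two properties on a string that contains digits. The sample string is made up for illustration, and the sketch assumes a Ruby whose regex engine recognizes both properties.

```ruby
# Hypothetical sample: a name, a number, and a trailing rocket (U+1F680).
str = "Tomask 42 \u{1F680}"

# \p{Emoji} follows the Unicode Emoji property, which includes the ASCII
# digits, '#', and '*', so the "42" is removed along with the rocket.
with_emoji = str.gsub(/\p{Emoji}/, '').strip

# \p{Emoji_Presentation} only matches characters that are displayed as
# emoji by default, so the digits survive.
with_epres = str.gsub(/\p{Emoji_Presentation}/, '').strip

p with_emoji
p with_epres
```

The first result loses the number, while the second keeps it.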

If you are on an older Ruby or one that doesn't support the emoji character property, there are other properties that can also work well. I've described them below.

Remove Emojis Based on Other Character Properties

One possible approach is to strip out characters in the Unicode general category "Symbol, Other" (\p{So}), which is among the character properties Ruby supports. For example:

"Tomask Kassahun ".gsub(/\p{So}/, '').strip
#=> "Tomask Kassahun"

This even works with strings containing accented characters. For example, borrowing some non-emoji accented characters from another post as a test case:

"Tomask (mɑ̃ʒe) Kassahun ".gsub(/\p{So}/, '').strip
#=> "Tomask (mɑ̃ʒe) Kassahun"
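One caveat with \p{So} is that the category is broader than emojis. As a rough sketch, using a made-up sample string, the copyright sign (U+00A9) is also in "Symbol, Other" and is removed too:

```ruby
# Hypothetical sample: a copyright sign (U+00A9), a name, and a rocket (U+1F680).
str = "\u00A9 Tomask Kassahun \u{1F680}"

# Both the copyright sign and the rocket are in general category So,
# so both are removed.
by_category = str.gsub(/\p{So}/, '').strip

# For comparison: the copyright sign is not Emoji_Presentation, so it stays.
by_epres = str.gsub(/\p{Emoji_Presentation}/, '').strip

p by_category
p by_epres
```

If your strings may contain such symbols and you need to keep them, the category-based approach may be too aggressive.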
swrobel
Todd A. Jacobs
  • 1
    @swrobel I see that, and updated my answer. Thanks. – Todd A. Jacobs Jul 08 '23 at 17:55
  • 1
    @cremno The character property was undocumented in Ruby core until Ruby 3.2.0, but I've updated my answer and even included the Onigmo patch that may have added support for it in earlier rubies. Thank you. – Todd A. Jacobs Jul 08 '23 at 17:56
  • Actually, there is [very strange behavior](https://github.com/k-takata/Onigmo/issues/147#issuecomment-860176959) with the Emoji character property! You should definitely use `Emoji_Presentation` for the desired behavior! – swrobel Jul 16 '23 at 00:47
0

I think this is a good case for a regular expression. I'm not a regex expert, but the following expression could be a good starting point.

str = "Tomask Kassahun "

Extract a substring with an element reference (String#[]). If a Regexp is supplied, the matching portion of the string is returned.

str[/^[a-zA-Z]+\s{1}[a-zA-Z]+/] #=> "Tomask Kassahun"

The String#match method returns a MatchData object (or nil if there is no match)

str.match(/^[a-zA-Z]+\s{1}[a-zA-Z]+/) #=> #<MatchData "Tomask Kassahun">

You can index the MatchData with 0 to get the matched string

str.match(/^[a-zA-Z]+\s{1}[a-zA-Z]+/)[0] #=> "Tomask Kassahun"
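Two things to watch with this approach are names with non-ASCII letters and strings that do not match at all. The sketch below uses a made-up name and a hypothetical alternative pattern built on \p{L} (any Unicode letter); it is an illustration, not part of the original answer.

```ruby
# Hypothetical sample with accented letters and a trailing rocket (U+1F680).
str = "Jos\u00E9 N\u00FA\u00F1ez \u{1F680}"

ascii_only = /^[a-zA-Z]+\s{1}[a-zA-Z]+/  # the pattern from above
any_letter = /^\p{L}+\s\p{L}+/           # assumed alternative: any Unicode letter

# String#[] returns nil when nothing matches; the accented "e" is outside
# a-zA-Z, so the ASCII-only pattern fails here.
ascii_result = str[ascii_only]

# String#match returns nil on failure, so calling [0] directly would raise
# NoMethodError. The safe navigation operator (&.) returns nil instead.
safe_result = str.match(ascii_only)&.[](0)

# The \p{L} pattern accepts the accented letters.
letter_result = str[any_letter]

p ascii_result
p safe_result
p letter_result
```

Note that either pattern still assumes exactly two words separated by a single whitespace character.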

Check https://ruby-doc.org/core-2.7.2/String.html#method-i-5B-5D

hernanvicente