I have a list of the Unicode emojis and I want to strip the emojis from it (i.e. I just want the whole first part and the name at the end of each row). Sample rows look like these:
1F468 1F3FD 200D 2695 FE0F ; fully-qualified # ⚕️ man health worker: medium skin tone
1F469 1F3FF 200D 2695 ; non-fully-qualified # ⚕ woman health worker: dark skin tone
(I have deleted some spaces from these rows for the sake of simplicity.) What I want is to match the [non-]fully-qualified part as well as the # and the emoji, so I can delete them with sed. I have tried the following regex:
sed -e 's/\<[on-]*fully-qualified\># *.+?(?=[a-zA-Z]) //g'
which tries to match the words [non-]fully-qualified, a space, the # symbol, and then whatever follows (non-greedy) up to the first letter, and replace all of it with an empty string.
I would like to have this output:
1F468 1F3FD 200D 2695 FE0F ; man health worker: medium skin tone
1F469 1F3FF 200D 2695 ; woman health worker: dark skin tone
I have tried several posted answers to no avail, and besides, I'm trying to match a pattern between two boundaries, which is where I'm having the trouble.
EDIT: I'm trying to run the command in the Git Bash shipped with Git for Windows.
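For reference, here is a minimal sketch of one way to get the output shown above. The lazy quantifier `+?` and the lookahead `(?=...)` in the attempt are Perl-style constructs that sed's POSIX regular expressions do not support, so this sketch matches the emoji as a run of non-space characters instead. It assumes a sed that accepts `-E` (GNU sed, which Git Bash normally ships, does), assumes every row has the same shape as the two samples, and uses `strip_emoji` as a hypothetical helper name.

```shell
# Hypothetical helper: reads rows on stdin, writes the cleaned rows to stdout.
strip_emoji() {
  # -E selects POSIX extended regular expressions.
  # (non-)?fully-qualified   the keyword, with an optional "non-" prefix
  #  +# +                    the "#" separator and the spaces around it
  # [^ ]+ +                  the emoji (a run of non-space characters) and the spaces after it
  sed -E 's/(non-)?fully-qualified +# +[^ ]+ +//'
}

# Try it on the two sample rows from the question.
printf '%s\n' \
  '1F468 1F3FD 200D 2695 FE0F ; fully-qualified # ⚕️ man health worker: medium skin tone' \
  '1F469 1F3FF 200D 2695 ; non-fully-qualified # ⚕ woman health worker: dark skin tone' |
  strip_emoji
```

Matching `[^ ]+` rather than "anything up to the first letter" is a deliberate choice here: it relies only on the emoji sequence containing no spaces, so a name that begins with a digit would not be truncated.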