-1

This is based on the question "Bash sed - find hashtags in string", which has no solution for this case (input that contains special characters).

This question is well-researched and is not a duplicate of this unrelated question, because the referenced question doesn't cover everything asked here (support for special characters and numbers; matching text both between characters and after or before a character).

echo "Text and #hashtag" | grep -o '#[[:alpha:]]\+*' | tr -d '"' works, returning #hashtag, but that case is still covered by the mentioned question.

For this new question, which covers my own needs (and may be useful to others), here is my version, which parses text between double quotes instead of after a hashtag:

echo '#first = "Yes"' | grep -o '"[[:alpha:]]\+*"' | tr -d '"' and it works, returning Yes.

However, when the quoted text contains an emoji or other characters such as > and / (for example, echo '#first = "✅ Yes"' | grep -o '"[[:alpha:]]\+*"' | tr -d '"'), it returns empty output.

It has to support any kind of character (emojis, HTML tags, numbers).

The solution should work not only for text between two characters, but also for text after a character (such as any #hashtag text) or before one.

  • Referencing another question is fine, but your own question should still stand on its own without requiring context from the referenced question, and it currently doesn't. – Benjamin W. Dec 03 '21 at 03:01
  • I'm not sure why you expect `"✅ Yes"` to be matched by `"[[:alpha:]]\+*"`. ✅ is not an alphanumeric character. Any character is `*` in glob. If you use regex (with the `-E` flag for grep), you can also use a [negated character class](https://stackoverflow.com/questions/1763071/negate-characters-in-regular-expression) to filter out spaces or empty `" ... "` blocks – Aserre Dec 03 '21 at 06:54
  • Note that the extra quantifier (`*`) makes this pattern match the empty string as well, i.e. `""` – Thor Dec 03 '21 at 09:12
  • @BenjaminW., are you (constructively) criticizing the introduction of this question or, like [choroba](https://stackoverflow.com/questions/34557020/bash-sed-find-hashtags-in-string/#comment124129120_34557106), the OP? – dani 'SO learn value newbies' Dec 03 '21 at 19:47
  • I'm talking about the question. To understand it, one has to first read another question; it would be better if it were self-contained. – Benjamin W. Dec 03 '21 at 21:07
  • @Aserre, it works for numbers like `1638641201` and `0.9` (numbers with more than one character), as here: `echo '"1638641201"' | grep -o '"...\+"' | tr -d '"'`. But it doesn't work for one-character numbers like `1`, `2`, etc.: `echo '"1"' | grep -o '"...\+"' | tr -d '"'` – dani 'SO learn value newbies' Dec 04 '21 at 18:12
  • The solution is to replace `...` with `.` – dani 'SO learn value newbies' Dec 04 '21 at 18:16
  • Yeah, the 3 dots were meant for `whatever pattern you are looking for`. The pattern you are looking for is the one in tripleee's answer – Aserre Dec 05 '21 at 19:02

2 Answers

2

The way to extract text between double quotes is to match any character except double quote, as many as possible, between double quotes.

grep -o '"[^"]*"' | tr -d '"'

Some test cases:

grep -o '"[^"]*"' <<\___here | tr -d '"'
there is "text" between "double quotes"
just one "?" here, "test me!"
any unpaired double quote " will not match 
___here

The second one of these will fail with the current code in your own answer.
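
As a rough sketch of how the same negated-class idea might extend to the "after a character" case from the question: the function names below are hypothetical, and the hashtag pattern assumes that a hashtag ends at the first whitespace character, which may not match every definition of a hashtag.

```shell
# Hypothetical helper names, used only for illustration.

# Print each double-quoted string found on stdin, without the quotes.
between_quotes() {
  grep -o '"[^"]*"' | tr -d '"'
}

# Print each hashtag found on stdin.
# Assumption: a hashtag is '#' followed by one or more non-whitespace characters.
hashtags() {
  grep -oE '#[^[:space:]]+'
}

printf '%s\n' '#first = "✅ Yes"' | between_quotes   # prints: ✅ Yes
printf '%s\n' 'Text and #hashtag' | hashtags         # prints: #hashtag
```

Because both patterns exclude the terminating character instead of listing the allowed ones, emojis, digits, and punctuation need no special handling.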

tripleee
-2

Thanks to @Aserre's pointers, I was able to come up with an answer.

For the grep commands that "get all text that appears AFTER a character" and "get all text that appears BETWEEN quotes" to work with any character, we have to replace [[:alpha:]] in the pattern with ... (three dots).

So, it is:

echo '#first = "✅ Yes"' | grep -o '"...\+"' | tr -d '"' (get anything which is between double quotes)

and:

echo "Text and #hashtag" | grep -o '#...\+' | tr -d '"' (get anything which is after a hashtag)

Update:

To support values with only one character (such as the numbers 0 to 9), replace ... with . (a single dot).

As the question requires, it works for emojis, letters, numbers, and other special characters.

  • `grep -o '"...\+"'` does not "get anything between double quotes". It gets three or more characters between double quotes, but if one of them is a double quote character, they will straddle more than just a quoted string. – tripleee Dec 04 '21 at 18:21
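
The straddling described in this comment can be reproduced with the second test line from the other answer. This is a small sketch that assumes a grep supporting `-o` and the `\+` operator (as GNU grep does); the variable name `line` is only for illustration.

```shell
line='just one "?" here, "test me!"'

# A greedy '.' also matches double quotes, so the single match runs
# from the first quote on the line to the last one.
printf '%s\n' "$line" | grep -o '".\+"'     # prints: "?" here, "test me!"

# Excluding the quote character keeps each match inside one quoted
# string, so this prints "?" and "test me!" on separate lines.
printf '%s\n' "$line" | grep -o '"[^"]*"'
```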