I need to match words using only a single pattern. A word should match if it meets either of the following criteria:

  • its first character is a digit or an underscore, OR

  • it contains at least one special character (excluding underscore) anywhere in the word.

Should match

3testData
3test_Data
_testData
_test3Data
%data%
test%BIN%data
te$t&$#@daTa

Should NOT match

test_Data3

So far, I have managed to match some of them through:

[\p{^Alpha}]\S+

It works except for the words that have special characters inside the word:

3testData
3test_Data
_testData
_test3Data
%data%
test%BIN%data
test%BIN%data
te$t&$#@daTa
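
The behaviour can be reproduced with a small sketch. It assumes JavaScript, where `\P{Alphabetic}` with the `u` flag stands in for `\p{^Alpha}` (an assumption, since the regex flavour is not stated), and `matchedPart` is a hypothetical helper name. Because the pattern is not anchored to the start of the word, the match simply begins at the first non-letter character:

```javascript
// Assumed JavaScript equivalent of [\p{^Alpha}]\S+ (requires the `u` flag).
const attempt = /\P{Alphabetic}\S+/u;

// Hypothetical helper: returns the part of the word that the attempt matches, or null.
function matchedPart(word) {
  const m = word.match(attempt);
  return m ? m[0] : null;
}

for (const word of ["3testData", "test%BIN%data", "test_Data3"]) {
  console.log(word, "->", matchedPart(word));
}
// 3testData -> 3testData
// test%BIN%data -> %BIN%data
// test_Data3 -> _Data3
```

The last line shows the other problem: `test_Data3` should not match at all, yet its tail does.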

Manu
  • What is the rule here? You may as well use `.+` or `\S+` – Wiktor Stribiżew Sep 23 '18 at 07:54
  • Hi Manu, welcome to Stack Overflow. Can you please elaborate on what you want as output? – vrintle Sep 23 '18 at 07:56
  • thanks! my goal is to get a whole-word match for each word. These words are from a code so I'm trying to get to a pattern specific to matching words with numbers and underscore at the first letter, and special characters at any part of the word. – Manu Sep 23 '18 at 08:03
  • to clarify, the bold font on the output is the match I got from using the regex pattern I used. – Manu Sep 23 '18 at 08:09
  • If you want to match them all where the first character can be a word character (letter, digit, or underscore) or a percentage sign, try [`^[\w%]\S+$`](https://regex101.com/r/Ry8ERx/1) – The fourth bird Sep 23 '18 at 08:12
  • You may try this: `[\w%\_]+`. But I'm not very sure what special characters you are looking for. https://regex101.com/r/do6T5N/1 – enxaneta Sep 23 '18 at 08:18
  • Yeah, sorry, I forgot to indicate that. That's what I was trying to say in my first sentence: a number or underscore as the first character only, and any special character at any position. – Manu Sep 23 '18 at 08:50
  • @Manu But then `%data%` `test%BIN%data` `te$t&$#@daTa` should not match because those start with `t` or a `%` right? If the words are part of a larger text and lookbehinds are supported, try [`(?<=\s|^)[\d_]\S+(?=\s|$)`](https://regex101.com/r/7bmbEV/1) – The fourth bird Sep 23 '18 at 08:52
  • ah you're right, it should be number/underscore on first letter OR any other special character at any position. – Manu Sep 23 '18 at 08:57
  • @Manu Do [`(?<=\s|^)[\d_]\S+(?=\s|$)`](https://regex101.com/r/7bmbEV/1) or [`^[\d_]\S+$`](https://regex101.com/r/dIzQhT/1) work for you? – The fourth bird Sep 23 '18 at 09:05
  • @Jenny not really. If a number or underscore appear as the first character, it's a match but they can still be anywhere in the word – Manu Sep 23 '18 at 09:05
  • What is the difference to what I was saying? Maybe we should discuss this in a chat to clarify what exactly you mean. It looks like nobody understands it 100%. – Krisztián Balla Sep 23 '18 at 09:07
  • @Thefourthbird doesn't seem to match those with special characters within the word. – Manu Sep 23 '18 at 09:10
  • @JennyO'Reilly I thought what you're saying was something like "3test77_data" would not match. It still is because there's a number on the first character. – Manu Sep 23 '18 at 09:13
  • I think you are ambiguous. First you say that a number may only appear as first character and then you say that any special character (excl. underscore) can follow. But that would also include numbers. – Krisztián Balla Sep 23 '18 at 09:15
  • @Manu If the word either starts with a digit or an underscore OR contains a special character, the word can occur in a larger text, and lookbehinds are supported, try [`(?<=\s|^)(?:[\d_]\S+|\S*[%@#$]\S*)(?=\s|$)`](https://regex101.com/r/hONmSf/1) – The fourth bird Sep 23 '18 at 09:18
  • @Thefourthbird Nice, it works! Lookbehind is supported but I am still trying to learn them. Thank you so much! :) – Manu Sep 23 '18 at 09:31
  • @JennyO'Reilly sorry about that. I posted faster than how I gathered my thoughts on this problem. – Manu Sep 23 '18 at 09:40

2 Answers

If lookbehinds are supported, you could use an alternation that matches either a word starting with a digit or an underscore, or a word containing at least one special character. The second alternative matches zero or more non-whitespace characters, then one special character from a character class, then zero or more non-whitespace characters again.

(?<=\s|^)(?:[\d_]\S+|\S*[%@#$]\S*)(?=\s|$)

Regex demo

Explanation

  • (?<=\s|^) Positive lookbehind asserting that what is on the left is either a whitespace character or the start of the string
  • (?: Start non-capturing group
    • [\d_]\S+ Match a digit or an underscore followed by one or more non-whitespace characters
    • | Or
    • \S*[%@#$]\S* Match zero or more non-whitespace characters, then one character from the character class, then zero or more non-whitespace characters again
  • ) Close non-capturing group
  • (?=\s|$) Positive lookahead asserting that what follows is a whitespace character or the end of the string
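
As a quick check, the pattern can be run against the sample words from the question. This is a minimal sketch assuming a JavaScript engine with lookbehind support; `findFlaggedWords` is a hypothetical helper name, and joining the words into one space-separated string is an assumption about the input.

```javascript
// Pattern from above; the `g` flag collects every matching word in the text.
const pattern = /(?<=\s|^)(?:[\d_]\S+|\S*[%@#$]\S*)(?=\s|$)/g;

// Hypothetical helper: returns all whole words in `text` that satisfy the pattern.
function findFlaggedWords(text) {
  return text.match(pattern) || [];
}

const sample =
  "3testData 3test_Data _testData _test3Data %data% test%BIN%data te$t&$#@daTa test_Data3";

console.log(findFlaggedWords(sample));
// [ '3testData', '3test_Data', '_testData', '_test3Data',
//   '%data%', 'test%BIN%data', 'te$t&$#@daTa' ]
```

`test_Data3` is left out: it does not start with a digit or underscore, and it contains none of the characters listed in the character class.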
The fourth bird
  • thank you! Just a follow-up, [%@#$] this part does not cover all other special characters, right? So I need to add more if I want it to be matched. – Manu Sep 23 '18 at 09:52
  • @Manu Yes, that is correct. Note that this part `\S*[%@#$]\S*` will also match a single special character, because both `\S*` parts can match zero characters. Perhaps you could update your question with all the requirements from the comments. – The fourth bird Sep 23 '18 at 09:53
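
One possible way to avoid listing every special character is to swap the explicit class for a negated one. The sketch below is a variation, not the pattern from the answer: it assumes JavaScript and treats "special" as anything that is neither a word character (`\w`) nor whitespace, so `[^\w\s]` replaces `[%@#$]`. `findSpecialWords` is a hypothetical helper name.

```javascript
// Variation on the answer's pattern: [^\w\s] matches any character that is
// not a letter, digit, underscore, or whitespace (an assumed definition of "special").
const generalPattern = /(?<=\s|^)(?:[\d_]\S+|\S*[^\w\s]\S*)(?=\s|$)/g;

// Hypothetical helper: returns all whole words in `text` that satisfy the pattern.
function findSpecialWords(text) {
  return text.match(generalPattern) || [];
}

console.log(findSpecialWords("test-data a.b plain test_Data3 _x !"));
// [ 'test-data', 'a.b', '_x', '!' ]
```

The lone `!` is matched as well, which is the single-special-character case mentioned above. Note that `\w` in JavaScript covers only ASCII letters, so accented letters would count as special under this assumption.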
If I understand the question correctly, you are looking for a starting % and an ending % in a string. Assuming there is at most one such pair per string, you could use indexOf and lastIndexOf, like this:

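function searchTagIn(symbol, str) {
  // Positions of the first and last occurrence of the symbol
  const first = str.indexOf(symbol);
  const last = str.lastIndexOf(symbol);
  // Two distinct occurrences are needed for an opening and a closing symbol
  if (first > -1 && last !== first) {
    // Returns from the first symbol up to, but not including, the last one
    return str.substring(first, last);
  }
  // Implicitly returns undefined when no pair is found
}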
jean3xw
  • thanks and welcome fellow newcomer! However, this isn't what I want, sorry for not being really clear. As much as possible, I am trying to match all the words with a single regex pattern without having to resort to creating a function. – Manu Sep 23 '18 at 08:07