
Here's an example of the strings I have to process:

<pnr>0240978840</pnr>;Amplificator AD8622ARMZ;<upa>1</upa>
<pnr>0273101670</pnr>;Power Management LT1236BIS8-10;<upa>1</upa>
<pnr>0299801750</pnr>;Power Management LT1175IS8-5;<upa>3</upa>
<pnr>0233201780</pnr>;Amplificator LT1498CS8;<upa>4</upa>

I've been searching for a while and have tried several patterns to wrap the "name" of the electronic component in a <kwd> tag and the following string in an <adt> tag, to get:

<kwd>Amplificator</kwd><adt>AD8622ARMZ</adt>

<kwd>Power Management</kwd><adt>LT1236BIS8-10</adt>

So, in this example, I want to match "Amplificator" but not "AD8622ARMZ", and "Power Management" but not "LT1236BIS8-10".

I don't understand how to search for words made of letters only, excluding words that contain letters AND digits. I tried:

;(\w+)\s*(\D+) which finds ;Amplificator AD and ;Power Management LT
;(\w+)\s*([^\W+]+) which finds ;Amplificator AD8622ARMZ and ;Power Management (only!)
;(\w+)\s*([^\d]) which finds ;Amplificator A
;(\w+)\s*(\D+)\S which finds ;Amplificator A and ;Power Management LT1
;(\w+)\s*([a-zA-Z]+)[^0-9]+ which finds ;Amplificator AD and ;Power Management LT
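
A quick way to see what these attempts actually do is to run them outside Notepad++. The sketch below uses Python's `re` module as a stand-in (an assumption: it is not the engine Notepad++ uses, but `\w`, `\D`, `\s` and `[^...]` behave the same way on these ASCII lines). `\D` means "any character that is not a digit", so `\D+` runs across the space and the first letters of the part number until it meets a digit; `[^\W+]` is just another spelling of `\w`, so it stops at the first space.

```python
import re
from typing import Optional

# Python's re module is used here as a stand-in for Notepad++'s regex engine.
lines = [
    "<pnr>0240978840</pnr>;Amplificator AD8622ARMZ;<upa>1</upa>",
    "<pnr>0273101670</pnr>;Power Management LT1236BIS8-10;<upa>1</upa>",
]


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the text matched by the first occurrence of pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


# \D+     = one or more non-digits (spaces and letters included)
# [^\W+]+ = one or more word characters, i.e. the same as \w+
for pattern in (r";(\w+)\s*(\D+)", r";(\w+)\s*([^\W+]+)"):
    print(pattern, [first_match(pattern, line) for line in lines])

# ;(\w+)\s*(\D+) [';Amplificator AD', ';Power Management LT']
# ;(\w+)\s*([^\W+]+) [';Amplificator AD8622ARMZ', ';Power Management']
```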

I started using regex only a few days ago, and I find the "rules" very hard to understand:

OK: \w is a word character (a letter, a digit or an underscore) and \d is a digit, but how do I express a word made of letters and no digits?

I've read numerous tutorials, but some of them are far too hard, and the tokens they use are not as simple as the help makes them look...

My goal is not just to get the answer: I really want to improve and understand...

Many thanks in advance for your insights. If anyone knows a Notepad++ regex tutorial for not-quite-beginners, that would be great!

Best regards

pipout64
  • A letter matching pattern is `[[:alpha:]]`. Or, `[a-zA-Z]` for ASCII only letters. So, to match a word with letters, without `_` or digits glued to the word, use `\b[[:alpha:]]+\b` or `\b[A-Za-z]+\b`. – Wiktor Stribiżew Jun 14 '21 at 19:00
  • Does this answer your question? [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – ti7 Jun 14 '21 at 19:02
  • Many thanks Wiktor for your help. Thanks to you, I've just understood how to use \b for word boundaries. I tried to reuse and adapt your pattern for my search and wrote this: `;((\b[A-Za-z]* \b)+)` (I'm proud of myself because it's the first time I've used nested groups: thanks to @MonkeyZeus!). Best regards! – pipout64 Jun 14 '21 at 20:40
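
Wiktor's suggestion can be tried on the text between the semicolons with a short Python sketch (assumption: Python's `re` as a stand-in for Notepad++; `re` has no POSIX classes such as `[[:alpha:]]`, so only the ASCII form `[A-Za-z]` is used). The `\b` anchors reject letters that are glued to digits, which is why `AD8622ARMZ` and `LT1236BIS8-10` produce no match at all.

```python
import re

# Python's re module stands in for Notepad++ here; it has no [[:alpha:]],
# so the ASCII class [A-Za-z] is used instead.
fields = ["Amplificator AD8622ARMZ", "Power Management LT1236BIS8-10"]

# \b is a word boundary: a position with a word character (letter, digit or
# underscore) on one side and a non-word character or the start/end of the
# text on the other. With \b on both sides, letters glued to digits can't match.
letters_only = re.compile(r"\b[A-Za-z]+\b")

for field in fields:
    print(letters_only.findall(field))

# ['Amplificator']
# ['Power', 'Management']
```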

2 Answers


Notepad++ supports Perl-compatible (PCRE-style) regex syntax, so this would work:

regex

;((?:[a-zA-Z]+ )+)(.*?);
  • ; - match the first semicolon
  • ( - start a capture group; $1 in the substitution
    • (?:[a-zA-Z]+ )+ - match a word with no digits followed by a space, one or more times; (?:...) is a non-capturing group, so it does not get a number of its own
  • ) - close the $1 capture group; it contains our letter-only words, including the space after the last one
  • ( - start a capture group; $2 in the substitution
    • .*? - lazily match any characters; the ? makes .* stop at the first following semicolon instead of running on to the last one in the line
  • ) - close the $2 capture group; this should contain our alphanumeric word
  • ; - match the final semicolon

substitution

<kwd>$1</kwd><adt>$2</adt>

https://regex101.com/r/jgyikG/1
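
The same search & replace can be reproduced as a minimal Python sketch (assumption: `re.sub` as a stand-in for Notepad++'s Replace dialog; Python writes the backreferences as `\1` and `\2`). Running it shows two details: the semicolons are part of the match, so they vanish from the output, and `$1` keeps the space after the last keyword, giving `<kwd>Amplificator </kwd>`. The `trimmed` pattern is a hypothetical variant, not part of the answer, that keeps that space outside the first group.

```python
import re

# Python's re.sub stands in for Notepad++'s Replace here.
lines = [
    "<pnr>0240978840</pnr>;Amplificator AD8622ARMZ;<upa>1</upa>",
    "<pnr>0273101670</pnr>;Power Management LT1236BIS8-10;<upa>1</upa>",
]

# The answer's pattern; Python spells $1/$2 as \1/\2 in the replacement.
pattern = re.compile(r";((?:[a-zA-Z]+ )+)(.*?);")
replacement = r"<kwd>\1</kwd><adt>\2</adt>"

# Hypothetical variant (not from the answer): the last word is matched on
# its own, so the space before the part number stays outside group 1.
trimmed = re.compile(r";((?:[a-zA-Z]+ )*[a-zA-Z]+) (.*?);")

for line in lines:
    print(pattern.sub(replacement, line))
    print(trimmed.sub(replacement, line))

# First two lines printed:
# <pnr>0240978840</pnr><kwd>Amplificator </kwd><adt>AD8622ARMZ</adt><upa>1</upa>
# <pnr>0240978840</pnr><kwd>Amplificator</kwd><adt>AD8622ARMZ</adt><upa>1</upa>
```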

MonkeyZeus
  • Thanks for your answer: it's almost the right solution! But please, Bill Gates, leave this body! You are at another level. The problem is that I didn't understand anything. It's the first time I've seen nested groups (if that's what they are). I don't understand why you use $1 instead of \1 in the replacement. I modified the pattern to get what I want, because with your pattern the tag disappears when replacing: Search `;((?:[a-zA-Z]+ )+)(.*?)()` Replace `$1$2$3`. Many thanks for your help. Perhaps you could take a few minutes to explain it to me... – pipout64 Jun 14 '21 at 20:29
  • @pipout64 I modified my answer again but chose to leave `` alone. Your desired text is between two semicolons, which allowed me to simplify my regex. See the explanation in my edit. – MonkeyZeus Jun 15 '21 at 12:22
  • @pipout64 I use `$1` because I like it. To me, a dollar sign signals a variable, and that's exactly what it is. To me, `\1` means "literally one", as if the backslash were an escape char making the following char literal. – MonkeyZeus Jun 15 '21 at 12:24
  • Thanks again, @MonkeyZeus, for your explanations. I've understood most of the pattern but, in general, I have big trouble with the question mark. Is it what we call a lazy character? I use it automatically but don't know the different "places" where I can put it. For instance, you use it at the beginning with a colon just after it: why? – pipout64 Jun 18 '21 at 11:06
  • @pipout64 You should check out the regex101 link I provided and hover your mouse over the syntax which confuses you. `(?:)` is a non-capturing group. It's essentially the same as `()` but it does not create a backreference (for example `$1`, `$2`, etc.). It lets the regex engine consume fewer resources and keeps groups you don't need out of your numbered backreferences. – MonkeyZeus Jun 18 '21 at 12:44
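
On the question-mark confusion: the `?` in `(?:` is not a quantifier at all, it is part of the syntax that opens a non-capturing group, whereas a `?` placed right after `*` or `+` makes that quantifier lazy (and a `?` after an ordinary token makes that token optional). A small Python sketch, with Python's `re` standing in for Notepad++ and `;one;two;` as a made-up input, shows both uses:

```python
import re

# Python's re stands in for Notepad++; ";one;two;" is a made-up input.
text = ";one;two;"

# Greedy .* takes as much as possible, then backs off to the LAST semicolon.
greedy = re.search(r";(.*);", text).group(1)
# Lazy .*? takes as little as possible, so it stops at the FIRST semicolon.
lazy = re.search(r";(.*?);", text).group(1)
print(greedy, lazy)  # one;two one

# (?:...) groups the words for the + quantifier without creating a
# backreference, so only the two plain (...) groups are numbered.
line = "<pnr>0273101670</pnr>;Power Management LT1236BIS8-10;<upa>1</upa>"
m = re.search(r";((?:[a-zA-Z]+ )+)(.*?);", line)
print(m.groups())  # ('Power Management ', 'LT1236BIS8-10')
```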

Given your example strings, and as long as the search & replace is not applied to text that has already been replaced, there's no need to make it more complicated than:

     Find what : ;(.*) (.*);
Replace with : ;<kwd>\1</kwd><adt>\2</adt>;
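
This version can be checked the same way. In the Python sketch below (assumption: `re.sub` standing in for Notepad++, with `</adt>` as the closing tag as in the target output from the question), the greedy first `(.*)` backs off to the last space on the line, which here is the one before the part number, and the semicolons are written back by the replacement. The last check illustrates the caveat about already-replaced text: the output still has a space between two semicolons, so running the replacement again would match and rewrap it.

```python
import re

# Python's re.sub stands in for Notepad++'s Replace here.
line = "<pnr>0273101670</pnr>;Power Management LT1236BIS8-10;<upa>1</upa>"

# Both (.*) groups are greedy: the first backs off to the last space on the
# line, the second to the last semicolon after that space.
pattern = re.compile(r";(.*) (.*);")
once = pattern.sub(r";<kwd>\1</kwd><adt>\2</adt>;", line)
print(once)
# <pnr>0273101670</pnr>;<kwd>Power Management</kwd><adt>LT1236BIS8-10</adt>;<upa>1</upa>

# The result still has a space between two semicolons, so a second run
# would match again and wrap the already-tagged text a second time.
print(pattern.search(once) is not None)  # True
```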

Armali