I have hundreds of bib references in a file, and they have the following syntax:

@article{tabata1999precise,
  title={Precise synthesis of monosubstituted polyacetylenes using Rh complex catalysts. 
Control of solid structure and $\pi$-conjugation length},
  author={Tabata, Masayoshi and Sone, Takeyuchi and Sadahiro, Yoshikazu},
  journal={Macromolecular chemistry and physics},
  volume={200},
  number={2},
  pages={265--282},
  year={1999},
  publisher={Wiley Online Library}
}

I would like to title case (aka Proper Case) the journal name in Notepad++ using a regular expression. For example, from "Macromolecular chemistry and physics" to "Macromolecular Chemistry and Physics".

I am able to find all instances using:

(?<=journal\=\{).*?(?=\})

but I am unable to change the case via Edit > Convert Case to. Apparently it doesn't apply to Find All results, so I would have to convert the matches one by one.

Next, I tried recording and running a macro, but Notepad++ just hangs indefinitely when I run it (with the option to run until the end of the file).

So my question is: does anyone know the replace regex syntax I could use to change the case? Ideally, I would also like to use "|" exclusions for particular words such as " of ", " an ", " the ", etc. I tried to play with some of the examples provided here, but I was not able to integrate them into my lookarounds.

Thank you in advance, I'd appreciate any help.

Wiktor Stribiżew
mk1138

2 Answers

This works for any number of words:

  • Ctrl+H
  • Find what: (?:journal={|\G)\K(?:(\w{4,})|(\w+))(\h*)
  • Replace with: \u$1\E$2$3
  • CHECK Wrap around
  • CHECK Regular expression
  • Replace all

Explanation:

(?:             # non capture group
    journal={     # literally
  |              # OR
    \G            # restart from last match position
)               # end group
\K              # forget all we have seen until this position
(?:             # non capture group
    (\w{4,})      # group 1, a word with 4 or more characters
  |              # OR
    (\w+)         # group 2, a word of any length
)               # end group
(\h*)           # group 3, 0 or more horizontal spaces

Replacement:

\u          # uppercase the first letter of what follows
  $1        # content of group 1
\E          # end the case conversion
$2          # content of group 2
$3          # content of group 3
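
For anyone who would rather script this outside Notepad++, the same rule (capitalize the first letter of every word with 4 or more characters, leave shorter words alone) can be sketched in Python. This is not a translation of the regex above: Python's `re` module has no `\G`, `\K`, `\h` or `\u`, so the sketch swaps in a replacement callback instead. The function name `capitalize_long_words` and the `min_len` parameter are made up for illustration, and the pattern assumes the journal value contains no nested braces.

```python
import re

# Assumption: the field looks like journal={...} with no nested braces.
JOURNAL_FIELD = re.compile(r"(journal\s*=\s*\{)([^{}]*)(\})")
WORD = re.compile(r"\w+")


def capitalize_long_words(text: str, min_len: int = 4) -> str:
    """Uppercase the first letter of each word of at least min_len
    characters, but only inside journal={...} fields."""

    def fix_word(match: re.Match) -> str:
        word = match.group(0)
        if len(word) >= min_len:
            return word[0].upper() + word[1:]
        return word  # short words such as "and" are left untouched

    def fix_field(match: re.Match) -> str:
        # Only the field value (group 2) is rewritten; the braces stay as-is.
        return match.group(1) + WORD.sub(fix_word, match.group(2)) + match.group(3)

    return JOURNAL_FIELD.sub(fix_field, text)


sample = (
    "  title={Precise synthesis of monosubstituted polyacetylenes},\n"
    "  journal={Macromolecular chemistry and physics},"
)
print(capitalize_long_words(sample))
```

Because only the word characters are rewritten, punctuation inside the journal name does not stop the conversion, and other fields such as `title` are not touched.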

Toto
  • Very nice solution ++ – The fourth bird Jul 18 '20 at 08:58
  • `\G` must have a start of file position subtracted. You say "*`\G` # restart from last match position*" - it is wrong as `\G` matches either start of string or end of the previous successful match. Hence, your regex may find a match at the start of string, not in between two defined strings. – Wiktor Stribiżew Jul 18 '20 at 10:49
  • Thank you very much, Toto, for both the syntax and especially the detailed explanation. I'm studying it, and the first thing I noticed when trying it out is that the search is stopped by any punctuation marks. I tried commas, dashes, periods, etc. I'll try to figure it out. – mk1138 Jul 18 '20 at 15:43
  • @mk1138: Just replace `\h*` with `[\h,\-.]*` at the end of regex and tell me if it works. – Toto Jul 18 '20 at 15:48
  • It does. And from what I can see it's easy to expand it with other characters. I tried [\h,\-.~|<]* and some other characters and it works just fine. Thank you one more time. This is a great lesson. – mk1138 Jul 18 '20 at 23:18

If the format is always in the form:

journal={Macromolecular chemistry and physics},

i.e. journal followed by 4 words, then use the following:

Find: journal={(\w+)\s*(\w+)\s*(\w+)\s*(\w+)

Replace with: journal={\u\1 \u\2 \l\3 \u\4

You can modify that if you have more words to replace by adding more \u\x, where x is the position of the word.

Hope it helps to give you an idea to move forward for a better solution.

\u converts the next character to uppercase (used for every word except "and")

\l converts the next character to lowercase (used for the word "and")

\1 inserts the text captured by the 1st () search group

\2 inserts the text captured by the 2nd () search group

\3 inserts the text captured by the 3rd () search group

\4 inserts the text captured by the 4th () search group
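
If the number of words varies and you want an explicit list of words to keep lowercase (as the question asks with " of ", " an ", " the "), one option is to do it in a small script rather than in the Replace dialog. Below is a minimal Python sketch of that idea, not a Notepad++ feature. The names `SMALL_WORDS`, `title_case` and `title_case_journals` are hypothetical, the word list is only an example, and the pattern assumes no nested braces inside the journal value.

```python
import re

# Hypothetical exclusion list; extend it as needed.
SMALL_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the"}

# Assumption: the field looks like journal={...} with no nested braces.
JOURNAL_FIELD = re.compile(r"(journal\s*=\s*\{)([^{}]*)(\})")
LETTERS = re.compile(r"[^\W\d_]+")  # runs of letters, including accented ones


def title_case(name: str) -> str:
    """Capitalize every word except the small words; the first word
    is always capitalized."""
    first = LETTERS.search(name)

    def fix(match: re.Match) -> str:
        word = match.group(0)
        is_first = match.start() == first.start()
        if not is_first and word.lower() in SMALL_WORDS:
            return word.lower()  # the whole small word is lowercased
        return word[0].upper() + word[1:]

    return LETTERS.sub(fix, name)


def title_case_journals(text: str) -> str:
    """Apply title_case to the value of every journal={...} field."""
    return JOURNAL_FIELD.sub(
        lambda m: m.group(1) + title_case(m.group(2)) + m.group(3), text
    )


print(title_case_journals("journal={the journal of physical chemistry},"))
```

Unlike `\l` in the replacement above, which only lowercases the first letter, this sketch lowercases the whole small word, so "AND" becomes "and".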

Mohsen Alyafei