What regex matches title-case words, but only in Markdown headings (lines starting with #)?

This regex matches all title case words:

(\b[A-Z][a-z]+)

Unfortunately, it also matches words outside headings, in normal sentences.

Sample matches (bold):

FirstWordInHeading another Word
test Sentence. Test sentence

This regex only matches the first word in a heading but not the others:

(?:^#+\s)(\b[A-Z][a-z]+)

Sample match:

FirstWordInHeading

I'd also like to match every other title-case word in the heading. In this case, that includes the string "Word" from the first example.
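To make the difference between the two patterns concrete, here is a small Python sketch. Python is only an assumption for illustration, and the sample text adds a leading `# ` to the first line, which the question implies but does not show.

```python
import re

# Assumed sample: the first line is a heading, the second is a normal sentence.
sample = "# FirstWordInHeading another Word\ntest Sentence. Test sentence"

# Pattern 1 from the question: every capitalized word, inside or outside headings.
# [a-z]+ stops at the next uppercase letter, so only "First" is taken
# from the camel-case placeholder "FirstWordInHeading".
all_caps = re.findall(r"\b[A-Z][a-z]+", sample)

# Pattern 2 from the question: the word must directly follow the "#" marker,
# so only the first word of a heading can ever match.
first_only = re.findall(r"(?:^#+\s)(\b[A-Z][a-z]+)", sample, flags=re.MULTILINE)

print(all_caps)    # ['First', 'Word', 'Sentence', 'Test']
print(first_only)  # ['First']
```

The second pattern fails on "Word" because the heading marker is part of the match itself: once the engine has moved past `# `, nothing later on the line can satisfy `^#+\s` again.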

Christian
  • Try `^#+\s*[A-Z][a-z]*`. – Wiktor Stribiżew Aug 03 '22 at 22:34
  • Hmm, "Word" is not matched. This is exactly my problem :) – Christian Aug 03 '22 at 22:37
  • So, `^#+\s*((?:[A-Z][a-z]*)+)` – Wiktor Stribiżew Aug 03 '22 at 22:42
  • This still does not match any title case word after the first, for example: `## test1 Test2 Test3` or `### Test1 Test2 Test3` - do I need another capture group? – Christian Aug 03 '22 at 23:11
  • Maybe `^#+[^\S\n]*((?:\w+[^\S\n]+)*[A-Z]\w*(?:[^\S\n]+\w+)*)` – Wiktor Stribiżew Aug 03 '22 at 23:11
  • huh, quickly getting complex. Now, it matches everything, also whitespaces and lowercase strings in the markdown headings. – Christian Aug 03 '22 at 23:12
  • Yes, because `test1` is lowercase in `## test1 Test2 Test3`. So what do you want? My regex requires at least one capitalized "word". – Wiktor Stribiżew Aug 03 '22 at 23:19
  • Thanks so much for your help. At least I understand this part `^#+[^\S\n]*` :) Sorry that my question was not precise enough. The regex should match any title case word in a heading, regardless of its position. It basically needs to skip any lower case word(s), whether they come first, in the middle, or last. I guess the position part is the most complex topic. – Christian Aug 03 '22 at 23:23
  • Ok, do it in two steps: 1) extract the heading, 2) extract capitalized words. You have all you need for that now. More concrete help can be rendered only if you provide details about what regex flavor, programming language you are using. – Wiktor Stribiżew Aug 03 '22 at 23:26
  • Would you provide any positive/negative feedback on my regex? @WiktorStribiżew – lemon Aug 03 '22 at 23:44
  • @WiktorStribiżew Thanks again. The idea was to use a regex to convert headings from title case to sentence case, either using Ruby or simply search and replace in Visual Studio Code. I am at the very beginning of my research. Regex is pretty hard for me, but now I at least have an idea of how to start. – Christian Aug 04 '22 at 00:00
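The two-step approach suggested in the comments (first extract the heading, then extract the capitalized words) could look roughly like the following. This is a minimal sketch in Python, not Ruby or Visual Studio Code as mentioned above; the names `HEADING`, `TITLE_WORD`, `title_case_words_in_headings`, and `to_sentence_case` are hypothetical, and "title case word" is assumed to mean any word starting with an uppercase ASCII letter.

```python
import re

# Step 1: a heading line, i.e. one or more "#", horizontal whitespace, then the text.
# Group 1 is the marker plus whitespace, group 2 is the heading text.
HEADING = re.compile(r"^(#+[^\S\n]+)(.*)$", re.MULTILINE)

# Step 2: a word that starts with an uppercase ASCII letter (assumed definition).
TITLE_WORD = re.compile(r"\b[A-Z]\w*")


def title_case_words_in_headings(markdown: str) -> list[str]:
    """Return every capitalized word found in heading lines only."""
    words = []
    for heading in HEADING.finditer(markdown):
        words.extend(TITLE_WORD.findall(heading.group(2)))
    return words


def to_sentence_case(markdown: str) -> str:
    """Lowercase capitalized words in headings, except the first word.

    Naive on purpose: proper nouns and acronyms are lowercased too.
    """
    def fix(match: re.Match) -> str:
        prefix, text = match.group(1), match.group(2)
        first, sep, rest = text.partition(" ")
        rest = TITLE_WORD.sub(lambda w: w.group(0).lower(), rest)
        return prefix + first + sep + rest

    return HEADING.sub(fix, markdown)


doc = "## test1 Test2 Test3\nplain Sentence here\n### Alpha beta Gamma"
print(title_case_words_in_headings(doc))  # ['Test2', 'Test3', 'Alpha', 'Gamma']
print(to_sentence_case("### Test1 Test2 Test3\nSome Text"))
```

Splitting the work this way keeps each pattern simple: the position of a word inside the heading no longer matters, because the second pattern only ever sees heading text. Note that lines beginning with `#` inside fenced code blocks would be treated as headings as well.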

1 Answer

Try with the following regex:

(?<=#).*?([A-Z]\S+)|(?!^)\G.*?([A-Z]\S+)

It will match:

  • (?<=#): a position directly after any # symbol
  • .*?: as few characters as possible, up to..
  • ([A-Z]\S+): an uppercase letter followed by one or more non-whitespace characters (Group 1)

or (|)

  • (?!^)\G: continuing from the end of the previous match, but not at the start of the input
  • .*?: as few characters as possible, up to..
  • ([A-Z]\S+): an uppercase letter followed by one or more non-whitespace characters (Group 2)

Group 1 will contain the first match after a #. Group 2 will contain each further match on the same line.

lemon