Can I match bold text using regular expression?

Question

I have the text below and I want to match all the text in bold. Without depending on the prefix, i.e. the serial numbers, can I match just the bold characters using regular expressions?

Spalding, K.L., Buchholz, B.A., Bergman, L.E., Druid, H., Frisén, J.: Forensics: e age written in teeth by nuclear tests. Nature 437(7057) (2005) 333–334
Lovecraft, H.P.: HP Lovecraft: Tales: Tales. Library of America (2005)
Duncan, R.: A survey of parallel computer architectures. Computer 23(2) (1990) 5–16
Santos, N., Hoshino, Y.: Global distribution of rotavirus serotypes/genotypes and its implication for the development and implementation of an effective rotavirus vaccine. Reviews in medical virology 15(1) (2005) 29–56
DIARRHOEA, R.: Rotavirus and other viral diarrhoeas. Bulletin of the World Health Organization 58(2) (1980) 183–198
Barton, T.: Power and knowledge: astrology, physiognomics, and medicine under the Roman Empire. University of Michigan Press (2002)
Gauquelin, M.: The cosmic clocks: From astrology to a modern science. H. Regnery Company (1967)

No. However, with regex you can match things like `word`. You'll need to explain what your text looks like exactly. — Bart Kiers, Mar 13 '18 at 08:29
How would you recognize bold text? Is this some sort of markup-language? Regular expressions match *text* - i.e. a sequence of characters.... — piet.t, Mar 13 '18 at 08:30
@piet.t Yes, I agree. I asked this question out of curiosity. — Kishore Kumar Korada, Mar 13 '18 at 08:39
@Cœur It's regex in general. Please find the source here (I think you'll have to create an account to view each level): http://play.inginf.units.it/#/level/12 — Kishore Kumar Korada, Mar 14 '18 at 11:35
@KishoreKumarKorada OK, then I removed the tag *nsregularexpression*, which is for Apple code only. And I gave you a simple answer. — Cœur, Mar 14 '18 at 11:37

Tamas Rev · Accepted Answer · 2018-03-13T09:35:29.710

You can create a regex that captures the authors in the first group:

^(?:\d+\. )([^:]*)

Explanation:

^ is line start
(?:...) is a non-capturing group
\d+\. matches one or more digits, a dot and a space
(...) is a capturing group
[^:]* matches everything that's not a colon

If you want to make sure to match only the right lines, you can add a lookahead to the end of the regex: (?=:). So the regex would be ^(?:\d+\. )([^:]*)(?=:)
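A minimal Python sketch of this capturing-group approach, using the standard `re` module. The serial numbers are assumptions for illustration: the lines in the question as shown here have none, and only "144." for the Spalding entry appears in the thread. Note that `^` only matches at the start of every line when `re.MULTILINE` is set.

```python
import re

# Hypothetical input: serial numbers other than "144." are made up;
# the text after each number is copied from the question.
TEXT = "\n".join([
    "144. Spalding, K.L., Buchholz, B.A., Bergman, L.E., Druid, H., Frisén, J.: Forensics: e age written in teeth by nuclear tests. Nature 437(7057) (2005) 333–334",
    "5. Lovecraft, H.P.: HP Lovecraft: Tales: Tales. Library of America (2005)",
    "12. Duncan, R.: A survey of parallel computer architectures. Computer 23(2) (1990) 5–16",
])

# re.MULTILINE makes ^ match at the start of each line, not only of TEXT.
AUTHORS = re.compile(r"^(?:\d+\. )([^:]*)(?=:)", re.MULTILINE)

# Group 1 holds the author list; the serial number is matched but not captured.
authors = [m.group(1) for m in AUTHORS.finditer(TEXT)]
for name in authors:
    print(name)
```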

This approach works with any number of digits. On the other hand, this is exactly why we can't use a lookbehind here: many regex engines only allow fixed-length lookbehinds, and \d+ has no fixed length.
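To see the lookbehind limitation concretely, here is a small check using Python's built-in `re` module as one example engine (engines differ; .NET, for instance, accepts variable-length lookbehinds). `\d+` inside `(?<=...)` fails to compile, while a fixed digit count such as `\d{3}` is accepted. The helper name `compiles` is just for this sketch.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern, re.MULTILINE)
        return True
    except re.error as exc:
        # Typically reports something like "look-behind requires fixed-width pattern".
        print(f"{pattern!r}: {exc}")
        return False

variable = compiles(r"(?<=^\d+\. )[^:]*(?=:)")  # one or more digits: rejected
fixed = compiles(r"(?<=^\d{3}\. )[^:]*(?=:)")   # exactly three digits: accepted
print(variable, fixed)
```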

If you're willing to make an assumption, e.g. that there are 1 to 4 digits at the beginning, then you can use this:

((?<=^\d{1}\. )|(?<=^\d{2}\. )|(?<=^\d{3}\. )|(?<=^\d{4}\. ))([^:]*)(?=:)

Explanation:

(?<=^\d{3}\. ) is a fixed-length lookbehind for 3 digits, a dot and a space at the beginning of the line
(...|...|...) is for alternative, fixed-length lookbehinds. A bit verbose, I know. The lookbehinds, however, are not part of the match.
([^:]*) matches and captures the non-colon characters
(?=:) is a lookahead for a colon, so we match the right lines only but do not include the colon in the match
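A Python sketch of the 1 to 4 digit lookbehind version. Two assumptions: the serial numbers are made up (only "144." appears in the thread), and the dot after the digits is escaped as `\.` so it only matches a literal dot. Because lookbehinds are zero-width, the whole match (`group(0)`) is just the author list.

```python
import re

# Hypothetical serial numbers with 3, 1 and 2 digits, to exercise different branches.
TEXT = "\n".join([
    "144. Spalding, K.L., Buchholz, B.A., Bergman, L.E., Druid, H., Frisén, J.: Forensics: e age written in teeth by nuclear tests. Nature 437(7057) (2005) 333–334",
    "5. Lovecraft, H.P.: HP Lovecraft: Tales: Tales. Library of America (2005)",
    "12. Duncan, R.: A survey of parallel computer architectures. Computer 23(2) (1990) 5–16",
])

# One fixed-width lookbehind per assumed serial-number length (1 to 4 digits).
AUTHORS = re.compile(
    r"((?<=^\d{1}\. )|(?<=^\d{2}\. )|(?<=^\d{3}\. )|(?<=^\d{4}\. ))([^:]*)(?=:)",
    re.MULTILINE,
)

# The lookbehinds consume nothing, so group(0) contains only the authors.
authors = [m.group(0) for m in AUTHORS.finditer(TEXT)]
print(authors)
```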

Update

To match only the first author, we need a slight change: the capturing group should be ([^:,]*,[^:,]*), and the lookahead that ends the match should be (?=[:,]). So this is what the capturing regex looks like:

^(?:\d+\. )([^:,]*,[^:,]*)(?=[:,])
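A Python sketch of the first-author regex on three lines from the question, two of which have several authors. The serial numbers are hypothetical, apart from "144." which appears in the thread.

```python
import re

# Hypothetical serial numbers; the rest of each line is from the question.
TEXT = "\n".join([
    "144. Spalding, K.L., Buchholz, B.A., Bergman, L.E., Druid, H., Frisén, J.: Forensics: e age written in teeth by nuclear tests. Nature 437(7057) (2005) 333–334",
    "5. Lovecraft, H.P.: HP Lovecraft: Tales: Tales. Library of America (2005)",
    "31. Santos, N., Hoshino, Y.: Global distribution of rotavirus serotypes/genotypes and its implication for the development and implementation of an effective rotavirus vaccine. Reviews in medical virology 15(1) (2005) 29–56",
])

# [^:,]*,[^:,]* takes exactly one comma: "Surname, Initials".
# (?=[:,]) stops before the next author's comma or the colon before the title.
FIRST_AUTHOR = re.compile(r"^(?:\d+\. )([^:,]*,[^:,]*)(?=[:,])", re.MULTILINE)

first_authors = [m.group(1) for m in FIRST_AUTHOR.finditer(TEXT)]
print(first_authors)
```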

And this is what it looks like with lookbehinds:

((?<=^\d{1}\. )|(?<=^\d{2}\. )|(?<=^\d{3}\. )|(?<=^\d{4}\. ))([^:,]*,[^:,]*)(?=[:,])

Explanation: [^:,]*,[^:,]* is the trick to match an author. Each author has exactly one comma in their name, so we use a negated character class zero or more times ([^:,]*), then match one comma, and then the same negated character class zero or more times.
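The one-comma trick can be checked in isolation with Python's `re.match`, which anchors at the start of the string. The author strings are fragments of lines from the question; the title-only string shows that text without a comma before the colon does not match at all.

```python
import re

# Exactly one comma: "Surname" + "," + " Initials" (neither part may contain ':' or ',').
ONE_AUTHOR = re.compile(r"[^:,]*,[^:,]*")

examples = [
    "Spalding, K.L., Buchholz, B.A., Bergman, L.E.",
    "Barton, T.: Power and knowledge",
    "DIARRHOEA, R.: Rotavirus and other viral diarrhoeas",
]

# The second [^:,]* stops at the next ',' (next author) or ':' (start of the title).
first = [ONE_AUTHOR.match(s).group() for s in examples]
print(first)

# A ':' comes before any ',', so no match is possible here.
title_only = ONE_AUTHOR.match("HP Lovecraft: Tales")
print(title_only)
```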

You will see that there are still some exceptions, e.g. at

I appreciate your answer and your effort in writing this. But in the demo you've mentioned, the expression matches one or more authors per line, whereas I expect it to match only the bold one. How can I match only what I want? — Kishore Kumar Korada, Mar 13 '18 at 09:18
Oh, so only the first author. So the group must end at the first comma or colon. Then the group should be `([^:,]*)` and the positive lookahead should be `(?=[:,])`. Updating the answer accordingly. — Tamas Rev, Mar 13 '18 at 09:21
I had to make one more change to match the `,` in the first author's name. I think it should be okay now. — Tamas Rev, Mar 13 '18 at 09:29
Awesome. This is what I was trying to get and failed. I thought I could match with strong/em tags. Thank you — Kishore Kumar Korada, Mar 13 '18 at 09:33

Cœur · Answer 2 · 2018-03-15T06:39:09.080

I can identify this common pattern on each line in your example:

digits + a dot + a space
(text + comma + text) in bold
a comma or colon + anything

solution 1

With a non-capture operator, this translates to:

^(?:\d+\. )([^,]*,[^,:]*)
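A Python sketch of solution 1 (serial numbers made up except "144."). It also shows the difference between the whole match and group 1: the non-capturing prefix is still part of `group(0)`, but only the bold text lands in `group(1)`.

```python
import re

# Hypothetical serial numbers; the rest of each line is from the question.
TEXT = "\n".join([
    "144. Spalding, K.L., Buchholz, B.A., Bergman, L.E., Druid, H., Frisén, J.: Forensics: e age written in teeth by nuclear tests. Nature 437(7057) (2005) 333–334",
    "7. Gauquelin, M.: The cosmic clocks: From astrology to a modern science. H. Regnery Company (1967)",
])

PATTERN = re.compile(r"^(?:\d+\. )([^,]*,[^,:]*)", re.MULTILINE)

# (?:...) does not capture, but it still consumes the serial number.
whole = [m.group(0) for m in PATTERN.finditer(TEXT)]
bold = [m.group(1) for m in PATTERN.finditer(TEXT)]
for w, b in zip(whole, bold):
    print(repr(w), "->", repr(b))
```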

solution 2

Alternatively, replace the non-capture operator with the look-behind operator:

(?<=\d\. )([^,]*,[^,:]*)
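Solution 2 runs in Python's `re` as-is, because the lookbehind \d\.  plus a space always spans exactly three characters. A sketch with made-up serial numbers (apart from "144."). One observation about this pattern, not a claim from the answer: without `^`, a digit-dot-space sequence elsewhere in a line could also start a match.

```python
import re

# Hypothetical serial numbers; the rest of each line is from the question.
TEXT = "\n".join([
    "144. Spalding, K.L., Buchholz, B.A., Bergman, L.E., Druid, H., Frisén, J.: Forensics: e age written in teeth by nuclear tests. Nature 437(7057) (2005) 333–334",
    "12. Duncan, R.: A survey of parallel computer architectures. Computer 23(2) (1990) 5–16",
])

# Fixed-width lookbehind: one digit, a dot and a space. No re.MULTILINE needed,
# since there is no ^ anchor.
PATTERN = re.compile(r"(?<=\d\. )([^,]*,[^,:]*)")

# The lookbehind is zero-width, so the full match excludes the serial number.
matches = [m.group() for m in PATTERN.finditer(TEXT)]
print(matches)
```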

solution 3

To explicitly solve http://play.inginf.units.it/#/level/12, you need the OR operator:

(?<=^.. |^... |^.... )([^,]*,[^,:]*)
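Solution 3 relies on the engine accepting alternatives of different lengths inside a single lookbehind (PCRE allows this as long as each alternative has a fixed length). Python's built-in `re` rejects it, so a Python sketch splits it into one lookbehind per length, the same idea as the accepted answer's alternation. Serial numbers are made up apart from "144.".

```python
import re

# Hypothetical serial numbers with 3, 1 and 2 digits.
TEXT = "\n".join([
    "144. Spalding, K.L., Buchholz, B.A., Bergman, L.E., Druid, H., Frisén, J.: Forensics: e age written in teeth by nuclear tests. Nature 437(7057) (2005) 333–334",
    "5. Lovecraft, H.P.: HP Lovecraft: Tales: Tales. Library of America (2005)",
    "12. Duncan, R.: A survey of parallel computer architectures. Computer 23(2) (1990) 5–16",
])

ORIGINAL = r"(?<=^.. |^... |^.... )([^,]*,[^,:]*)"
try:
    re.compile(ORIGINAL, re.MULTILINE)
    original_ok = True
except re.error:
    # Python requires each lookbehind to have one single fixed width (here 3, 4 or 5).
    original_ok = False

# One fixed-width lookbehind per prefix length, e.g. "5. ", "12. ", "144. ".
SPLIT = re.compile(r"(?:(?<=^.. )|(?<=^... )|(?<=^.... ))([^,]*,[^,:]*)", re.MULTILINE)
matches = [m.group() for m in SPLIT.finditer(TEXT)]
print(original_ok, matches)
```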

edited Mar 15 '18 at 06:39

answered Mar 14 '18 at 11:30

Cœur

Thank you. It's working. What if I want to match the same thing without the serial number and the space after it, i.e. "Spalding, K.L." instead of "144. Spalding, K.L."? – Kishore Kumar Korada Mar 15 '18 at 05:02
@KishoreKumarKorada `(?:...)` is a non-capturing operator, so your serial number is not captured in a group (see https://stackoverflow.com/questions/3512471/…). But you can use the look-behind operator `(?<=...)` as an alternative for full matching, and the OR operator `(...|...)` to solve your challenge. – Cœur Mar 15 '18 at 06:46

score -1 · Answer 3 · edited Jul 12 '18 at 12:35

My solution:

(?<=^\d+\.\s)(\w+,[\s\w\.]*)
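This pattern needs an engine that supports variable-length lookbehinds, because \d+ inside (?<=...) has no fixed width (.NET and modern JavaScript accept it; Python's built-in `re` does not). A Python sketch of an equivalent moves the prefix out of the lookbehind and reads the author from group 1. Serial numbers are made up apart from "144.".

```python
import re

# Hypothetical serial numbers; the rest of each line is from the question.
TEXT = "\n".join([
    "144. Spalding, K.L., Buchholz, B.A., Bergman, L.E., Druid, H., Frisén, J.: Forensics: e age written in teeth by nuclear tests. Nature 437(7057) (2005) 333–334",
    "5. Lovecraft, H.P.: HP Lovecraft: Tales: Tales. Library of America (2005)",
    "27. Barton, T.: Power and knowledge: astrology, physiognomics, and medicine under the Roman Empire. University of Michigan Press (2002)",
])

try:
    re.compile(r"(?<=^\d+\.\s)(\w+,[\s\w\.]*)", re.MULTILINE)
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False  # re rejects variable-width lookbehinds

# Same author part, but the serial number is matched outside any lookbehind
# and the author is read from capturing group 1.
PATTERN = re.compile(r"^\d+\.\s(\w+,[\s\w\.]*)", re.MULTILINE)
authors = [m.group(1) for m in PATTERN.finditer(TEXT)]
print(lookbehind_ok, authors)
```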

edited Jul 12 '18 at 12:35

Zoe

answered Jul 12 '18 at 11:17

MonStar