Repair .bib file titles through Regex

Question

After preparing my LaTeX bibliography in a .bib file, I discovered an issue with capitalisation.

According to this information, the solution is to add brackets around each word in each title (I checked, and adding brackets around the whole title doesn't work).

For example, I wish to change from:

title   = "What a interesting title",
title= "What a boring title",
title="What a crazy title",

to:

title   = "{What} {a} {interesting} {title}",
title= "{What} {a} {boring} {title}",
title="{What} {a} {crazy} {title}",

so:

title <any number of spaces> = <any number of spaces> " <words in title> ",

should be replaced by:

title <any number of spaces> = <any number of spaces> " <{Each} {word} {in} {title} {should} {be} {in} {bracket}> ",

I'm trying to do this with a regex in Python, but I can't work out what is wrong.

My code:

re.sub(r'(title[\s-]*=[\s-]*\")(\b(\w+)\b)',r'\1{\2}',line)

adds brackets to the first word only.

I guess something is wrong in my regex pattern, especially in (\b(\w+)\b), but I don't understand what it is or how to correct it. — matandked, Nov 29 '14 at 12:46
What's your expected output for this `title = "What a interesting title BAR:foo", barfoo` input? — Avinash Raj, Nov 30 '14 at 05:51
@AvinashRaj that input is invalid for LaTeX, so it isn't relevant. matandked is talking about setting values for `title`; don't lose sight of that. — twasbrillig, Nov 30 '14 at 05:56

twasbrillig · Accepted Answer · 2014-11-30T06:29:39.140

This uses negative lookahead on the first part of the string:

>>> import re
>>> s = """title   = "It's an interesting title",
... title= "What a boring title",
... title="What a crazy title","""
>>> # (?=\w) makes each match start on a word character, so the = in title="..." is never captured
>>> print(re.sub(r'(?!title\s*=\s*")\b(?=\w)(\S+)\b', r'{\1}', s))
title   = "{It's} {an} {interesting} {title}",
title= "{What} {a} {boring} {title}",
title="{What} {a} {crazy} {title}",

See http://regex101.com/r/hL2lE6/6

Update: Avinash Raj made a good point about special characters that could appear in titles, like apostrophes, so I changed \w+ to \S+ and updated the example text to test it.

Note: If your titles include words ending with a special character and that character needs to be included in the brackets, see here for a solution: http://regex101.com/r/hL2lE6/11

It uses (?!title\s*=\s*")\b([^"=\s]+). But your main concern was capitalization, so it may not matter. In that case I recommend keeping it simple and sticking with \S+.
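
If a single pattern gets hard to reason about, another option is to split the job in two: one pattern finds the quoted title value, and a replacement function wraps each whitespace-separated word. This is only a minimal sketch, assuming each title sits on one line and contains no embedded double quotes. `TITLE_RE` and `brace_titles` are names made up for this example, and the `author` line is invented sample data to show that other fields are left alone.

```python
import re

# Hypothetical names; assumes one title per line and no '"' inside a title.
# Group 1: 'title = "' prefix, group 2: the title text, group 3: closing quote.
TITLE_RE = re.compile(r'^(\s*title\s*=\s*")([^"]*)(")', re.MULTILINE)


def brace_titles(text):
    """Wrap every whitespace-separated word of each title value in braces."""
    def wrap(match):
        prefix, value, closing_quote = match.groups()
        # \S+ keeps apostrophes and punctuation attached to their word.
        return prefix + re.sub(r'\S+', r'{\g<0>}', value) + closing_quote

    return TITLE_RE.sub(wrap, text)


sample = '''title   = "It's an interesting title",
title= "What a boring title",
title="What a crazy title",
author = "Some Author",'''

print(brace_titles(sample))
```

Because the outer pattern only ever matches a `title = "..."` value, lines for other fields pass through unchanged without a separate check.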

score 0 · Answer 2 · edited May 23 '17 at 12:05

The re module doesn't support \G, but you can do this with the external regex module, as below.

>>> import regex
>>> s = '''title   = "What a interesting title",
... title= "What a boring title",
... title="What a crazy title",'''
>>> print(regex.sub(r'(?m)((?:^title\s*=\s*"|\G) *)([^"\s\n]+)', r'\1{\2}', s))
title   = "{What} {a} {interesting} {title}",
title= "{What} {a} {boring} {title}",
title="{What} {a} {crazy} {title}",


\G asserts the position at the end of the previous match, or at the start of the string for the first match. It forces the pattern to return only matches that are part of a continuous chain of matches.


Thanks, both answers are very helpful. I forgot to mention that I need to add brackets *only* to lines which contain the "title" string. But I will do that by simply checking `if "title" in line:` — matandked, Nov 29 '14 at 22:08
