
After preparing my LaTeX bibliography in a .bib file, I discovered that there's an issue with capitalisation.

According to this information, the solution is to put brackets around each word in each title (as I checked, putting brackets around the whole title doesn't work).

For example, I wish to change from:

title   = "What a interesting title",
title= "What a boring title",
title="What a crazy title",

to:

title   = "{What} {a} {interesting} {title}",
title= "{What} {a} {boring} {title}",
title="{What} {a} {crazy} {title}",

so:

title <any number of spaces> = <any number of spaces> " <words in title> ",

should be replaced by:

title <any number of spaces> = <any number of spaces> " <{Each} {word} {in} {title} {should} {be} {in} {bracket}> ",

I'm trying to do that with a regex in Python, but I have no idea what is wrong.

My code:

re.sub(r'(title[\s-]*=[\s-]*\")(\b(\w+)\b)',r'\1{\2}',line)

adds brackets to the first word only.

matandked

2 Answers


This uses negative lookahead on the first part of the string:

>>> import re
>>> s = """title   = "It's an interesting title",
... title= "What a boring title",
... title="What a crazy title","""
>>> # (?<!title) keeps ="What from matching when nothing separates title and =
>>> print(re.sub(r'(?!title\s*=\s*")(?<!title)\b(\S+)\b', r'{\1}', s))
title   = "{It's} {an} {interesting} {title}",
title= "{What} {a} {boring} {title}",
title="{What} {a} {crazy} {title}",

See http://regex101.com/r/hL2lE6/6

Update: Avinash Raj made a good point about special characters that could appear in titles, like apostrophes, so I changed \w+ to \S+ and updated the example text to test it.

Note: If your titles include words ending with a special character and that character needs to be included in the brackets, see here for a solution: http://regex101.com/r/hL2lE6/11

It uses (?!title\s*=\s*")\b([^"=\s]+). But your main concern was capitalization, so it may not matter. In that case I recommend keeping it simple and sticking with \S+.
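To make the difference between the two patterns concrete, here is a small sketch that runs both on a title whose last word ends in special characters. The sample line and the variable names are made up for illustration; only the two patterns come from the answer above.

```python
import re

# Shared prefix: skip the field name itself, then require a word boundary.
PREFIX = r'(?!title\s*=\s*")\b'

# Hypothetical sample line: the last word ends in non-word characters.
line = 'title = "Is it C++?",'

# \S+ followed by \b must end on a word character, so "++?" is left outside.
with_boundary = re.sub(PREFIX + r'(\S+)\b', r'{\1}', line)

# The character class runs up to the closing quote, so "++?" stays inside.
with_class = re.sub(PREFIX + r'([^"=\s]+)', r'{\1}', line)

print(with_boundary)  # title = "{Is} {it} {C}++?",
print(with_class)     # title = "{Is} {it} {C++?}",
```

For capitalisation alone either result is probably fine, since the letters are protected in both cases.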

twasbrillig

The re module does not support \G, so this approach isn't possible with re. But you can achieve it with the external regex module, as below.

>>> import regex
>>> s = '''title   = "What a interesting title",
title= "What a boring title",
title="What a crazy title",'''
>>> print(regex.sub(r'(?m)((?:^title\s*=\s*"|\G) *)([^"\s\n]+)', r'\1{\2}',s))
title   = "{What} {a} {interesting} {title}",
title= "{What} {a} {boring} {title}",
title="{What} {a} {crazy} {title}",


\G asserts the position at the end of the previous match, or at the start of the string for the first match. It forces the pattern to return only matches that are part of a continuous chain of matches.

Avinash Raj
  • Thanks, both answers are very helpful. I forgot to mention that I need to add brackets *only* to lines which contain the "title" string. But I will do that by simply checking if "title" in line: – matandked Nov 29 '14 at 22:08
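The per-line filtering mentioned in the comment could look roughly like the sketch below. It is not either answerer's code: the names TITLE_LINE and brace_title_words are hypothetical, and it assumes each title is a double-quoted value on a single line. It anchors on the field name instead of testing "title" in line, because a plain substring test would also pick up a booktitle field or an author value that happens to contain the word.

```python
import re

# Assumption: the title value is double-quoted and fits on one line.
# Groups: (field name up to the opening quote)(words)(closing quote, optional comma)
TITLE_LINE = re.compile(r'^(\s*title\s*=\s*")(.*)(",?\s*)$')

def brace_title_words(line):
    """Wrap every word of a title line in braces; return other lines unchanged."""
    match = TITLE_LINE.match(line)
    if match is None:
        return line  # not a title field: leave untouched
    prefix, words, suffix = match.groups()
    braced = re.sub(r'\S+', lambda m: '{' + m.group(0) + '}', words)
    return prefix + braced + suffix

bib = '''author = "Some title lover",
title   = "It's an interesting title",
year = "2014",'''

result = "\n".join(brace_title_words(line) for line in bib.splitlines())
print(result)
```

Only the middle line changes here; the author line is left alone even though its value contains the word "title".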