
I want to use a Python regex to find the document class in a LaTeX document.

A LaTeX file contains \documentclass{myclass} somewhere near the top. I want to extract myclass using a regex.

This is what I've tried so far:

import re
latex_text = "blank  /documentclass{myclass} words, more text /documentclassdoc{11} more words"
s = re.search(r'/documentclass{(?P<class_name>.*)}', latex_text)

It matches: myclass} words, more text /documentclassdoc{11

How can I change it to match only myclass? It should also stop searching after it finds a match, as the document can get quite long.

I know the file should only have one documentclass, but I want to handle the case where there is more than one as well.

Kritz

1 Answer

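import re
latex_text = "blank  /documentclass{myclass} words, more text /documentclassdoc{11} more words"
# .*? is lazy, so the capture stops at the first closing brace; re.search returns only the first match.
match = re.search(r'/documentclass\{(.*?)\}', latex_text)
print(match.group())   # whole match: /documentclass{myclass}
print(match.group(1))  # captured class name only: myclass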
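The original pattern overshoots because `.*` is greedy: it runs to the last `}` in the string. A lazy `.*?` or a negated character class such as `[^}]*` stops at the first one. Below is a minimal sketch of a variant for real LaTeX source, which uses a backslash rather than the forward slash in the test string. The pattern, the optional `[options]` handling, and the helper names `first_document_class` and `find_document_classes` are my own assumptions for illustration, not part of the question. It also assumes there is no whitespace between `\documentclass` and the opening brace.

```python
import re

# Assumed pattern (not from the question): a literal backslash, optional
# [options], then the class name captured with a negated character class
# so the match cannot run past the first closing brace.
DOCCLASS_RE = re.compile(
    r'\\documentclass(?:\[[^\]]*\])?\{(?P<class_name>[^}]*)\}'
)

def first_document_class(latex_text):
    """Return the first class name, or None if no \\documentclass is found."""
    m = DOCCLASS_RE.search(latex_text)  # search stops at the first match
    return m.group('class_name') if m else None

def find_document_classes(latex_text):
    """Return every declared class name, in order of appearance."""
    return [m.group('class_name') for m in DOCCLASS_RE.finditer(latex_text)]

sample = r"blank \documentclass[12pt]{myclass} words \documentclassdoc{11} \documentclass{other}"
print(first_document_class(sample))   # myclass
print(find_document_classes(sample))  # ['myclass', 'other']
```

`re.search` returns as soon as it finds the first match, so a long document is not scanned in full. `finditer` covers the case where more than one `\documentclass` appears. Note that `\documentclassdoc{11}` is skipped because the pattern requires `{` or `[` directly after `\documentclass`.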
Rakesh
  • Thanks! I changed it slightly to get the class name: `print(re.search(r'/documentclass\{(?P<class_name>.*?)\}', latex_text).group('class_name'))` – Kritz Apr 21 '18 at 06:59
  • If you plan to do more heavy-handed manipulation or searches in the future, TexSoup is an option; just wanted to throw that hat into the ring. https://github.com/alvinwan/texsoup. Here, we could do `soup = TexSoup(r"blank \documentclass{myclass} words,..."); soup.documentclass.name`. Disclaimer: I wrote TexSoup but it was for slightly more tasks and thought it might be useful. – Alvin Wan May 03 '18 at 09:15