
Is there a way (either a trained model or a deterministic function) in Python 3 to get the length of the numbering at the start of a title? For example:

"I. This is a big title" ---> length=len("I.")=2
"1.10 This is a small title" ---> length=len("1.10")=4
"A)b) This is another title" ---> length=len("A)b)")=4
"C.2 This is a regular title" ---> length=len("C.2")=3
"This is not a title" ---> length=0
etc.

I wrote a little function that uses regex to detect if a string begins with a numbering:

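import re

# Separator that may follow a numbering, shared by all three patterns.
sep = r'(\s|-|\s-|\)|\s\)|\.|\s\.|/|\s/|–|\s–)'
# The ^ must sit outside the group; inside, it anchors only the first alternative (IX).
m_romans = re.search(r'^(IX|IV|VI{0,3}|I{1,3})' + sep, text)
m_letters = re.search(r'^([a-zA-Z])' + sep, text)
m_digits = re.search(r'^(\d+)' + sep, text)  # \d+ so that multi-digit numbers such as "10." also match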

Maybe regex can help?

dada
  • This comes down to writing a regex, or something functionally equivalent to a regex, for detecting a "numbering pattern". There are already lots of regex tutorials, so any additional help will simply be walking you through making explicit what you think a "numbering pattern" is. – Acccumulation Jul 19 '18 at 15:09

2 Answers


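If the numbering is always at the start of the title and separated from the rest by a space, then

len(title.split()[0])

should work. Note that this returns the length of the first word even when the title has no numbering.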

On second thought, perhaps you can take title.split()[0] and check that result against your regex. If it satisfies your definition of a numbering, return its length; otherwise return 0.
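A minimal sketch of that split-then-validate idea follows. The names `ITEM`, `TOKEN` and `numbering_length` are made up for illustration, and the pattern encodes one possible definition of a numbering (digits, uppercase Roman numerals or single letters, joined by "." or ")"). It is an assumption, not the asker's exact rules, so adjust it to your own titles.

```python
import re

# One numbering component: digits, an uppercase Roman numeral, or a single letter.
ITEM = r"(?:\d+|[IVXLCDM]+|[A-Za-z])"
# Components joined by "." or ")", with an optional trailing separator.
TOKEN = re.compile(rf"{ITEM}(?:[.)]{ITEM})*[.)]?")


def numbering_length(title: str) -> int:
    """Return the length of the leading numbering token, or 0 if there is none."""
    parts = title.split()
    if not parts:
        return 0  # empty or whitespace-only title
    first = parts[0]
    if not TOKEN.fullmatch(first):
        return 0
    # Reject bare words such as "A" or "I": require a digit or a separator.
    if not any(ch.isdigit() or ch in ".)" for ch in first):
        return 0
    return len(first)


for t in ["I. This is a big title", "1.10 This is a small title",
          "A)b) This is another title", "C.2 This is a regular title",
          "This is not a title"]:
    print(repr(t), "--->", numbering_length(t))
```

With this definition a title that starts with a plain number (for example a year) is still counted as numbered, so a stricter rule may be needed depending on the data.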

AsheKetchum

You could first use a regex to detect the numbering and then read off the position of the match, along the lines of this question:

Return positions of a regex match() in Javascript?
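In Python the match object already exposes those positions through `start()`, `end()` and `span()`, so the length falls out directly. The sketch below is only an illustration: `DECIMAL_NUMBERING` and `numbering_span` are hypothetical names, and the pattern deliberately covers dotted decimal numbering only (such as "1.", "1.10", "2.3.4").

```python
import re

# Illustrative pattern: dotted decimal numbering only, followed by whitespace or end of string.
DECIMAL_NUMBERING = re.compile(r"\d+(?:\.\d+)*\.?(?=\s|$)")


def numbering_span(title: str) -> tuple:
    """Return (start, end) of the leading numbering, or (0, 0) if there is none."""
    m = DECIMAL_NUMBERING.match(title)  # match() only matches at the start of the string
    return m.span() if m else (0, 0)


start, end = numbering_span("1.10 This is a small title")
length = end - start  # length of the numbering
```

Extending the pattern to Roman numerals or letters does not change how the span is read.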

Rodrigo Espinoza