I have a set of French strings like this one:
text = "Français Langues bantoues Presse écrite Gabon Particularité linguistique"
I want to split each string into a list of substrings, where every substring starts at a capitalized word and includes the lowercase words that follow it:
list = ["Français", "Langues bantoues", "Presse écrite", "Gabon", "Particularité linguistique"]
I tried the following, but it does not capture the lowercase words that follow each capitalized word, and each match stops at the first accented French character:
import re

# [A-Z] and [a-z] only cover unaccented ASCII letters
pattern = r"([A-Z][a-z]+)"
text = "Français Langues bantoues Presse écrite Gabon Particularité linguistique"
matches = re.findall(pattern, text)
matches
Output:
['Fran', 'Langues', 'Presse', 'Gabon', 'Particularit']
Unfortunately, I could not find a solution on the forum.
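One direction that seems to fit, offered as a sketch rather than a confirmed answer: instead of matching the words themselves, split the string at whitespace that is immediately followed by a capital letter. The helper name `split_on_capitals` and the `UPPER` character class are my own assumptions. `UPPER` covers only ASCII capitals, the accented Latin-1 capitals, and `Œ`/`Ÿ`, so other scripts would need a different class.

```python
import re

# Assumption: the capitals that matter are ASCII A-Z, the accented
# Latin-1 capitals (À-Ö, Ø-Þ), plus Œ and Ÿ, which French also uses.
UPPER = "A-ZÀ-ÖØ-ÞŒŸ"


def split_on_capitals(text):
    """Split text at whitespace that is followed by a capital letter."""
    # The lookahead (?=...) checks for the capital without consuming it,
    # so the capital stays at the start of the next chunk.
    return re.split(rf"\s+(?=[{UPPER}])", text.strip())


text = "Français Langues bantoues Presse écrite Gabon Particularité linguistique"
parts = split_on_capitals(text)
print(parts)
```

Because accented lowercase letters such as `é` and `ç` are never inspected by the pattern, they no longer cut a match short. Note that any lowercase words before the first capitalized word would end up in the first element.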