I am unable to build a regex that matches all possible strings of the format
{\some_text}
I tried to build one myself, but I could not make it match ALL kinds of characters.
What I came up with: r"\{\\(.*)\}"
This did not work properly: it only matched {\~some_string}.
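For comparison, here is how the greedy `.*` in that first pattern behaves next to a bounded `[^{}]*`. The short sample is trimmed from the bibliography text and written as a raw string purely for illustration:

```python
import re

# Short sample taken from the bibliography; r"..." keeps the backslashes.
sample = r"H. Gil-Mar{\'\i}n, M. Vargas Maga{\~n}a"

# Greedy .* runs on to the LAST "}" on the line, swallowing everything between.
greedy = re.findall(r"\{\\(.*)\}", sample)

# [^{}]* cannot cross a brace, so each {\...} group is matched on its own.
bounded = re.findall(r"\{\\([^{}]*)\}", sample)

print(greedy)   # one match spanning both groups
print(bounded)  # one match per group
```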
This is what I am trying to achieve:
text = "F.N. Freitas, C. Singulani, G. Vila-Verde, Linea Science Server,: The Dark Energy Survey Data Release 2. Ap._J._Supp._Ser. 255, (2021).Alam S., A. de Mattia, A. Tamone, S. {\' A}vila, J.A. Peacock, V. Gonzalez-Perez, A. Smith, A. Raichoor, A.J. Ross, J.E. Bautista, E. Burtin, J. Comparat, K.S. Dawson, H. du Mas des Bourboux, S. Escoffier, H. Gil-Mar{\'\i}n, S. Habib, K. Heitmann, J. Hou, F.G. Mohammad, E.M. Mueller, R. Neveux, R. Paviot, W.J. Percival, G. Rossi, V. Ruhlmann-Kleider, R. Tojeiro, M. Vargas Maga{\~n}a, C. Zhao, G.B. Zhao: The completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: N-body mock challenge for the eBOSS emission line galaxy sample. Mon._Not._R._Astron._Soc. 504, (2021).Alam S., J.A. Peacock, D.J. Farrow, J. Loveday, A.M. Hopkins: Using GAMA to probe the impact of small-scale galaxy physics on nonlinear redshift-space distortions. Mon._Not._R._Astron._Soc. 503, (2021).Alam S., M. Aubert, S. Avila, C. Balland, J.E. Bautista, M.A. Bershady, D. Bizyaev, M.R. Blanton, A.S. Bolton, J. Bovy, J. Brinkmann, J.R. Brownstein, E. Burtin, S. Chabanier, M.J. Chapman, P.D. Choi, C.H. Chuang, J. Comparat, M.C. Cousinou, A. Cuceu, K.S. Dawson, S. de la Torre, A. de Mattia, V.S. Agathe, H.M. des Bourboux, S. Escoffier, T. Etourneau, J. Farr, A. Font-Ribera, P.M. Frinchaboy, S. Fromenteau, H. Gil-Mar{\'\i}n, J.M. Le Goff, A.X. Gonzalez-Morales, V. Gonzalez-Perez, K. Grabowski, J. Guy, A.J. Hawken, J. Hou, H. Kong, J. Parker, M. Klaene, J.P. Kneib, S. Lin, D. Long, B.W. Lyke, A. de la Macorra, P. Martini, K. Masters, F.G. Mohammad, J. Moon, E.M. Mueller, A. Mu{\~n}oz-Guti{\'e}rrez, A.D. Myers, S. Nadathur, R. Neveux, J.A. Newman, P. Noterdaeme, A. Oravetz, D. Oravetz, N. Palanque-Delabrouille, K. Pan, R. Paviot, W.J. Percival, I. P{\'e}rez-R{\`a}fols, P. Petitjean, M.M. Pieri, A. Prakash, A. Raichoor, C. Ravoux, M. Rezaie, J. Rich, A.J. Ross, G. Rossi, R. Ruggeri, V. Ruhlmann-Kleider, A.G. S{\'a}nchez, F.J. S{\'a}nchez, J.R. S{\'a}nchez-Gallego, C. Sayres, D.P. Schneider, H.J. Seo, A. Shafieloo, A. Slosar, A. Smith, J. Stermer, A. Tamone, J.L. Tinker, R. Tojeiro, M. Vargas-Maga{\~n}a, A. Variu, Y. Wang, B.A. Weaver, A.M. Weijmans, C. Y{\`e}che, P. Zarrouk, C. Zhao, G.B. Zhao, Z. Zheng: Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey: Cosmological implications from two decades of spectroscopic surveys at the Apache Point Observatory. Physical_Review_D 103, (2021).Alam S., N.P. Ross, S. Eftekharzadeh, J.A. Peacock, J. Comparat, A.D. Myers, A.J. Ross: Quasars at intermediate redshift are not special; but they are often satellites. Mon._Not._R._Astron._Soc. 504, (2021).Alonso-Herrero A., S. Garc{\'\i}a-Burillo, S.F. H{\"o}nig, I. Garc{\'\i}a-Bernete, C. Ramos Almeida, O. Gonz{\'a}lez-Mart {'hallo}"
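One possible reason the first pattern only ever found {\~...}: in a normal (non-raw) Python string literal, \' and \" are escape sequences, so Python drops the backslash before the regex ever sees the text. \~, \` and \i are not recognised escapes and keep their backslash (recent Python versions warn about them). A raw literal r"..." keeps every backslash. A minimal check, using two names from the text above:

```python
# In a normal literal, \' and \" are escapes: the backslash is dropped.
plain = "S. {\' A}vila, S.F. H{\"o}nig"
# In a raw literal (r"..."), every backslash is kept as a character.
raw = r"S. {\' A}vila, S.F. H{\"o}nig"

print(plain)  # S. {' A}vila, S.F. H{"o}nig
print(raw)    # S. {\' A}vila, S.F. H{\"o}nig
```

If the text were written as a raw string, the backslash after { would always be present, so a pattern could require it instead of making it optional with \\?.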
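import re

encodings = {
    "'": u'\u0301',    # acute accent
    "'\\": u'\u0301',  # acute accent over \i
    "`": u'\u0300',    # grave accent
    "^": u'\u0302',    # circumflex
    "~": u'\u0303',    # tilde
    '"': u'\u0308',    # diaeresis (umlaut)
    "o": u'\u00D8',
    "ss": 'ß'
}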
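The values in `encodings` are Unicode combining characters. The standard `unicodedata` module can confirm which accent each code point is (LaTeX \' is the acute accent, \` the grave). It also shows that a combining mark belongs after its base letter, and that NFC normalisation merges the pair into one precomposed character:

```python
import unicodedata

# Print the official name of each combining code point.
for cp in ("\u0300", "\u0301", "\u0302", "\u0303", "\u0308"):
    print(f"U+{ord(cp):04X}", unicodedata.name(cp))

# A combining mark follows its base letter; NFC merges the pair into one character.
decomposed = "n" + "\u0303"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1
```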
# remove the encoding and replace it with its corresponding character
def repl(m):
    string = m.group()  # the whole match, e.g. "{\~n}" or "{'\i}"
    get_open_bracket_idx = string.find('{')
    get_close_bracket_idx = string.find('}')
    # accent command: text between "{" and the letter, minus a leading "\" and spaces
    encoding = string[get_open_bracket_idx + 1:get_close_bracket_idx - 1].lstrip('\\').strip()
    # the accented letter is the last character before "}"
    string_content = string[get_close_bracket_idx - 1]
    # a combining mark goes after its base letter; unknown encodings add nothing
    string_content = string_content + encodings.get(encoding, '')
    print()
    print(f'encoding: {encoding}')
    print(f'string content: {string_content}')
    print()
    return string_content
# This nearly works; it just also matches {'some_text}, which it shouldn't
changed_text = re.sub(r'\{\\?[^{}]*}', repl, text)
print(changed_text)
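A minimal sketch of one way to get the intended result, assuming the text is written as a raw string so every backslash survives. The pattern requires the backslash after {, accepts one accent command (' ` ^ ~ ") followed by an optional space and a letter (or \i / \j), or a standalone symbol such as {\ss}. The conversion uses a combining mark plus NFC normalisation. `PATTERN`, `ACCENTS` and `SYMBOLS` are illustrative names, and only the commands listed in the two tables are handled:

```python
import re
import unicodedata

# Hypothetical tables: LaTeX accent command -> Unicode combining mark
ACCENTS = {"'": "\u0301", "`": "\u0300", "^": "\u0302", "~": "\u0303", '"': "\u0308"}
# Commands that stand for a whole character on their own, e.g. {\ss}
SYMBOLS = {"ss": "\u00df", "o": "\u00f8", "O": "\u00d8", "i": "\u0131"}

# "{\" is required, so {'hallo} (no backslash) is never matched.
PATTERN = re.compile(r"""
    \{\\
    (?:
        (?P<accent>['`^~"])\s*(?P<letter>\\[ij]|[A-Za-z])   # {\'e}, {\' A}, {\'\i}
      | (?P<symbol>ss|[oOi])                              # {\ss}, {\o}, {\i}
    )
    \}
""", re.VERBOSE)

def repl(m):
    if m.group("symbol"):
        return SYMBOLS[m.group("symbol")]
    # LaTeX uses \i / \j only so the accent doesn't sit on a dot; Unicode wants plain i / j
    letter = m.group("letter").lstrip("\\")
    # combining mark after its base letter, then NFC to get a single character such as "é"
    return unicodedata.normalize("NFC", letter + ACCENTS[m.group("accent")])

text = r"S. {\' A}vila, H. Gil-Mar{\'\i}n, Maga{\~n}a, H{\"o}nig, Y{\`e}che, {'hallo}"
print(PATTERN.sub(repl, text))
# S. Ávila, H. Gil-Marín, Magaña, Hönig, Yèche, {'hallo}
```

Letter-named accents such as {\c c} or {\v s} are not covered; they would need an extra alternative in the pattern.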