I have text data that I am cleaning with regex. Some words in the text are immediately followed by numbers, and I want to remove those numbers.
For example, one row of the text is:
Preface2 Contributors4 Abrreviations5 Acknowledgements8 Pes terminology10 Lessons learnt from the RUPES project12 Payment for environmental service and it potential and example in Vietnam16 Chapter Integrating payment for ecosystem service into Vietnams policy and programmes17 Chapter Creating incentive for Tri An watershed protection20 Chapter Sustainable financing for landscape beauty in Bach Ma National Park 24 Chapter Building payment mechanism for carbon sequestration in forestry a pilot project in Cao Phong district of Hoa Binh province Vietnam26 Chapter 5 Local revenue sharing Nha Trang Bay Marine Protected Area Vietnam28 Synthesis and Recommendations30 References32
The first word in the above text should be 'Preface' instead of 'Preface2', and so on. I tried:
line = re.sub(r"[A-Za-z]+(\d+)", "", line)
This, however, removes the words as well as the numbers, as seen here:
Pes Lessons learnt from the RUPES Payment for environmental service and it potential and example in Chapter Integrating payment for ecosystem service into Vietnams policy and Chapter Creating incentive for Tri An watershed Chapter Sustainable financing for landscape beauty in Bach Ma National Park 24 Chapter Building payment mechanism for carbon sequestration in forestry a pilot project in Cao Phong district of Hoa Binh province Chapter 5 Local revenue sharing Nha Trang Bay Marine Protected Area Synthesis and
How can I capture only the numbers that immediately follow words?
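For reference, here is a minimal sketch of two patterns that might do this, assuming Python's `re` module and words made only of ASCII letters. The shortened sample string and the names `cleaned` and `cleaned_alt` are only for illustration. The first pattern uses a lookbehind so that only the digits are matched; the second captures the word and writes it back in the replacement.

```python
import re

# Shortened sample taken from the row above (illustrative only).
line = "Preface2 Contributors4 Park 24 Chapter 5 Local References32"

# Option 1: lookbehind. Match digits only when the character right
# before them is a letter, so the word itself is never part of the match.
cleaned = re.sub(r"(?<=[A-Za-z])\d+", "", line)

# Option 2: capture the word in a group and put it back with \1,
# so only the trailing digits are dropped.
cleaned_alt = re.sub(r"([A-Za-z]+)\d+", r"\1", line)

print(cleaned)      # Preface Contributors Park 24 Chapter 5 Local References
print(cleaned_alt)  # same result
```

Standalone numbers such as the "24" after "Park" and the "5" after "Chapter" are left alone, because they follow a space rather than a letter.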