I am working with text files that are radiology reports. When a report spans more than one page, a block of text containing the patient name and other metadata is repeated at the top of every page, and the rest of each page holds the contents of the report. I have merged the pages into a single text object. I want to keep the first block and remove all of the repeated ones. Is there a way to remove these blocks programmatically from all such files? The repeating block looks something like this:
Patient ID xxx Patient Name xxx
Gender Female Age 43Y 8M
Procedure Name CT Scan - Brain (Repeat) Performed Date 14-03-2018
Study DateTime 14-03-2018 07:10 am Study Description BRAIN REPEAT
Study Type CT Referring Physician xxx
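
One possible approach, as a minimal sketch only: match the whole block with a regular expression, keep the first match, and delete the later ones. This assumes that every block starts on a line beginning with "Patient ID" and ends with the line beginning with "Study Type", as in the sample above, and that the report body never starts a line with "Patient ID". The names HEADER_RE and remove_repeated_headers are hypothetical.

```python
import re

# Assumption: each header block starts on a line beginning with "Patient ID"
# and ends with the line beginning with "Study Type" (as in the sample block).
HEADER_RE = re.compile(
    r"^Patient ID\b.*?^Study Type\b[^\n]*\n?",
    re.DOTALL | re.MULTILINE,
)


def remove_repeated_headers(text: str) -> str:
    """Keep the first header block and delete every later occurrence."""
    first = HEADER_RE.search(text)
    if first is None:
        # No header found: return the text unchanged.
        return text
    head = text[: first.end()]                    # everything up to the end of block 1
    rest = HEADER_RE.sub("", text[first.end():])  # strip later blocks from the remainder
    return head + rest
```

To apply this to many files, the function could be called on the result of pathlib.Path.read_text() for each path returned by Path.glob("*.txt") and the output saved with write_text(). If the header fields vary between reports, the pattern would need to be adjusted.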