I use the Books app on my phone to read. I highlight a lot of passages, and when I finish a book I move all the highlights to a PKM (personal knowledge management) application. When I transfer the highlights, the Books app automatically attaches a citation to every one. For example:
“The NCI’s trials would be systematic: every trial would test a crucial piece of logic or hypothesis and produce yes and no answers. The trials would be sequential: the lessons of one trial would lead to the next and so forth—a relentless march of progress until leukemia had been cured. The trials would be objective, randomized if possible, with clear, unbiased criteria to assign patients and measure responses. ”
Excerpt from: The Emperor of All Maladies Siddhartha Mukherjee This material may be protected by copyright.“Like cancer cells, mycobacteria—the germs that cause tuberculosis—also became resistant to antibiotics if the drugs were used singly. Bacteria that survived a single-drug regimen divided, mutated, and acquired drug resistance, thus making that original drug useless. To thwart this resistance, doctors treating TB had used a blitzkrieg of antibiotics—two or three used together like a dense pharmaceutical blanket meant to smother all cell division and stave off bacterial resistance, thus extinguishing the infection as definitively as possible. But could two or three drugs be tested simultaneously against cancer—or would the toxicities be so forbidding that they would instantly kill patients? As Freireich, Frei, and Zubrod studied the growing list of antileukemia drugs, the notion of combining drugs emerged with growing clarity: toxicities notwithstanding, annihilating leukemia might involve using a combination of two or more drugs.”
Excerpt from: The Emperor of All Maladies Siddhartha Mukherjee This material may be protected by copyright.“The butcher shop”
And so on.
I want to remove these repeated citation lines from the large corpus of all my highlights using Python. Can someone help me do this?
I saved the highlights to a text file and tried to loop through it with the readline() method, but that didn't work. Even if it had worked, I don't know how to go through the entire file, remove the specific repeating bits, and put the remaining text back together with proper formatting.
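To make the goal concrete, here is a minimal sketch of the kind of cleanup I have in mind. It assumes every citation starts with "Excerpt from:" and ends with "This material may be protected by copyright.", as in the sample above. The names `CITATION`, `strip_citations`, and `clean_file` are placeholders of my own, not part of any library. Instead of calling readline() in a loop, it reads the whole file at once, because the citation is often glued to the start of the next highlight rather than sitting on a line of its own.

```python
import re

# Assumed citation shape, taken from the sample above: starts with
# "Excerpt from:" and ends with the copyright sentence. Adjust if yours differs.
CITATION = re.compile(
    r"\s*Excerpt from:.*?This material may be protected by copyright\.\s*",
    re.DOTALL,  # let "." cross line breaks in case a citation spans lines
)


def strip_citations(text: str) -> str:
    """Drop every citation and leave one blank line between highlights."""
    cleaned = CITATION.sub("\n\n", text)
    return cleaned.strip() + "\n"


def clean_file(src: str, dst: str) -> None:
    """Read highlights from src and write the cleaned text to dst."""
    with open(src, encoding="utf-8") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(strip_citations(text))
```

With hypothetical file names, a call would look like `clean_file("highlights.txt", "highlights_clean.txt")`. One caveat: the non-greedy `.*?` removes everything between "Excerpt from:" and the next copyright sentence, so a highlight that itself contains the words "Excerpt from:" would be cut short.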