The problem is this: I have a data set to clean. I am using Python 3.6 as the interpreter in PyCharm (Community Edition) to work on it.
I need to:
- find a line where the word "Code" appears,
- join all the lines that follow it into a single line,
- and stop when the next line containing "Code" comes.
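The three steps above can be sketched as a single pass over the lines. This is a minimal sketch, assuming that any line containing the standalone word "Code" starts a new record and that lines before the first such line can be ignored; `group_by_code` is a hypothetical name. The detail lines are glued together without a separator, which matches the sample output further down.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: any line containing the standalone word "Code" starts a new record.
CODE_LINE = re.compile(r"\bCode\b")


def group_by_code(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Collect each 'Code' line together with the lines that follow it."""
    records = []
    current_code = None
    details = []
    for raw in lines:
        line = raw.strip()  # also removes the trailing newline from file lines
        if not line:
            continue  # skip blank lines
        if CODE_LINE.search(line):
            # A new "Code" line closes the previous record, if there is one.
            if current_code is not None:
                records.append((current_code, "".join(details)))
            current_code = line
            details = []
        elif current_code is not None:
            details.append(line)
        # Lines seen before the first "Code" line are dropped.
    if current_code is not None:
        records.append((current_code, "".join(details)))
    return records

# Usage (hypothetical file name and encoding):
# with open("input.txt", encoding="utf-8") as f:
#     records = group_by_code(f)
```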
This would essentially break the data into two fields, namely the Code and the details of the company.
The final output needs to be a table, written to a text file or CSV by the script from PyCharm itself; this format is critical.
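For the CSV variant, the standard `csv` module already handles fields that contain commas or double quotes, which this data has plenty of. A minimal sketch, assuming the records are (code, details) pairs; `records_to_csv` and the output file name are hypothetical:

```python
import csv
import io
from typing import Iterable, Tuple


def records_to_csv(records: Iterable[Tuple[str, str]]) -> str:
    """Render (code, details) pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes fields containing commas or quotes as needed
    writer.writerow(["Code", "Company details"])
    for code, details in records:
        writer.writerow([code, details])
    return buf.getvalue()

# Usage (hypothetical file name); newline="" avoids blank rows on Windows:
# with open("companies.csv", "w", encoding="utf-8", newline="") as f:
#     f.write(records_to_csv(records))
```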
The following is the input (an extract from the actual text file):
345- Code # 98882 +
"Ms, ABDUL RAFAY & COMPANY, +"
"907, 2nd Floor, tradeway Centre,33, Block-6, PECHS, Karach +"
Ph:345598 1334 558106 +
Mr. Abdul rafay Siddiqui +
347 Code # 96663 +
"Ms. BILAL & BROTHERS Plot No.F-8, Estate #2, Lalazar, Karachi Ph:322575.84 +"
Mr. Mubarak Shahid +
A23 - Code : BO229 +
"Ms. RAHMAN & SONS 303, 3rd Floor, Square One, Dundas street, Karachi P:36268947 +"
"Mr, Saleem Mughal +"
"349- Code # 93369 Ms, ALIAPPAREL +"
"Office No. 491/307, 1st Floor, Blessings Tower near Tipu Burger , P?:34990456 +"
"Mr, Nasir Wali +"
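The extract shows two irregularities: some lines are wrapped in double quotes, and in the `349` record the company name starts on the same line as the code. A minimal sketch for splitting such a line into the code and the text that follows it, assuming the code always ends with "Code", an optional "#" or ":", and one alphanumeric ID (`HEADER` and `split_header` are hypothetical names):

```python
import re
from typing import Tuple

# Assumption: the code field runs up to and including "Code", an optional
# "#" or ":", and one alphanumeric ID such as "98882" or "BO229".
HEADER = re.compile(r"(?P<code>.*?\bCode\s*[#:]?\s*\w+)(?P<rest>.*)")


def split_header(line: str) -> Tuple[str, str]:
    """Split a 'Code' line into (code, text that follows the code on the same line)."""
    cleaned = line.strip().strip('"').strip()  # drop surrounding quotes, if any
    match = HEADER.match(cleaned)
    if match is None:
        return cleaned, ""  # no recognizable code: keep the whole line as the code
    return match.group("code").strip(), match.group("rest").strip()
```

For most records the leftover text is just the "+" line marker, which can be discarded together with the other markers.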
The output should look like this:
Code - Company details
345- Code # 98882 + -"Ms, ABDUL RAFAY & COMPANY, +""907, 2nd Floor, tradeway Centre,33, Block-6, PECHS, Karach +"Ph:345598 1334 558106 +Mr. Abdul rafay Siddiqui +
347 Code # 96663 + - "Ms. BILAL & BROTHERS Plot No.F-8, Estate #2, Lalazar, Karachi Ph:322575.84 +"Mr. Mubarak Shahid +
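For the plain-text variant, the layout above is one header line followed by one "code - details" line per record. A minimal sketch, assuming the records are (code, details) pairs; `format_rows` and the file name are hypothetical:

```python
from typing import Iterable, List, Tuple


def format_rows(records: Iterable[Tuple[str, str]]) -> List[str]:
    """Render (code, details) pairs as 'code - details' lines under a header."""
    lines = ["Code - Company details"]
    for code, details in records:
        lines.append(f"{code} - {details}")
    return lines

# Usage (hypothetical file name):
# with open("companies.txt", "w", encoding="utf-8") as f:
#     f.write("\n".join(format_rows(records)) + "\n")
```

Because codes such as `A23 - Code : BO229` already contain " - ", this layout is harder to split back into two fields than a CSV; a tab separator would avoid that ambiguity.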
The difficulty is that the company details sometimes span one line, sometimes two or three. So I am looking for a way to iterate over these lines until the next 'Code' appears. I tried this before in R but couldn't come up with anything concrete except adding a trailing +, which could be stripped off here.
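Since each record may span one, two or three lines, the trailing "+" markers and wrapping quotes can be removed from each piece before the pieces are merged. A minimal sketch, assuming "+" only ever appears as a line-end marker and that a single space is an acceptable separator (both are assumptions; `clean_piece` and `join_details` are hypothetical names):

```python
from typing import Iterable


def clean_piece(line: str) -> str:
    """Remove surrounding whitespace, wrapping double quotes and the trailing '+' marker."""
    piece = line.strip().strip('"').strip()
    if piece.endswith("+"):
        piece = piece[:-1].rstrip()
    return piece


def join_details(lines: Iterable[str], sep: str = " ") -> str:
    """Merge the detail lines of one record into a single clean string."""
    pieces = (clean_piece(line) for line in lines)
    return sep.join(p for p in pieces if p)  # skip pieces that were only a marker
```

Joining with a space instead of gluing the raw lines together keeps words from adjacent lines, such as "Ph:322575.84" and "Mr.", from running into each other.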