I have a Java project that reads text from PDF files. The PDFs contain tables, and when a cell's text is too wide for its column it wraps, so the extracted text has a line break in the middle of the content. For example, "This is www.google.com" becomes "This is www.goog\nle.com" (wrapped onto the next line). I need to extract this text and process it with a domain regex pattern, but the pattern does not find "www.google.com" when the domain is split across lines. I can't simply remove every "\n", because a line break may also separate two different domains, e.g. "This is www.google.com\nwww.yahoo.com".
*The PDF was converted from a .docx file. If Java reads the .docx directly, it gets "www.google.com" correctly, without the line break. The issue only happens with the PDF.
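The problem can be reproduced with a minimal sketch like the one below. The pattern, the class name `DomainBreakDemo`, and the method `findDomains` are assumptions for illustration only, not the actual regex or code used in the project; the input strings are the examples from above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainBreakDemo {
    // Simplified, hypothetical domain pattern (the real one may differ).
    private static final Pattern DOMAIN =
            Pattern.compile("www\\.[A-Za-z0-9-]+(?:\\.[A-Za-z]{2,})+");

    // Returns every substring of the text that matches the domain pattern.
    public static List<String> findDomains(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DOMAIN.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Text as read from the .docx: the domain is found.
        System.out.println(findDomains("This is www.google.com"));

        // Text as read from the PDF: the line break splits the domain,
        // so the pattern finds nothing.
        System.out.println(findDomains("This is www.goog\nle.com"));

        // Two separate domains on consecutive lines: both are found.
        String twoDomains = "This is www.google.com\nwww.yahoo.com";
        System.out.println(findDomains(twoDomains));

        // Removing every line break glues the two domains into one match.
        System.out.println(findDomains(twoDomains.replace("\n", "")));
    }
}
```

The last two calls show why blindly removing line breaks is not an option: it repairs the wrapped domain but merges domains that were correctly on separate lines.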
Any thoughts? Thanks.