Save output data into text file (Each line in a separate row)

Question

I want to save my output data to a text file so that each line of text appears on its own row. Currently the rows end up on one line, separated by a literal \n; I want each new line to be saved on a separate row.

from PIL import Image 
import pytesseract 
import sys 
from pdf2image import convert_from_path 
import os 



PDF_file = "F:/ABC/Doc_1.pdf"

pages = convert_from_path(PDF_file, 500) 
image_counter = 1

for page in pages: 
    filename = "page_"+str(image_counter)+".jpg"
    page.save(filename, 'JPEG') 
    image_counter = image_counter + 1

filelimit = image_counter-1
outfile = "F:/ABC/intermediate_steps/out_text.txt"


f = open(outfile, "a") 

for i in range(1, 2): 

    filename = "page_"+str(i)+".jpg"
    import pytesseract 
    pytesseract.pytesseract.tesseract_cmd = r"\ABC\opencv-text-detection\Tesseract-OCR\tesseract.exe"
    from pytesseract import pytesseract
    text = str(((pytesseract.image_to_string(Image.open(filename)))))  
    text = text.replace('-\n', '')   
    #text = text.splitlines()
    f.writelines("Data Extracted from next page starts now.")
    f.writelines(str(text.encode('utf-8')))

f.close()

Expected output:

ABC
DEF
GHI

Current output:

ABC\nDEF\nGHI\n

@m02ph3u5, I want the extracted output saved in a text file where rows are not delimited by a literal **\n**; each new line should be saved on its own row, without \n. Please see the image I have included in the question. I hope it helps. — gaurav2141, Jul 28 '19 at 15:42
What are the exact contents of `text`? Also, why do you use `writelines` instead of `write` if it's just a string? — m02ph3u5, Jul 28 '19 at 15:43
@m02ph3u5 Neither `writelines` nor `write` works for me. — gaurav2141, Jul 28 '19 at 16:13

score 1 · Accepted Answer · answered Jul 28 '19 at 16:01

When you do

f.writelines(str(text.encode('utf-8')))

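`text.encode('utf-8')` returns a `bytes` object, and calling `str()` on a `bytes` object produces its printable representation. The text is wrapped in `b'...'` and every newline is written as the two characters `\` and `n` instead of an actual line break. Write the string itself instead: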

f.writelines(text)
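
The difference can be checked without running OCR at all. The following is a minimal sketch; the sample string is a stand-in for whatever `pytesseract.image_to_string` returns, not actual output from the question's PDF.

```python
# Stand-in for the OCR result (an assumption; the real text comes from pytesseract).
text = "ABC\nDEF\nGHI\n"

# str() of a bytes object gives its printable representation:
# a leading b', and a backslash followed by 'n' where each newline was.
escaped = str(text.encode("utf-8"))
print(escaped)

# The original string still contains real newline characters,
# so printing or writing it puts each line on its own row.
print(text, end="")
```

Since `text` is a single string, `f.write(text)` does the same job as `f.writelines(text)` and states the intent more directly.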

herculanodavi

If I don't encode, it throws an error: `UnicodeEncodeError: 'charmap' codec can't encode character '\ufb01' in position 0: character maps to <undefined>` – gaurav2141 Jul 28 '19 at 16:11
You could try [this](https://stackoverflow.com/questions/27092833/unicodeencodeerror-charmap-codec-cant-encode-characters) – herculanodavi Jul 29 '19 at 17:07
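
One common way to avoid the `'charmap'` error from the comment above is to open the output file with an explicit `encoding="utf-8"`, so that Python does not fall back to the platform's default code page. The sketch below assumes this is what is happening on the asker's machine. The sample text (with the `\ufb01` ligature from the error message) and the temporary output path are hypothetical placeholders, not values from the question.

```python
import os
import tempfile

# Hypothetical OCR output containing the 'fi' ligature (U+FB01) from the error message.
text = "\ufb01rst line\nsecond line\n"

# Hypothetical output location; the question writes to its own out_text.txt instead.
out_path = os.path.join(tempfile.mkdtemp(), "out_text.txt")

# An explicit encoding lets the file accept any Unicode character,
# so the string can be written as-is without calling .encode() first.
with open(out_path, "a", encoding="utf-8") as f:
    f.write("Data extracted from next page starts now.\n")
    f.write(text)

# Read the file back to confirm each line landed on its own row.
with open(out_path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines)
```

The `with` block also closes the file automatically, which replaces the separate `f.close()` call.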
