
I am working with the Tesseract library (via pytesseract) and want the text extracted from an image to be on a single line, without newlines ("\n").

I tried variable.replace("\n", " "), but it does not work. I still get the same multi-line output.

Below is my code:

img = Image.open('maaan.jpg')
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
kt = pytesseract.image_to_string(img,lang='eng')
kt.replace("\n", " ")
print(kt)

Thanks for the help btw :D

Heikki
Nisox
  • You should inspect your text in a hex editor then. Chances are you have other kinds of line breaks in the text, such as ``\r``. – Mike Scotty Feb 08 '19 at 08:47

2 Answers

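str.replace returns a new string and leaves the original unchanged, so the result has to be assigned back or printed directly. Keep "\n" as an ordinary string: a raw string such as r"\n" matches a literal backslash followed by n, not a newline. For example:

my_variable = my_variable.replace("\n", " ")

Try:

print(kt.replace("\n", " "))

Or,

kt = kt.replace("\n", " ")
print(kt)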
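
As a minimal sketch of why the original call appeared to do nothing, and of one way to also catch the other line breaks mentioned in the comments: the helper name to_single_line and the sample string are hypothetical, and the sample only stands in for whatever pytesseract.image_to_string returns.

```python
def to_single_line(text: str) -> str:
    """Collapse every run of whitespace, including line breaks, into one space."""
    # str.split() with no arguments splits on any whitespace and drops empty pieces.
    return " ".join(text.split())


# Hypothetical stand-in for the string returned by pytesseract.image_to_string(img, lang='eng').
kt = "first line\r\nsecond line\n\nthird line\n\x0c"

kt.replace("\n", " ")    # the new string is discarded, so kt is unchanged
assert "\n" in kt

kt = to_single_line(kt)  # rebind the name to the returned string
print(kt)                # first line second line third line
```

Splitting on whitespace also collapses repeated spaces and tabs, so use a plain replace instead if those need to be preserved.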
Taohidul Islam
  • Sadly, it doesn't work for me. For some reason it only works when I use print(kt.replace("\n"," ")), but what if I want to replace another character? – Nisox Feb 08 '19 at 08:57
  • @Nisox, Please see updated answer. Hope, it will help you. – Taohidul Islam Feb 08 '19 at 08:58
  • Thanks, it helped. print(kt.replace(...)) works fine, but I will need to figure out how to convert multiple characters that way! – Nisox Feb 08 '19 at 09:02

kt = kt.replace("\r", "\t")

Visit this link, it has more explanation: python convert multiline to single line
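
For replacing several characters in one pass, which the comments on the other answer ask about, one option is str.translate with a table built by str.maketrans. This is only a sketch: the names LINE_BREAK_TABLE and replace_chars and the sample string are hypothetical, and the choice of which characters to delete or turn into spaces is an assumption.

```python
# Map each unwanted character to its replacement; mapping to None deletes it.
LINE_BREAK_TABLE = str.maketrans({"\r": None, "\n": " ", "\t": " "})


def replace_chars(text: str) -> str:
    """Apply all single-character replacements in one pass."""
    return text.translate(LINE_BREAK_TABLE)


sample = "one\r\ntwo\tthree"
print(replace_chars(sample))  # one two three
```

str.maketrans only accepts single-character keys, so multi-character patterns would still need chained replace calls or re.sub.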

Tony Okoth