I am trying to scan a text from an Ocr and clean it, I got a character that is divided to few lines, however I would like to have the text in similar to the way it is in the image
the code :
heraclitus<-"greek.png"
library(tidyverse)
library(tesseract)
library(magick)
image_greek<-image_read(heraclitus)
image_greek<-image_greek %>% image_scale("600") %>%
image_crop("600x400+220+150") %>%
image_convert(type = 'Grayscale') %>%
image_contrast(sharpen = 1) %>%
image_write(format="jpg")
heraclitus_sentences<-magick::image_read(image_greek)%>%
ocr() %>% str_split("\n")
As you can see from the output, I have white spaces and sentences that are divided to two lines. I would like to have it in a vector or a list, that each element will be a sentence