I'm trying to create a script for Linux that will detect where the text cursor is. This should take at most 1 second. The best approach seems to be to programmatically insert some text via xdotool, take a screenshot via some other utility, find the position of that text in the screenshot, and then remove the inserted text using xdotool again.
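A minimal sketch of the insert / capture / remove loop described above, to make the idea concrete. The names `type_marker_cmd`, `erase_marker_cmd`, and `with_marker` are hypothetical helpers, and `capture` stands in for whatever screenshot utility ends up being used; only the `xdotool type` and `xdotool key` subcommands are taken as given. It assumes the focused application simply appends typed characters and that BackSpace removes them one at a time.

```python
import subprocess

MARKER = "<-- CURSOR HERE"  # the marker string from the question


def type_marker_cmd(marker: str = MARKER) -> list[str]:
    # `xdotool type` sends the string as keystrokes to the focused window.
    return ["xdotool", "type", "--delay", "0", marker]


def erase_marker_cmd(marker: str = MARKER) -> list[str]:
    # One BackSpace keystroke per character that was typed.
    return ["xdotool", "key", "--delay", "0"] + ["BackSpace"] * len(marker)


def with_marker(capture, marker: str = MARKER, run=subprocess.run):
    """Type the marker, call capture(), then always erase the marker.

    `capture` is a hypothetical zero-argument callable that takes the
    screenshot and returns it (or returns the detected position).
    """
    run(type_marker_cmd(marker), check=True)
    try:
        return capture()
    finally:
        # Runs even if capture() raises, so the marker is not left behind.
        run(erase_marker_cmd(marker), check=True)
```

In practice a short pause before the capture may be needed so the application has time to render the marker; that delay counts against the 1-second budget.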
I tried inserting a random string (like <-- CURSOR HERE). Using Tesseract 4, it takes about 20 seconds to find the position of the string, although the result is very precise in terms of pixel coordinates. I was not able to use whitelisting (in version 4 of Tesseract) to narrow the results to specific letters or digits only, which I assume would speed up processing.
I don't know what font the user will be using, but every font has dashes and slashes, so I could create some sort of shape (for instance, |/\|/\|/\|/\|) and use some library to detect that shape. What would be a good choice?
I don't care about what's on the rest of the screen: it could be more text, images, etc. I only need to know where my random string is (<-- CURSOR HERE, |/\|/\|/\|/\|, or anything else you can think of) and get its X/Y position in pixels.