
I'm trying to write a script for Linux that detects where the text cursor is. It should take at most 1 second. The best approach seems to be to insert some text programmatically via xdotool, take a screenshot with some other utility, work out the position of that text, and then remove the inserted text using xdotool again.
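
For illustration, the insert-and-remove steps could look roughly like the sketch below. It assumes xdotool's `type` and `key` subcommands; the helper names (`type_command`, `erase_command`, `with_marker`) and the `capture` callback are hypothetical, and the screenshot step itself is left to whatever utility is used.

```python
import subprocess

# Marker text typed at the cursor; any easily recognisable string would do.
MARKER = "<-- CURSOR HERE"


def type_command(text):
    """Build the xdotool command that types `text` at the text cursor."""
    return ["xdotool", "type", "--delay", "0", text]


def erase_command(count):
    """Build the xdotool command that presses BackSpace `count` times."""
    return ["xdotool", "key"] + ["BackSpace"] * count


def with_marker(capture, marker=MARKER, run=subprocess.run):
    """Type the marker, call capture(), then always erase the marker again.

    `capture` is a hypothetical callback that takes the screenshot and
    returns it; `run` is injectable so the flow can be tried without X11.
    """
    run(type_command(marker), check=True)
    try:
        return capture()
    finally:
        run(erase_command(len(marker)), check=True)
```

The target application may need a short pause after typing before the marker is actually drawn, so a small sleep before `capture()` might be necessary in practice.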

I tried inserting a marker string (like <-- CURSOR HERE). With Tesseract 4 it takes about 20 seconds to find the position of the string, although the result is very precise in terms of pixel coordinates. I was not able to use whitelisting (in version 4 of Tesseract) to narrow the results to specific letters or digits only, which I assume would speed up processing.

I don't know what font the user will be using, but every font has dashes and slashes, so I could create some sort of shape (for instance, |/\|/\|/\|/\|), and use some library to detect that shape. What would be a good choice?

I don't care about what's on the rest of the screen: it could be more text, images, etc. I only need to know where my marker string is (<-- CURSOR HERE, |/\|/\|/\|/\|, or anything else you can think of), and get its X/Y position in pixels.

nkkollaw
  • Do you mean this sort of thing? https://stackoverflow.com/a/3591679/2836621 – Mark Setchell Nov 20 '18 at 11:32
  • Hi, no. I'm not trying to get the mouse position, but the text cursor position. The thing that blinks while you type, just to be clear. Since it's dependent on the framework used, it's not an easy problem to solve. – nkkollaw Nov 20 '18 at 11:37
  • Do you know what editor/tool/program the user is using when you want the cursor? – Mark Setchell Nov 20 '18 at 11:39
  • Rather than using OCR (Tesseract), maybe you could grab the screen first, display your text and then grab it again. You could then difference the grabs very fast in OpenCV and they will be black where nothing has changed. You could then use OCR on just the changed areas if necessary to distinguish between your text and something like a clock that has changed time. – Mark Setchell Nov 20 '18 at 11:43
  • That runs in under 1s on my Mac with an 8 megapixel (3840x2160) display and unoptimised code - be careful not to save the screen capture to JPEG - use a lossless format. – Mark Setchell Nov 20 '18 at 12:59
  • @Mark Setchell: very clever (or my current attempt is very dumb). I don't know if it would qualify, but if you add that as an answer I'll accept it. Of course, since it's a screenshot of the whole screen, there could be other things that change (for instance, the clock in a panel), but if adding the text takes only a fraction of a second, it shouldn't matter, I guess. – nkkollaw Nov 20 '18 at 16:54
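
The differencing idea from the comments can be sketched without any OCR. The version below is only an illustration under stated assumptions: it works on two raw, tightly packed RGB buffers of the same size (how they are captured is not shown), and the function name `changed_bbox` and its parameters are hypothetical. It returns the inclusive bounding box of the pixels that differ between the two grabs.

```python
def changed_bbox(before, after, width, height, bpp=3):
    """Return (left, top, right, bottom) of the changed region, or None.

    `before` and `after` are raw pixel buffers (bytes) with `bpp` bytes per
    pixel and no row padding. Coordinates are inclusive pixel positions.
    """
    stride = width * bpp
    if len(before) != stride * height or len(after) != stride * height:
        raise ValueError("buffer size does not match width/height/bpp")

    left, right = width, -1
    top = bottom = None
    for y in range(height):
        row_a = before[y * stride:(y + 1) * stride]
        row_b = after[y * stride:(y + 1) * stride]
        if row_a == row_b:
            continue  # unchanged rows are skipped with one fast comparison
        if top is None:
            top = y
        bottom = y
        # First and last differing byte in this row, converted to pixel columns.
        first = next(i for i in range(stride) if row_a[i] != row_b[i]) // bpp
        last = next(i for i in range(stride - 1, -1, -1)
                    if row_a[i] != row_b[i]) // bpp
        left = min(left, first)
        right = max(right, last)

    if top is None:
        return None
    return left, top, right, bottom
```

With OpenCV available, `cv2.absdiff` on the two grabs would do the same comparison on whole images at once. Anything else that changes between the grabs (a panel clock, for instance) would enlarge the box, so keeping the two captures close together in time matters.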

0 Answers