I am building a text-to-speech feature in a React app that highlights the word currently being spoken by drawing a background behind it.
The feature is very similar to the Firefox reader view.
My current solution slices the paragraph string and wraps the spoken word in a span on every render, which makes it heavy on resources and impossible to animate.
Here is the code (which I intend to scrap):
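    import { useEffect, useState } from 'react';

    export interface SpeakEvent {
      start: number;
      end: number;
      type: string;
    }

    // Assumed shape; TextNodeProps is not shown in the original post.
    interface TextNodeProps {
      content: string;
    }

    // registerText is the app's own helper, defined elsewhere (not shown).
    export default function TextNode({ content }: TextNodeProps) {
      const [highlight, setHighlight] = useState<SpeakEvent | null>(null);

      useEffect(() => {
        registerText((ev) => {
          // Only react to word boundaries, or to an empty event that clears the highlight.
          if (ev?.type === 'word' || !ev) {
            setHighlight((old) => {
              /* Irrelevant code */
              return ev;
            });
          }
        }, content);
      }, [content]);

      // Split the paragraph into [before, spoken word, after] on every render.
      let segments = [content];
      if (highlight) {
        const { start, end } = highlight;
        segments = [
          content.slice(0, start),
          content.slice(start, end),
          content.slice(end),
        ];
      }

      return (
        <>
          {segments.map((seg, i) =>
            i === 1 ? (
              <span key={i} className="highlight">
                {seg}
              </span>
            ) : (
              seg
            )
          )}
        </>
      );
    }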
The Firefox reader view takes a smarter approach. It places a div behind the spoken word and then moves that div around:
The div containing the highlight effect is positioned directly with absolute coordinates.
How can they get a word's bounding rectangle within a paragraph when they only know its start and end indices in the string?
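
For illustration, here is a minimal sketch of one way such a rectangle could be obtained, using the standard DOM `Range` API. I don't know whether this is what Firefox actually does. It assumes the whole paragraph is rendered as a single text node (so the string indices map directly to DOM offsets) and that the container is positioned (`position: relative`), has no border, and does not scroll internally. The helper names `toOverlayStyle`, `measureWord` and `moveHighlight` are hypothetical.

```typescript
/** Minimal rectangle shape, satisfied by DOMRect and by plain objects. */
interface RectLike {
  left: number;
  top: number;
  width: number;
  height: number;
}

/**
 * Convert a viewport-relative word rectangle into absolute CSS offsets
 * relative to the container's own rectangle.
 */
function toOverlayStyle(word: RectLike, container: RectLike) {
  return {
    position: 'absolute' as const,
    left: `${word.left - container.left}px`,
    top: `${word.top - container.top}px`,
    width: `${word.width}px`,
    height: `${word.height}px`,
  };
}

/**
 * Measure the characters [start, end) of a text node.
 * Assumption: the paragraph lives entirely in this one text node.
 */
function measureWord(textNode: Text, start: number, end: number): DOMRect {
  const range = document.createRange();
  range.setStart(textNode, start);
  range.setEnd(textNode, end);
  return range.getBoundingClientRect();
}

/** Move an existing highlight element behind the word at [start, end). */
function moveHighlight(
  overlay: HTMLElement,
  container: HTMLElement,
  textNode: Text,
  start: number,
  end: number,
): void {
  const style = toOverlayStyle(
    measureWord(textNode, start, end),
    container.getBoundingClientRect(),
  );
  overlay.style.position = style.position;
  overlay.style.left = style.left;
  overlay.style.top = style.top;
  overlay.style.width = style.width;
  overlay.style.height = style.height;
}
```

In a React component, `moveHighlight` could be called from the speech-event callback with refs to the overlay div and the paragraph element (its `firstChild` being the text node), so the text itself never re-renders. A CSS `transition` on `left`, `top` and `width` would then animate the movement.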