If I search for something on Google News, I can click on the "Explore in depth" button and get the same news article from multiple sources. What kind of algorithm is used to compare articles of text and then determine that it is regarding the same thing? I have seen the Question here:
Is there an algorithm that tells the semantic similarity of two phrases
However, using methods mentioned there, I feel that if there were articles that were similar in nature but regarding different stories, they would be grouped together using the methods mentioned there. Is there a standard way of detecting Strings that are about the same thing and grouping them, while keeping Strings that are just similar separate? Eg. If I search "United States Border" I might get stories about problems at the USA's border, but what would prevent these from all getting grouped together? All I can think of is the date of publication, but what if many stories were published very close to each other?