I have two scripts that listen to the same websocket and write the received packages to a database. In principle, both scripts receive the same packages, but either one may miss some, for example during downtime. I would now like to merge the two streams into one reliable stream, removing duplicates.

However, the packages carry no timestamp or ID of their own, so the packages alone do not reveal which one came first. It cannot be ruled out that some of the packages are intentionally identical. The only timestamps are the ones added when the packages arrive at the servers.

Is there a standard, principled approach to solve this problem?

Pete L.

1 Answer

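I would suggest solving this problem with a diff algorithm: treat the two recorded streams as sequences and align them, the same way diff aligns the lines of two files. The answers at Diff Algorithm? may help you understand how to implement that.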
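
As a rough illustration of the idea, here is a minimal sketch that uses Python's standard `difflib.SequenceMatcher` to do the alignment. It assumes each stream has already been loaded from the database as a list of hashable packages (strings or bytes) in arrival order; the function name `merge_streams` and the labels `"both"`, `"a_only"` and `"b_only"` are made up for this example. Where the two streams disagree, the sketch simply emits the first stream's packages before the second's; the arrival timestamps could be used to order those stretches instead.

```python
from difflib import SequenceMatcher


def merge_streams(a, b):
    """Align two package sequences and return (package, source) pairs.

    `source` is "both", "a_only" or "b_only" (hypothetical labels).
    """
    # autojunk=False stops difflib from ignoring very frequent packages,
    # which matters when some packages are intentionally identical.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    merged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Both listeners recorded this run; keep a single copy.
            merged.extend((p, "both") for p in a[i1:i2])
        else:
            # "delete": only in a; "insert": only in b;
            # "replace": each side has packages the other lacks.
            merged.extend((p, "a_only") for p in a[i1:i2])
            merged.extend((p, "b_only") for p in b[j1:j2])
    return merged


if __name__ == "__main__":
    stream_a = ["x", "y", "y", "z"]   # listener A missed "w"
    stream_b = ["x", "y", "z", "w"]   # listener B missed one "y"
    for package, source in merge_streams(stream_a, stream_b):
        print(package, source)
```

Because the alignment works on position in the sequence rather than on content alone, a package that was legitimately sent twice is kept twice as long as at least one listener saw both copies.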

btilly

@500-InternalServerError From the diff you can classify packages as picked up by one listener, the other, or both. – btilly Apr 28 '20 at 17:12