I am working on detecting similarity between two videos in Java. The user will supply two videos, and the software has to detect whether they are similar by checking the file content. I read that it is possible to compare each frame of the two videos. Can anyone please share any suitable algorithms (or code or methods) that can be implemented in Java?

john_science
Raji A C
  • This question is too vague for a proper answer. Can you break it down to what you have tried so far and where exactly you are having a problem? – Vincent Ramdhanie Mar 27 '12 at 17:15
  • Similar in what way? Do you mean content-wise, as in (mathematically speaking) their intersection is non-empty (they share scenes that are exactly equal)? Semantically, as in they are about the same topic? Maybe it would also help to give examples of videos that should be considered similar, as well as counterexamples that might intuitively be considered similar but that your program is supposed to consider different. – G. Bach Mar 27 '12 at 17:20
  • Actually, I want to know the method for detecting whether 2 videos are similar or not. I have chosen 2 MPEG files. – Raji A C Mar 27 '12 at 17:20
  • Yes, similar content-wise. I think that if we read the contents of these 2 videos into byte arrays, the arrays will be the same if the 2 videos are the same. – Raji A C Mar 27 '12 at 17:22
  • So you want to detect on a frame-to-frame basis whether each frame of video A is similar to video B based on the pixels it has, correct? You could slice both videos up into frames, find an offset time at which both videos have similar frames (for example, video A at 00:00:23 may be similar to video B at 00:15:21) and proceed from there using the methods suggested in the link thkala provided; would that do what you want? – G. Bach Mar 27 '12 at 17:31
  • This seems like the sort of question that could require an entire book to answer. – Louis Wasserman Mar 27 '12 at 17:35
  • How can I find similar frames? That is what I actually want. I think it is a little difficult in a video. – Raji A C Mar 27 '12 at 17:54

1 Answer

There is a huge variety of algorithms for determining similarity in images. A search for image similarity algorithm and video similarity algorithm in Google Scholar will produce a large number of related papers - there are also a few questions (e.g. this one) here on StackOverflow.

A couple of important aspects that should be noted:

  • There is no universal definition of similarity - you need to define it with regard to your specific purpose. For example, an image with a red square and an image with a blue square could be considered similar because both have squares, or entirely dissimilar based on the color difference.

  • Similarity is not generally defined in absolute terms, i.e. as something that either exists or not. Most similarity algorithms produce a relative indicator that has to be compared against a baseline to produce meaningful results. For example, if you have a corpus of images depicting squares of various colors, you might get high similarity values in absolute terms, but it's the minute differences caused by the color changes that should be focused on.
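
As a rough illustration of both points, here is a minimal Java sketch that scores a red square against a blue square under two made-up metrics. Everything in it is an assumption for demonstration purposes: the class name `SquareSimilarityDemo`, the two scoring functions, and the choice to represent an image as a packed-RGB `int[]` (the kind of array `BufferedImage.getRGB` can fill) are not taken from any particular library or paper.

```java
import java.util.Arrays;

// Hypothetical demo: two toy similarity metrics that disagree on the same image pair.
class SquareSimilarityDemo {
    static final int SIZE = 64;        // images are SIZE x SIZE pixels
    static final int WHITE = 0xFFFFFF; // background color, packed as 0xRRGGBB

    /** Builds a white image with a 32x32 square of the given color in the middle. */
    static int[] square(int rgb) {
        int[] px = new int[SIZE * SIZE];
        Arrays.fill(px, WHITE);
        for (int y = 16; y < 48; y++) {
            for (int x = 16; x < 48; x++) {
                px[y * SIZE + x] = rgb;
            }
        }
        return px;
    }

    /** Color-based score in [0, 1]: 1 minus the normalized mean absolute RGB difference. */
    static double colorSimilarity(int[] a, int[] b) {
        long diff = 0;
        for (int i = 0; i < a.length; i++) {
            diff += Math.abs(((a[i] >> 16) & 0xFF) - ((b[i] >> 16) & 0xFF)); // red
            diff += Math.abs(((a[i] >> 8) & 0xFF) - ((b[i] >> 8) & 0xFF));   // green
            diff += Math.abs((a[i] & 0xFF) - (b[i] & 0xFF));                 // blue
        }
        return 1.0 - diff / (3.0 * 255 * a.length);
    }

    /** Shape-based score in [0, 1]: fraction of pixels that agree on "background or not". */
    static double shapeSimilarity(int[] a, int[] b) {
        int same = 0;
        for (int i = 0; i < a.length; i++) {
            boolean foregroundA = (a[i] & 0xFFFFFF) != WHITE;
            boolean foregroundB = (b[i] & 0xFFFFFF) != WHITE;
            if (foregroundA == foregroundB) {
                same++;
            }
        }
        return (double) same / a.length;
    }

    public static void main(String[] args) {
        int[] red = square(0xFF0000);
        int[] blue = square(0x0000FF);
        System.out.println("shape, red vs blue: " + shapeSimilarity(red, blue));
        System.out.println("color, red vs blue: " + colorSimilarity(red, blue));
        System.out.println("color, red vs red (baseline): " + colorSimilarity(red, red));
    }
}
```

With these definitions the shape score for the pair is 1.0, while the color score is about 0.83. The second number looks high in absolute terms, and it only becomes informative once it is compared against the baseline of 1.0 that two identical images produce.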

Disclaimer: before using any algorithm found through a search engine, you should investigate its legal status. Video similarity is a rather hot research area and quite a few algorithms are probably encumbered by patents and such. Using them for academic research might be acceptable, but anything else you should ask a lawyer about...

EDIT:

I am not certain what you need, but I can offer a few general tips:

  • Investigate if the video metadata, such as length and resolution, may be useful. For example, would it make sense to actually compare the content of a 30-second clip to a 3-hour film?

  • Consider if you can get away with using image-based similarity on a random sample of corresponding frames from the same timestamps in each file. Examining each and every frame in detail would probably be a waste of time and CPU cycles in most cases.

  • Consider using a tiered similarity measurement architecture, where simpler and less-expensive methods are used to weed out the obvious cases, before the real CPU hogs step in. For example, computing the average color and other simple metrics for a frame is probably much easier than contour detection or face recognition.
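
The three tips above could be combined roughly as in the following Java sketch. It is only an outline under several assumptions: frames are already decoded (decoding needs a third-party library and is not shown), every frame is a packed-RGB `int[]` of the same dimensions, frame `i` of one video corresponds to frame `i` of the other, and the class name, method names and both tolerance constants are hypothetical values chosen for illustration. The per-pixel difference merely stands in for whatever expensive comparison you end up using.

```java
import java.util.Random;

// Hypothetical sketch: metadata check, random frame sampling, and a two-tier frame comparison.
class TieredFrameComparison {
    static final double MEAN_COLOR_TOLERANCE = 30.0; // assumed: max mean-channel gap (0..255)
    static final double PIXEL_DIFF_TOLERANCE = 0.10; // assumed: max normalized pixel difference

    /** Metadata check: reject pairs whose durations differ by more than maxRatio. */
    static boolean durationsCompatible(double secondsA, double secondsB, double maxRatio) {
        double shorter = Math.min(secondsA, secondsB);
        double longer = Math.max(secondsA, secondsB);
        return shorter > 0 && longer / shorter <= maxRatio;
    }

    /** Cheap tier: average red, green and blue value of a frame. */
    static double[] meanColor(int[] frame) {
        double[] sum = new double[3];
        for (int p : frame) {
            sum[0] += (p >> 16) & 0xFF;
            sum[1] += (p >> 8) & 0xFF;
            sum[2] += p & 0xFF;
        }
        for (int c = 0; c < 3; c++) {
            sum[c] /= frame.length;
        }
        return sum;
    }

    /** Expensive tier (stand-in): mean absolute per-channel difference, normalized to [0, 1]. */
    static double pixelDifference(int[] a, int[] b) {
        long diff = 0;
        for (int i = 0; i < a.length; i++) {
            diff += Math.abs(((a[i] >> 16) & 0xFF) - ((b[i] >> 16) & 0xFF));
            diff += Math.abs(((a[i] >> 8) & 0xFF) - ((b[i] >> 8) & 0xFF));
            diff += Math.abs((a[i] & 0xFF) - (b[i] & 0xFF));
        }
        return diff / (3.0 * 255 * a.length);
    }

    /** Runs the cheap tier first and only falls through to the expensive tier if it passes. */
    static boolean framesSimilar(int[] a, int[] b) {
        double[] meanA = meanColor(a);
        double[] meanB = meanColor(b);
        for (int c = 0; c < 3; c++) {
            if (Math.abs(meanA[c] - meanB[c]) > MEAN_COLOR_TOLERANCE) {
                return false; // obviously different, skip the expensive comparison
            }
        }
        return pixelDifference(a, b) <= PIXEL_DIFF_TOLERANCE;
    }

    /** Fraction of randomly sampled frame positions at which the two videos look similar. */
    static double sampledSimilarity(int[][] videoA, int[][] videoB, int samples, long seed) {
        int frames = Math.min(videoA.length, videoB.length);
        if (frames == 0 || samples <= 0) {
            throw new IllegalArgumentException("need at least one frame and one sample");
        }
        Random random = new Random(seed);
        int hits = 0;
        for (int i = 0; i < samples; i++) {
            int index = random.nextInt(frames); // same position in both videos
            if (framesSimilar(videoA[index], videoB[index])) {
                hits++;
            }
        }
        return (double) hits / samples;
    }
}
```

A caller would typically check `durationsCompatible` first and only then call `sampledSimilarity`, treating the returned fraction as a relative score rather than a yes/no verdict. The tolerances and the number of samples would need tuning against real footage.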

That said, I do not believe that you will be able to get a definitive answer here. You will have to experiment and see what approaches work best for your actual use cases...

thkala
  • Thank you, and sorry for the vague description. As per your answer, I think I have to take snapshots from these two videos and detect whether they are similar in content using a Canny edge detection algorithm or something like that. I want to know whether there is an algorithm developed exclusively for video content similarity detection. I have searched a lot but failed to find an answer. – Raji A C Mar 27 '12 at 17:30
  • I computed an image characteristic code for each frame and looked for frames with similar codes; the result is calculated as a percentage of similarity. Thanks for your help. – Raji A C May 28 '12 at 04:32