
I am an amateur in video/image processing, but I am trying to create an app for HD video calling. I hope someone can see where I may be going wrong and guide me onto the right path. Here is what I am doing and what I think I understand; please correct me if you know better.

  1. I am using OpenCV to grab an image from my webcam in a DLL. (I will be using this image for other things later.) A simplified sketch of this part follows the list.
  2. Currently, the image that OpenCV gives me is a cv::Mat. I resize this and convert it to a byte array the size of a 720p frame, which is roughly 2.7 million bytes (1280 × 720 × 3 bytes per pixel).
  3. I pass this pointer back to my C# code, where I render it onto a texture.
  4. I then create a TCP socket, connect the server and client, and start transmitting the image byte array. Once the client receives the byte array, I use the GPU to render it to a texture.
  5. Currently, there is a big delay of about 400-500 ms. This is after I tried compressing the buffer with GZipStream in Unity, which was able to compress the byte array from about 3 million bytes down to 1.5 million. I am trying to get this as small and as fast as possible, but this is where I am completely lost. I saw that Skype requires only a 1.2 Mbps connection for 720p video calling at 22 fps. I have no idea how they achieve such small frames, but of course I don't need it to be that small; it just needs to be decent.
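
Simplified, the grab-and-return part of my DLL looks roughly like this (names are illustrative rather than my exact code, and error handling is stripped):

```cpp
// Sketch of the C++ side: grab a webcam frame with OpenCV, resize it to
// 720p, and hand the raw BGR bytes back to C# through a pointer.
#include <opencv2/opencv.hpp>

static cv::VideoCapture cap;
static cv::Mat frame;   // static so the returned pointer stays valid in C#

extern "C" __declspec(dllexport) unsigned char* GetWebcamFrame(int* outSize)
{
    if (!cap.isOpened())
        cap.open(0);                              // default webcam

    cv::Mat raw;
    cap >> raw;                                   // grab one frame
    if (raw.empty()) { *outSize = 0; return nullptr; }

    cv::resize(raw, frame, cv::Size(1280, 720));  // force 720p
    // 1280 * 720 * 3 bytes (BGR) is roughly 2.7 million bytes per frame
    *outSize = static_cast<int>(frame.total() * frame.elemSize());
    return frame.data;                            // C# marshals from this
}
```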

Please give me a lecture on how this can be done! And let me know if you need anything else from me.

Jie Wei
  • I have many questions. How are you loading the image from C++? Are you running the socket code on the main thread? How do you resize the image? If you don't show the code relevant to these questions, it's impossible to help you, since we can't tell what's causing the slowdown. – Programmer Jul 25 '18 at 23:15
  • Hey Programmer, I am using your code for TCP data transfer, by the way. I couldn't ask questions on that thread because I don't have enough rep; the thread I am talking about is here: https://stackoverflow.com/questions/42717713/unity-live-video-streaming. What do you mean by loading the image from C++? The image is stored in an IntPtr on the C++ side and then marshaled back into C#. I am displaying the image on the texture with the GPU as in this tutorial: https://docs.unity3d.com/Manual/NativePluginInterface.html – Jie Wei Jul 25 '18 at 23:24
  • But since you're using cv::Mat to obtain the camera data, most of the things in that code are now different... Note that I am compressing to JPEG to reduce the size before sending, while it looks like you're using GZipStream. At this point, I can't tell what changed until I see part of the code. – Programmer Jul 25 '18 at 23:25
  • In my C++ I just have a function that gets the image and passes it back to C# through a pointer: http://codepad.org/r1etRmPE. The TCP transfer code is pretty much the same, other than using GZipStream to compress. Here is the example of the server: http://codepad.org/Lkh0WjjS. Once the client receives the image, it calls back into a DLL function to render the image to a registered Texture2D. – Jie Wei Jul 25 '18 at 23:41
  • I am trying out zlib compression in C++. I think doing the compression and decompression in the DLL will be much faster than doing it in C#. Do you think this should be done on another thread? – Jie Wei Jul 27 '18 at 18:33
  • Should I not be using OpenCV for this task, and instead just grab the texture once it is rendered, encode it as JPEG, and send it over to the client? The thing is, I will also need to pack a depth map into the image later; I was thinking about replacing the alpha channel with the depth map. – Jie Wei Jul 27 '18 at 18:44
  • I am not making a commercial app, just a demo to try out a remote-assistance AR app. I will definitely try this out and get back to you with the result. Thanks for your help! – Jie Wei Jul 27 '18 at 18:57
  • Hey Programmer, I was able to do the above and get the image back into Unity. The performance is very good and the lag is minimal. Compared with Skype's video it still uses about 10x the bandwidth, but it is good enough for my project. I guess that's why they make the big bucks :P – Jie Wei Aug 10 '18 at 18:00
  • All of the UDP and image-reading parts of the code are in C++ now, and that's how I got it down to 10x. I think #1 is pretty easy to do, but #2 is definitely a lot more challenging: how to send part of the frame and have the receiver know where that piece belongs. I will have to research this a bit. – Jie Wei Aug 10 '18 at 18:33
  • Oh great. Yeah, I am still not super familiar with OpenCV; I will definitely start looking there. Thanks a lot for your help! Will report back with results soon :) Oh, BTW, doesn't solving #2 basically solve #1 as well? – Jie Wei Aug 10 '18 at 18:37
  • *"Oh BTW, doesn't solving #2 basically solves #1 as well?"* Yes. I mentioned #2 so that if you can't do #2, you will stick with #1 since that's easier. #2 is much better and should be used when necessary. If you can do #2, don't worry about #1. – Programmer Aug 10 '18 at 18:57
  • Thanks, I will attempt #2 first and see how it goes. – Jie Wei Aug 10 '18 at 19:03
  • Good luck, and you're welcome. Let me know how it goes whenever you finish. – Programmer Aug 10 '18 at 19:09
  • Hello Programmer, I was able to grab and show each of the contours with OpenCV. These contours have x,y,w,h. I should be able to grab all the pixels of each contour and write them to a buffer, but I feel like this may not be that efficient. While debugging I saw that the sizes of these regions are sometimes larger than the whole encoded JPG, if each pixel is a byte. Should I be encoding these regions into a single JPG, or compressing the buffer some other way? – Jie Wei Aug 13 '18 at 21:05
  • For now, for simplicity, use JPG. You can switch to more advanced encoding later on, once you get it working. Once you extract the pixels that changed, encode them to JPG. Send the JPG with the x,y,w,h. Once you receive it, retrieve the x,y,w,h and then the JPG array. Create a `cv::Mat`, then **[load](https://stackoverflow.com/a/50302473/3785314)** the received JPG into it. Use `Mat.copyTo` to copy the new mat onto the old frame. You will need a `cv::Rect`, created from the received x,y,w,h, to specify which part of the image is copied (see the sender and receiver sketches after this comment thread). – Programmer Aug 14 '18 at 10:17
  • You can find an example of copying or updating the frame [here](http://answers.opencv.org/question/58414/how-to-replace-part-of-the-image-with-another-image-roi/?answer=58417#post-id-58417). Note that when sending the x,y,w,h with the JPG, it is easier to create a class to hold them and serialize that; on the other side, just de-serialize it. Don't use JSON, just binary serialization. See [this](https://stackoverflow.com/a/44816991/3785314) post for more info on how to do this in C++. When you run into issues, make sure to debug the x,y,w,h and verify that you're receiving the correct values. – Programmer Aug 14 '18 at 10:37
  • Very helpful! I was able to extract the pixels and tested the output, though I think there is currently too much noise, so I am creating too many contours. Working on serializing it right now. Do you think each serialized object should be transferred one by one, or should I pack them up before sending? – Jie Wei Aug 14 '18 at 22:21
  • There seem to be a lot of variables that need to be tweaked... It is very slow to extract and update the parts that have changed. – Jie Wei Aug 14 '18 at 22:31
  • How slow? Use a timer to time the extraction and another timer to time the update, for a single frame only. Let me know the result. – Programmer Aug 14 '18 at 22:40
  • It takes about 3-4 milliseconds to do both the crop and the copyTo per frame, which doesn't seem like much even multiplied by 30 frames. But the resulting image looks like this: http://i1376.photobucket.com/albums/ah18/email2jie/FrameUpdate_zpsr69a2dmx.jpg The code for it is here: http://codepad.org/exVxvf3o. – Jie Wei Aug 14 '18 at 23:12
  • You're on the right track. Unfortunately, I don't have enough time to dig in and see why it's behaving like that. I suggest you put the code that extracts the image into one function and the code that re-assembles it back into a frame into another, then create a question with that code. Make sure to explain that you want to extract the changed pixels and use them to update another image. I think many people can help you, or even offer better ideas on how to accomplish this with OpenCV. – Programmer Aug 14 '18 at 23:33
  • By the way, 3-4 milliseconds is not bad. You can improve that later on; just focus on getting it to work for now. Another thing I suggest is to check what percentage of the pixels changed: if the percentage is above some threshold, send the complete frame instead (see the fallback check in the receiver sketch below). I think that should fix the problem in the image you uploaded. – Programmer Aug 14 '18 at 23:35
  • I have tried that already: I had a check on whether contours.size() exceeded a certain size, but removed it to test the speed of the whole thing. It helped but didn't fix the issue. I feel like the threshold isn't finding all of the spots that actually changed. – Jie Wei Aug 14 '18 at 23:39
  • *"I feel like the threshold isn't giving all the correct changed spots."* This is what I suspect too.Another possible issue is that the pixel are not being positioned where they are supposed to. Although it look like what you said. I do think you need to change the way you're detecting the image pixels. That's mostly the issue. Have you tried saving each individual pixel detected and comparing them? – Programmer Aug 14 '18 at 23:45
  • I suggest you take a camera, take a picture of a table, put two items on the table, then take another picture. Make sure to use a camera stand so both pictures are taken from the same position. Use these two pictures to test your code; this will help you troubleshoot it and show which part is failing. You can use `imshow` to check whether your pixel-change detection is working properly. Add more items to the table until you run into issues. This is better than testing with the live camera. – Programmer Aug 14 '18 at 23:48
  • Maybe I should just take two static images to test with instead of using live video. – Jie Wei Aug 14 '18 at 23:53
  • That's what I meant. – Programmer Aug 14 '18 at 23:56
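
To make the approach from these comments concrete, here is a minimal sketch of the sender side: detect the regions that changed between two consecutive frames and JPEG-encode each region together with its x,y,w,h. The threshold value, the dilation step, and all names are illustrative assumptions, not code from the thread:

```cpp
// Sketch: find changed regions between two same-sized BGR frames and
// JPEG-encode each region along with its bounding rectangle.
#include <opencv2/opencv.hpp>
#include <vector>

struct Patch {
    int x, y, w, h;                    // where the region belongs in the frame
    std::vector<unsigned char> jpg;    // JPEG-encoded region pixels
};

std::vector<Patch> ExtractChangedPatches(const cv::Mat& prev, const cv::Mat& curr)
{
    cv::Mat diff, gray, mask;
    cv::absdiff(prev, curr, diff);                          // per-pixel difference
    cv::cvtColor(diff, gray, cv::COLOR_BGR2GRAY);
    cv::threshold(gray, mask, 25, 255, cv::THRESH_BINARY);  // suppress sensor noise
    // Dilate so nearby changes merge into fewer, larger contours.
    cv::dilate(mask, mask, cv::Mat(), cv::Point(-1, -1), 2);

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    std::vector<Patch> patches;
    for (const auto& c : contours) {
        cv::Rect r = cv::boundingRect(c);
        Patch p{ r.x, r.y, r.width, r.height, {} };
        cv::imencode(".jpg", curr(r), p.jpg);               // encode only the ROI
        patches.push_back(std::move(p));
    }
    return patches;
}
```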
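
And a matching sketch of the receiver-side patching Programmer describes, plus the full-frame fallback he suggests; the 40% cutoff and names are illustrative assumptions:

```cpp
// Sketch: paste a received JPEG patch into the previous frame at x,y,w,h,
// and a sender-side check for falling back to a complete frame.
#include <opencv2/opencv.hpp>
#include <vector>

void ApplyPatch(cv::Mat& frame, int x, int y, int w, int h,
                const std::vector<unsigned char>& jpg)
{
    cv::Mat region = cv::imdecode(jpg, cv::IMREAD_COLOR);   // decode the ROI JPEG
    region.copyTo(frame(cv::Rect(x, y, w, h)));             // paste it in place
}

// If too large a fraction of the pixels changed, sending the whole frame
// is cheaper and avoids artifacts; mask is the thresholded diff from above.
bool ShouldSendFullFrame(const cv::Mat& mask)
{
    double changed = cv::countNonZero(mask);
    return changed / mask.total() > 0.4;                    // illustrative cutoff
}
```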

1 Answer


I found two links that may be very useful to anyone working on something similar: https://www.cs.utexas.edu/~teammco/misc/udp_video/ and https://github.com/chenxiaoqino/udp-image-streaming/. A rough sketch of the idea behind them follows.
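
The core idea in the linked project, roughly sketched below: JPEG-encode each frame, split the buffer into UDP-sized chunks, and send the chunk count first so the receiver knows how many packets to reassemble. The packet size, framing, and commented-out socket calls are simplified illustrations, not the repo's exact code:

```cpp
// Sketch: split one JPEG-encoded frame into fixed-size UDP packets.
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

const int PACK_SIZE = 4096;   // keep each datagram comfortably small

void SendFrame(const cv::Mat& frame /*, UDPSocket& sock, addr, port */)
{
    std::vector<unsigned char> jpg;
    std::vector<int> params = { cv::IMWRITE_JPEG_QUALITY, 80 };
    cv::imencode(".jpg", frame, jpg, params);     // compress before sending

    int totalPacks = (static_cast<int>(jpg.size()) + PACK_SIZE - 1) / PACK_SIZE;
    // sock.sendTo(&totalPacks, sizeof(int), addr, port);   // header packet
    for (int i = 0; i < totalPacks; i++) {
        int offset = i * PACK_SIZE;
        int len = std::min(PACK_SIZE, static_cast<int>(jpg.size()) - offset);
        // sock.sendTo(&jpg[offset], len, addr, port);      // payload packet
        (void)offset; (void)len;                  // placeholders until wired up
    }
}
```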

Jie Wei
  • For anyone wanting to use this on Windows, you will have to link Ws2_32.lib and include winsock2.h in PracticalSocket.cpp instead (see the sketch below). – Jie Wei Aug 10 '18 at 17:53
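
As a sketch, the Windows-side change amounts to something like this (the library can also be linked through project settings instead of a pragma):

```cpp
// Windows replacement for the POSIX socket headers: link Ws2_32.lib and
// initialize Winsock once before any socket call.
#include <winsock2.h>
#pragma comment(lib, "Ws2_32.lib")

void InitSockets()
{
    WSADATA wsaData;
    WSAStartup(MAKEWORD(2, 2), &wsaData);   // required on Windows
}
```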