I find it hard to believe that this course taught you no computer vision at all and then handed you a full-blown computer vision assignment. Regardless, since you are only looking for direction, here are my recommendations:
For starters, your video feed has random dots which act as noise. Read up on morphological operations (opening in particular) to get rid of them first. Why? A cleaner feed gives more accurate detection in every later step.
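To make the idea concrete, here is a minimal sketch of a morphological opening (erosion followed by dilation) on a tiny binary image, written in plain Python so you can see what it does. The function names, the fixed 3x3 neighbourhood and the toy image are my own assumptions for illustration. In practice you would use a library routine such as OpenCV's `cv2.morphologyEx` with `cv2.MORPH_OPEN` on your thresholded frame.

```python
def erode(img):
    """3x3 binary erosion: a pixel survives only if its whole 3x3
    neighbourhood is set. Pixels outside the image count as background."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            ))
    return out


def dilate(img):
    """3x3 binary dilation: a pixel is set if any pixel in its 3x3
    neighbourhood is set."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            ))
    return out


def opening(img):
    """Opening = erosion then dilation. Specks smaller than the 3x3
    neighbourhood vanish; larger blobs are restored to roughly their size."""
    return dilate(erode(img))


# Toy frame: a 3x3 blob (the "real" object) plus one isolated noise dot.
noisy = [
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = opening(noisy)
```

One caveat: opening also erases anything thinner than the neighbourhood, so if your digit strokes are only a pixel or two wide, pick the kernel size carefully or the digits go with the noise.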
You are right that the Hough line transform can be used for detection. But the next stage is differentiating between the green line and the blue one. This blog is a good starter on how to go about doing it.
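One common way to tell the two lines apart is to look at hue rather than raw RGB. The sketch below is only an assumption about how you might do it, not what the linked blog does: it averages a few pixels sampled along a detected line and labels them by hue. The function name and every threshold are hypothetical and will need tuning on your footage.

```python
import colorsys


def classify_line_colour(rgb_pixels):
    """Label pixels sampled along a line as 'green', 'blue' or 'unknown'.

    rgb_pixels: list of (r, g, b) tuples with 0-255 channels.
    All thresholds below are illustrative guesses, not calibrated values.
    """
    if not rgb_pixels:
        return "unknown"
    n = len(rgb_pixels)
    # Average each channel and scale to the 0-1 range colorsys expects.
    r = sum(p[0] for p in rgb_pixels) / n / 255.0
    g = sum(p[1] for p in rgb_pixels) / n / 255.0
    b = sum(p[2] for p in rgb_pixels) / n / 255.0
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue_deg = h * 360.0
    # Washed-out or very dark pixels have no reliable hue.
    if s < 0.3 or v < 0.2:
        return "unknown"
    if 80 <= hue_deg < 170:
        return "green"
    if 190 <= hue_deg < 270:
        return "blue"
    return "unknown"
```

If you do this with OpenCV instead, keep in mind that frames arrive in BGR order and that hue for 8-bit images runs from 0 to 179, so the ranges above would have to be halved.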
At this point, we have a clean feed with both lines detected and told apart. The next task is character recognition, for which this answer post has a couple of recommendations you can explore. You can peek into this and this as well. The second post uses scikit and the standard MNIST dataset. I would recommend the second one because the digits in your video feed look like they come from MNIST.
With the digits detected, you need to find the intersection between the digit contour and a line segment. Consider implementing this suggestion.
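As a rough illustration of the intersection test, here is a sketch that checks a line segment against the axis-aligned bounding box of a digit instead of the exact contour. That simplification is my own assumption, as are the function name and the `(x_min, y_min, x_max, y_max)` box layout; the linked suggestion may do it differently. It uses Liang-Barsky style clipping.

```python
def segment_hits_box(p0, p1, box):
    """Return True if the segment p0-p1 touches an axis-aligned box.

    p0, p1: (x, y) end points of the line segment.
    box:    (x_min, y_min, x_max, y_max) of the digit's bounding box.
    """
    x0, y0 = p0
    x1, y1 = p1
    x_min, y_min, x_max, y_max = box
    dx, dy = x1 - x0, y1 - y0
    t_enter, t_exit = 0.0, 1.0  # parameter range of the segment still inside
    # One (p, q) pair per box edge: left, right, top, bottom.
    for p, q in ((-dx, x0 - x_min), (dx, x_max - x0),
                 (-dy, y0 - y_min), (dy, y_max - y0)):
        if p == 0:
            if q < 0:
                return False  # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:
            t_enter = max(t_enter, t)  # segment enters through this edge
        else:
            t_exit = min(t_exit, t)    # segment leaves through this edge
        if t_enter > t_exit:
            return False
    return True
```

If you are on OpenCV, `cv2.boundingRect(contour)` gives you `(x, y, w, h)`, so the box would be `(x, y, x + w, y + h)`. A bounding box is looser than the contour itself, so treat a hit as "the digit is at the line" only if that tolerance is acceptable for your assignment.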
Two Cents:
You seriously do not need to train a neural network for this. Why call the fireman to put a match out?
After digit detection, you might want to consider tracking the digits. Tracking is generally cheaper than detection. Ideally, you run detection once at initialization and then switch to tracking. After that you keep tracking and only re-run detection every 10-20 frames or so (obviously depending on the application).
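The schedule itself is simple enough to sketch. Everything here is hypothetical scaffolding: `detect` and `track` stand in for whatever detector and tracker you end up using, and `redetect_every=15` is just a value picked from the 10-20 range above.

```python
def process_frames(frames, detect, track, redetect_every=15):
    """Run the expensive detector rarely and the cheap tracker otherwise.

    detect(frame)        -> state, or None if nothing was found
    track(frame, state)  -> updated state, or None if the target was lost
    Returns a list of (mode, state) pairs, one per frame.
    """
    state = None
    results = []
    for i, frame in enumerate(frames):
        # Detect on the first frame, on every Nth frame, and whenever
        # the previous step left us without a valid state.
        if state is None or i % redetect_every == 0:
            state = detect(frame)
            results.append(("detect", state))
        else:
            state = track(frame, state)
            results.append(("track", state))
    return results
```

Falling back to detection whenever the tracker returns nothing keeps a lost digit from staying lost until the next scheduled re-detection.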
If you TRULY haven't been taught ANY computer vision, BUCKLE UP for this.
Cheers :)