
I want to build an app that will recognize which emojis have been used on a wallpaper.

So, for instance, the app will receive as input:

[image: example wallpaper containing several emojis]

And as output it should return an array with the names of the recognized emojis:

[
  "Smiling Face with Sunglasses", 
  "Grinning Face with Smiling Eyes", 
  "Kissing Face with Closed Eyes"
]

Of course, the names of these emojis will come from the file names of the training images. For example, this file:

[image: the Grinning Face with Smiling Eyes emoji]

will be named Grinning_Face_with_Smiling_Eyes.jpg.

I would like to use AWS Rekognition Custom Labels or Google AutoML Vision, but they require a minimum of 10 training images per emoji. As you know, I can only provide one image of each emoji, because there simply are no more; they are 2D ;)

Now my question is: what should I do? How can I get around these requirements? Which service should I choose?

PS. In the real business case, instead of emojis there are book covers that the AI has to recognize, and there too only a single 2D photo exists per book cover.

  • I hear you say that there is only one image for each emoji, but maybe not. What about the same image rotated or offset from center? For example, if you rotate the image by 30°, you would have 12 images for each emoji, thereby satisfying the apparent requirement of 10 images for each recognizable item (see the rotation sketch after the comments). – Kolban Dec 26 '21 at 06:03
  • I had been thinking that the AI automatically rotates each image, cuts it in half, adds a black-and-white filter, etc. – dosad Dec 26 '21 at 08:09
  • It likely does ... but I think you are trying to get around the statement here ... https://cloud.google.com/vision/automl/docs/prepare which says that you should have 10 images per label. But you don't; you are telling me that you have 1 image per label. So either try 1 image per label and see if it takes ... or ... make up another 9 similar images ... and one way would be to rotate or shift the original image. – Kolban Dec 26 '21 at 15:44
  • To add to Kolban's comment, you should stick with the [best practice guide](https://cloud.google.com/vision/automl/object-detection/docs/prepare#best_practice_guide) for images. I think you can reuse the images to be recognized to fill out your training set, as the guide recommends (e.g. put the emoji on different backgrounds, or crop an emoji by a small amount so it should still be recognized as such, just to meet the training requirement), and then start testing your model's recognition. Here is an official [sample list](https://cloud.google.com/vision/automl/docs/samples) for AutoML. – Betjens Dec 27 '21 at 11:05
  • @Betjens @Kolban, I have tried to train the model. I rotated each emoji 12 times and cropped each emoji in half, vertically and horizontally. The rotated emojis were used for training, the horizontally cropped ones for testing, and the vertically cropped ones for validation. After that, I get only `0.35` precision. What more can I do? – dosad Dec 27 '21 at 12:42
  • Have you performed the appropriate evaluation of your model? Here is a small [guide](https://cloud.google.com/vision/automl/docs/evaluate). If precision is too low, you should add more data, running small samples; when results start to distort, adjust the data you are providing to fit your model (things like resolution also count). – Betjens Dec 27 '21 at 14:24
  • Here are some recommendations to improve the accuracy of your images: 1. Stick to the [recommended values](https://cloud.google.com/vision/docs/supported-files#image_sizing) for formats and sizing. 2. Use a perspective transform with OpenCV. 3. Binarize the image and use black-and-white images (not color RGB images). 4. Increase the contrast and sharpness of the image. 5. Your image resolution should be at least 300 DPI. (See the preprocessing sketch after the comments.) – Betjens Dec 28 '21 at 14:46
  • Also, you have to create the dataset from your images and label it properly. You can follow this [guide](https://cloud.google.com/vision/automl/docs/create-datasets). – Betjens Dec 28 '21 at 15:48
  • @Betjens I have done the image augmentation, and there is progress, up to 0.8 precision, but it only recognizes wallpapers with one emoji. If we paste two or more emojis into one wallpaper, it cannot recognize them. I have heard that I should train the AI on prepared wallpapers, but there are over 1,200 emojis, so preparing them would be very hard. What more can I do? – dosad Dec 30 '21 at 13:33
  • Well, that's the hard part of feature engineering: each emoji and scenario should be considered. You should start with only one or two emojis, using composite pictures to feed your data. The effort is similar to recognizing a face in an image, where a partially visible emoji should still indicate its presence (see the wallpaper-compositing sketch after the comments). I recommend reading this [article](https://towardsdatascience.com/fast-feature-engineering-in-python-image-data-5d3a8a7bf616). – Betjens Jan 03 '22 at 18:43
  • @WytrzymałyWiktor nope :( – dosad Jan 12 '22 at 14:11
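
A minimal sketch of the rotation augmentation Kolban suggests, assuming Pillow is installed and the originals sit in an `emojis/` folder with one JPG per label (both names are illustrative):

```python
# Generate 12 rotated copies of each emoji image so every label
# meets AutoML Vision's 10-images-per-label minimum.
from pathlib import Path
from PIL import Image

SRC = Path("emojis")            # e.g. Grinning_Face_with_Smiling_Eyes.jpg
DST = Path("emojis_augmented")
DST.mkdir(exist_ok=True)

for path in SRC.glob("*.jpg"):
    img = Image.open(path).convert("RGB")
    for angle in range(0, 360, 30):   # 12 rotations, 30° apart
        rotated = img.rotate(angle, expand=True, fillcolor="white")
        rotated.save(DST / f"{path.stem}_rot{angle:03d}.jpg")
```

Shifted or cropped variants, as discussed in the comments, can be produced the same way.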
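A minimal sketch of the preprocessing Betjens lists (sizing, contrast, sharpness, binarization), using OpenCV; the file name and the 1024-pixel bound are illustrative assumptions:

```python
import cv2
import numpy as np

img = cv2.imread("wallpaper.jpg")   # illustrative file name

# 1. Keep the longest side within a sane bound (the exact limits are
#    in the supported-files doc linked above; 1024 is an assumption).
h, w = img.shape[:2]
if max(h, w) > 1024:
    scale = 1024 / max(h, w)
    img = cv2.resize(img, (int(w * scale), int(h * scale)))

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# 2. Boost local contrast (CLAHE), then sharpen with an unsharp-mask kernel.
gray = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(gray)
kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
gray = cv2.filter2D(gray, -1, kernel)

# 3. Binarize with Otsu's threshold to get a black-and-white image.
_, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

cv2.imwrite("wallpaper_preprocessed.jpg", bw)
```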
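For the multi-emoji problem, a minimal sketch of compositing synthetic wallpapers with bounding-box annotations, so an object-detection model can be trained without hand-preparing 1,200 wallpapers. The paths, canvas size, and CSV column layout are assumptions; check the create-datasets guide linked above for the exact format AutoML Vision expects:

```python
import csv
import random
from pathlib import Path
from PIL import Image

EMOJIS = list(Path("emojis").glob("*.jpg"))   # needs at least 3 files
CANVAS = (1920, 1080)
SIZE = 200                                    # pasted emoji edge length

with open("annotations.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for i in range(100):                      # 100 synthetic wallpapers
        canvas = Image.new("RGB", CANVAS, "white")
        name = f"wallpaper_{i:04d}.jpg"
        for path in random.sample(EMOJIS, k=3):   # 3 emojis per wallpaper
            emoji = Image.open(path).convert("RGB").resize((SIZE, SIZE))
            x = random.randint(0, CANVAS[0] - SIZE)
            y = random.randint(0, CANVAS[1] - SIZE)
            canvas.paste(emoji, (x, y))
            # Normalized box, assumed abbreviated x_min,y_min,,,x_max,y_max,, layout
            writer.writerow(["TRAIN", name, path.stem,
                             x / CANVAS[0], y / CANVAS[1], "", "",
                             (x + SIZE) / CANVAS[0], (y + SIZE) / CANVAS[1], "", ""])
        canvas.save(name)
```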

0 Answers