I have a problem and I am confused about how to attack it. This is the task:
We would like you to implement an “add to basket” button classifier on the dataset provided. Specifically, we want the classifier to identify the id attribute of the correct “add to basket” button.
I have been given a dataset of 1000 websites. Each website has two files: 1) an HTML file, the DOM representation of an e-commerce product page; 2) a JSON file with metadata for that HTML file. The ‘button_id’ key indicates, via the HTML tag attribute “id”, where the correct “add to basket” button is in the associated HTML file.
I don't think the HTML file with the product page provides any help in this task. It is a standard product page with a lot of other information that is not relevant, such as the contact details and address of the shop, the language, and so on.
The JSON file looks like this:
{
  "actions": ["click"],
  "augmented_tags": [],
  "button_ids": ["0±±ui-id-1883"],
  "context": null,
  "extended_url": "https://www.neimanmarcus.com/en-gb/p/loro-piana-andre-denim-sport-shirt-prod200810299?ecid=NMAF__ShopStyle++Collective&CS_003=5630585&utm_medium=affiliate&utm_source=NMAF__ShopStyle++Collective",
  "features": {},
  "html_filename": "ex8560step01.large.html",
  "placeholders": [null],
  "skip_augmentation": true,
  "step": 1
}
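To make the data layout concrete, here is a rough sketch of how I would read one metadata file and pull out the label. It uses an inline copy of a few keys from the example above instead of reading from disk. The helper name label_ids is my own, and I am only guessing that the part after "±±" in button_ids is the actual id attribute and that the leading "0" is some kind of prefix; I have not confirmed this.

```python
import json

# Inline copy of part of the metadata shown above. In practice this would be
# loaded with json.load() from the JSON file that accompanies each HTML file.
raw = '''{
  "actions": ["click"],
  "button_ids": ["0±±ui-id-1883"],
  "html_filename": "ex8560step01.large.html",
  "step": 1
}'''


def label_ids(meta, sep="±±"):
    """Return the id attribute(s) named by the 'button_ids' key.

    ASSUMPTION (unverified): everything up to and including the last '±±'
    is a prefix, and the rest is the HTML id attribute of the button.
    Entries without the separator are returned unchanged.
    """
    return [entry.split(sep)[-1] for entry in meta.get("button_ids", [])]


meta = json.loads(raw)
print(meta["html_filename"], label_ids(meta))
```

So each JSON file tells me which HTML file it belongs to (html_filename) and, if my guess about the separator is right, which id in that file is the target.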
I need to classify which one is a good-practice “add to basket” button and which ones are not.
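One way I could turn this into labelled examples, sketched below under my own assumptions, is to treat every element that has an id attribute as a candidate and mark it 1 if its id matches the label and 0 otherwise. The class CandidateCollector, the function labelled_candidates, and the toy page are all hypothetical and only for illustration; the real pages are much larger and messier, and I am not sure this framing is what the task intends.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not kept on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class CandidateCollector(HTMLParser):
    """Collect every element with a non-empty id, plus the text inside it."""

    def __init__(self):
        super().__init__()
        self.candidates = []  # one dict per element that has an id
        self._stack = []      # (tag, candidate or None) for open elements

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cand = None
        if attrs.get("id"):
            cand = {"id": attrs["id"], "tag": tag, "text": ""}
            self.candidates.append(cand)
        if tag not in VOID:
            self._stack.append((tag, cand))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating sloppy markup.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Text belongs to every candidate that is currently open.
        for _, cand in self._stack:
            if cand is not None:
                cand["text"] += data


def labelled_candidates(html, positive_ids):
    """Return (tag, id, text, label) tuples; label is 1 for the target id."""
    parser = CandidateCollector()
    parser.feed(html)
    parser.close()
    return [(c["tag"], c["id"], " ".join(c["text"].split()),
             int(c["id"] in positive_ids))
            for c in parser.candidates]


# Toy page, NOT taken from the dataset.
page = """
<div id="header"><a id="contact" href="/contact">Contact us</a></div>
<form id="buy"><input id="qty" type="number">
<button id="ui-id-1883"><span>Add to basket</span></button></form>
"""
rows = labelled_candidates(page, {"ui-id-1883"})
for row in rows:
    print(row)
```

Each page would then give one positive row and many negative rows, which could be fed to whatever classifier ends up being used.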
My idea was to have the algorithm open the web page, render it, and then compare it with best-practice pages that I can use as labels. It would be a CNN classifier, or a ResNet if I want to be more sophisticated. I don't think this is an NLP matter.
Am I wrong? Any ideas? If this is the right approach, how do I make the CNN focus on the position of the “add to basket” button?
I have been asked to do this with neural networks.
Thanks