
I'd like to find an efficient way to extract some sort of color palette (a list, or some other structure) from a given page URL with Python. What I want is to collect all the background colors, the colors of the titles, and the colors of all the other elements.

I've already seen in [Build a color palette from image URL] that it is possible to extract a palette from an image, but what about a page?

Willy
  • It is not an easy task. Dynamic page content (and advertising) makes it all more difficult. You may want to check https://stackoverflow.com/questions/1587637/light-weight-renderer-html-with-css-in-python and then convert the page into an image. – Giacomo Catenazzi Apr 30 '19 at 14:26

2 Answers


I did it with Selenium, combined with the example you linked above. The example below shows how to get the ten most frequent colors from Google's search page.

Just take a screenshot of the web page with a Selenium-driven browser and then process the image:

#!/usr/bin/env python3
from selenium import webdriver
import numpy as np
from PIL import Image

def palette(img):
    """
    Return palette in descending order of frequency
    """
    arr = np.asarray(img)
    palette, index = np.unique(asvoid(arr).ravel(), return_inverse=True)
    palette = palette.view(arr.dtype).reshape(-1, arr.shape[-1])
    count = np.bincount(index)
    order = np.argsort(count)
    return palette[order[::-1]]

def asvoid(arr):
    """View the array as dtype np.void (bytes)
    This collapses ND-arrays to 1D-arrays, so you can perform 1D operations on them.
    http://stackoverflow.com/a/16216866/190597 (Jaime)
    http://stackoverflow.com/a/16840350/190597 (Jaime)
    Warning:
    >>> asvoid([-0.]) == asvoid([0.])
    array([False], dtype=bool)
    """
    arr = np.ascontiguousarray(arr)
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))


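def savePrint(imageFile):
    # Open the page in Firefox, save a screenshot, and always close the browser
    driver = webdriver.Firefox()
    try:
        driver.get("https://google.com.br")
        driver.get_screenshot_as_file(imageFile)
    finally:
        driver.quit()

imageFile = '/tmp/tmp.png'
savePrint(imageFile)
img = Image.open(imageFile).convert('RGB')
# Ten most frequent RGB colors in the screenshot
print(palette(img)[:10])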
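
Counting exact pixel values tends to split one visual color into many near-identical shades (anti-aliasing, gradients), so the top ten can end up dominated by variants of the same color. Below is a minimal sketch of one way to soften that, assuming it is acceptable to snap each channel to a coarse grid before counting. The helper `top_colors` and its `step` parameter are hypothetical names introduced here, not part of the code above.

```python
from collections import Counter

def top_colors(pixels, n=10, step=32):
    """Return the n most common colors after snapping each channel
    down to a multiple of `step` (hypothetical helper, for illustration)."""
    counts = Counter(
        tuple((channel // step) * step for channel in pixel)
        for pixel in pixels
    )
    # List of ((r, g, b), pixel_count) pairs, most frequent first
    return counts.most_common(n)
```

With the screenshot loaded as `img`, the pixels could be passed in as `map(tuple, np.asarray(img).reshape(-1, 3).tolist())`. A smaller `step` keeps more distinct shades; a larger one merges more of them.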
Yuri Santos
  • I was searching for something that can report both the palette and the names of the elements (for example, "all h2 are yellow", "div id_div has a blue background"...), but this is a good starting point! – Willy Apr 30 '19 at 08:18
  • It's possible to iterate over all elements and repeat the process described above, but I don't know if it's the best solution, since it would take too many screenshots and would be a performance killer. – Yuri Santos May 01 '19 at 03:13

I've tried the following, which worked for me. The idea is to access the page source with Selenium, find every string that starts with '<' (the opening of a tag), and store the tag names in a list after stripping the leading '<'. Then I iterate over that list and, for every element with that tag, call value_of_css_property for background-color, border-color, color, and background-image. I know this is not perfect, but it does what I was looking for. Don't forget to remove duplicates from the tag list (since this method collects the CSS color properties of every element of each tag). Example:

import re
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "someurl"
options = webdriver.ChromeOptions()
options.headless = False
driver = webdriver.Chrome(options=options)
driver.get(url)

# Collect every tag name in the page source; digits are allowed after the
# first letter so that h1, h2, ... are not truncated to "h"
html_source = driver.page_source
txt = re.findall(r'<[a-zA-Z][a-zA-Z0-9-]*', html_source)
list_tags = [x.replace('<', '') for x in txt]
list_tags = list(dict.fromkeys(list_tags))  # drop duplicates, keep order
final_list = []

for i in list_tags:
    # All elements on the page with this tag name
    tag = driver.find_elements(By.TAG_NAME, i)
    tag_back_col = []
    tag_col = []
    tag_img = []
    tag_border = []
    for j in tag:
        # Computed CSS values for each element
        tag_back_col.append(j.value_of_css_property('background-color'))
        tag_col.append(j.value_of_css_property('color'))
        tag_border.append(j.value_of_css_property('border-color'))
        tag_img.append(j.value_of_css_property('background-image'))
    final_list.append((i, tag_back_col, tag_col, tag_border, tag_img))
driver.quit()

The final list is a list of tuples, each holding the tag name and the lists of background colors, text colors, border colors, and background images for every occurrence of that tag in the page.
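
To turn that list into an actual palette, the color strings still have to be parsed and counted. Here is a minimal sketch, assuming the browser reports colors as comma-separated `rgb(...)` / `rgba(...)` strings (other notations, such as `rgb(0 0 0 / 50%)`, are not handled). The names `css_colors_to_hex` and `build_palette` are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Matches "rgb(r, g, b)" and "rgba(r, g, b, a)"; the alpha group is optional
RGB_RE = re.compile(r'rgba?\(\s*(\d+),\s*(\d+),\s*(\d+)(?:,\s*([\d.]+))?\s*\)')

def css_colors_to_hex(value):
    """Return a hex code for every non-transparent color found in a CSS value."""
    found = []
    for r, g, b, a in RGB_RE.findall(value):
        if a != '' and float(a) == 0:
            continue  # skip fully transparent colors such as rgba(0, 0, 0, 0)
        found.append('#{:02x}{:02x}{:02x}'.format(int(r), int(g), int(b)))
    return found

def build_palette(final_list):
    """Count how often each color occurs across backgrounds, text and borders."""
    counts = Counter()
    for _tag, back_cols, cols, borders, _imgs in final_list:
        for value in back_cols + cols + borders:
            counts.update(css_colors_to_hex(value))
    # List of (hex_color, occurrences) pairs, most frequent first
    return counts.most_common()
```

Calling `build_palette(final_list)` gives the page-wide palette. Running `css_colors_to_hex` on the lists of a single tuple instead gives a per-tag summary along the lines of "all h2 are yellow".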

Willy