3

I am currently working on image sitemap generation in Sitecore, so I need all the images used on a particular URL of a website.

I need either the details of all the items where a media item is used, or a list of all the media items (images) used in an item (URL) in Sitecore.

I have tried getting the image field from an item, and that works fine, but what I need is all the images used in the item that are added through presentation details.

// "master" is the Sitecore master database
Item currentitem = master.GetItem("/sitecore/content/International/Cars/New models/All new XC90");

// Returns the media URL of the item's "Image" field, or an empty string if it is not set
public static string GetImageURL(Item currentItem)
{
    string imageURL = string.Empty;
    Sitecore.Data.Fields.ImageField imageField = currentItem.Fields["Image"];
    if (imageField != null && imageField.MediaItem != null)
    {
        Sitecore.Data.Items.MediaItem image = new Sitecore.Data.Items.MediaItem(imageField.MediaItem);
        imageURL = Sitecore.StringUtil.EnsurePrefix('/', Sitecore.Resources.Media.MediaManager.GetMediaUrl(image));
    }
    return imageURL;
}
bala3569

3 Answers

3

Since a page is made up of multiple components, you would need to iterate through those, retrieve all the datasource items and check their field values. Don't forget that images can also be placed in Rich Text fields.

To ensure that you capture all of these, you may be better off making a WebClient call back to the site, essentially scraping the rendered HTML, and then using HtmlAgilityPack/FizzlerEx/CsQuery to return all the images. You can then filter to only those from the media library or a particular location if required.

using HtmlAgilityPack;
using Fizzler.Systems.HtmlAgilityPack;

//get the page
HtmlWeb web = new HtmlWeb();
HtmlDocument document = web.Load("http://example.com/requested-page");
HtmlNode page = document.DocumentNode;

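//loop through all images on the page
foreach(HtmlNode item in page.QuerySelectorAll("img"))
{
    // GetAttributeValue avoids a NullReferenceException when an <img> has no src attribute
    var src = item.GetAttributeValue("src", string.Empty);
    // do some stuff
}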

If you only want to get images referenced from the Media Library then you can restrict the query:

foreach(HtmlNode item in page.QuerySelectorAll("img[src^='/-/media/']"))
{
    //do stuff
    ...
}
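
The media link prefix is configurable in Sitecore, and image URLs may also be rendered with a full hostname, so a single `src^=` selector can miss images. As a rough sketch (not part of the original answer), the filtering could be done in code instead. The class name `MediaUrlFilter` and the two prefixes listed are assumptions for illustration; adjust them to match your configuration.

```csharp
using System;
using System.Linq;

public static class MediaUrlFilter
{
    // Assumed prefixes; adjust to the media link prefix your site actually uses.
    private static readonly string[] MediaPrefixes = { "/-/media/", "/~/media/" };

    // Returns true when src points into the media library, whether it is
    // relative ("~/media/..."), root-relative, or includes a full hostname.
    public static bool IsMediaUrl(string src)
    {
        if (string.IsNullOrWhiteSpace(src)) return false;

        // Reduce absolute http(s) URLs to their path so one prefix check covers all forms.
        Uri uri;
        string path = Uri.TryCreate(src, UriKind.Absolute, out uri)
                      && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps)
            ? uri.AbsolutePath
            : src;

        // Normalise "~/media/..." to "/~/media/...".
        if (!path.StartsWith("/", StringComparison.Ordinal)) path = "/" + path;

        return MediaPrefixes.Any(p => path.StartsWith(p, StringComparison.OrdinalIgnoreCase));
    }
}
```

With this in place you would select every `img`, read each `src`, and keep only those for which `MediaUrlFilter.IsMediaUrl(src)` returns true.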
jammykam
  • Thank you.. i will try this – bala3569 Jan 07 '16 at 09:35
  • @bala3569 Keep in mind that, depending on your setup, you can have a different media prefix (e.g. `~/media`), that using a web client will not let you grab personalized content easily, and that if any links include the full hostname you will need to update the filter. – Marek Musielak Jan 07 '16 at 11:52
  • @jammykam How to get video urls alone? – bala3569 Jan 07 '16 at 12:04
  • @bala3569 If the videos live in a specific area of the media library then you could specify that instead, e.g. `page.QuerySelectorAll("img[src^='/-/media/videos']")`. Or use an "ends with" selector: `img[src$='.mp4']` – jammykam Jan 07 '16 at 14:17
  • If there is a specific set of media, an alternative approach would be to [check all Items that refer to a particular media item](https://sdn.sitecore.net/Snippets/Item%20Handling/Checking%20References/get%20all%20Items%20that%20refer%20to%20a%20particular%20Item.aspx). – jammykam Jan 07 '16 at 14:22
1

As jammykam pointed out, a page may be composed of multiple components. However, making a live request for the HTML may not always be optimal.

An alternative solution could be using Sitecore ContentSearch. You can create a stored computed field that contains a list of all images on the page item. That would be much faster to extract during run time and you can spend some more CPU cycles to get an accurate list of images during index time.

The computed index field could be a list of GUIDs (media item IDs), the image URLs, or any custom format that suits your needs.

During index time, you can use the LinkDatabase to find referenced items and filter out the media items you need. Thereby, you'll get images referenced from any field, including embedded images in rich text fields.
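
A minimal sketch of such a computed field might look like the following. The class name `PageImagesComputedField` is hypothetical, and the sketch assumes the link database is up to date for the database being indexed. It covers only references from the item itself, not from its renderings, and it keeps every media item (add a MIME type check if you need images only). The field still has to be registered as a stored computed field in your index configuration.

```csharp
using System.Collections.Generic;
using System.Linq;
using Sitecore;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;

// Hypothetical computed field; the class name is illustrative only.
public class PageImagesComputedField : IComputedIndexField
{
    public string FieldName { get; set; }
    public string ReturnType { get; set; }

    public object ComputeFieldValue(IIndexable indexable)
    {
        var indexableItem = indexable as SitecoreIndexableItem;
        if (indexableItem == null || indexableItem.Item == null) return null;

        // Ask the link database for everything this item references
        // (from any field, including rich text) and keep the media items.
        List<string> mediaIds = Globals.LinkDatabase
            .GetReferences(indexableItem.Item)
            .Select(link => link.GetTargetItem())
            .Where(target => target != null && target.Paths.IsMediaItem)
            .Select(target => target.ID.ToString())
            .Distinct()
            .ToList();

        // Returning null leaves the field out of the index document.
        return mediaIds.Count > 0 ? mediaIds : null;
    }
}
```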

As mentioned previously, you can perform those operations for both the item itself and the referenced items used by the page layout. You can find those referenced items by traversing the renderings returned by item.Visualization.GetRenderings.
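
As a rough illustration of that traversal, the sketch below collects the page item plus the datasource item of each rendering, then asks the link database for the image media items they reference. `PageImageCollector` is a hypothetical name. The sketch assumes datasources are stored as item IDs or paths (query-based datasources are skipped) and that the link database is current.

```csharp
using System;
using System.Collections.Generic;
using Sitecore;
using Sitecore.Data.Items;
using Sitecore.Layouts;
using Sitecore.Links;
using Sitecore.Resources.Media;

// Hypothetical helper; not part of the Sitecore API.
public static class PageImageCollector
{
    public static List<string> GetImageUrls(Item page, DeviceItem device)
    {
        // Start with the page itself, then add each rendering's datasource item.
        var sources = new List<Item> { page };
        foreach (RenderingReference rendering in page.Visualization.GetRenderings(device, false))
        {
            string dataSource = rendering.Settings.DataSource;
            if (string.IsNullOrEmpty(dataSource)) continue;

            Item dataSourceItem = page.Database.GetItem(dataSource); // ID or path only
            if (dataSourceItem != null) sources.Add(dataSourceItem);
        }

        var seen = new HashSet<Guid>();
        var urls = new List<string>();
        foreach (Item source in sources)
        {
            foreach (ItemLink link in Globals.LinkDatabase.GetReferences(source))
            {
                Item target = link.GetTargetItem();
                if (target == null || !target.Paths.IsMediaItem) continue;
                if (!seen.Add(target.ID.Guid)) continue; // already collected

                // Keep images only; folders and other media have a different MIME type.
                var media = new MediaItem(target);
                if (media.MimeType != null && media.MimeType.StartsWith("image/", StringComparison.OrdinalIgnoreCase))
                {
                    urls.Add(StringUtil.EnsurePrefix('/', MediaManager.GetMediaUrl(media)));
                }
            }
        }
        return urls;
    }
}
```

Inside a request this could be called as `PageImageCollector.GetImageUrls(item, Sitecore.Context.Device)`; at index time you would need to resolve the device item yourself.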

mikaelnet
0

Traversing all pages in Sitecore is a pretty heavy task, and it also picks up unwanted images such as the logo and other header images. You should consider adding a 'Sitemap Images' Treelist field to the page templates to hold all the relevant images for each page.

Maksim Shamihulau