I want to get images of Discogs releases. Can I do it without the Discogs API? Their database dumps don't include links to the images.

user5869792

2 Answers


To do this without the API, you have to load the release's web page and extract the image URL from its HTML source. The relevant page is https://www.discogs.com/release/xxxx, where xxxx is the release number. Since HTML is just text, you can search it for the JPEG URL.

I don't know which programming language you use, but I'm sure it has string functions such as indexOf and substring. You can get the picture URL from the content of the page's og:image meta tag.
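As a rough illustration of the indexOf/substring idea, here is a minimal Java sketch. It works on HTML text that has already been downloaded and assumes the tag is written exactly as `<meta property="og:image" content="..." />`, with the attributes in that order. The class name `OgImageExtractor`, the method `extractOgImage`, and the example.com sample URL are made up for this example; if the real page formats the tag differently, the marker string needs adjusting.

```java
final class OgImageExtractor {

    // Text that directly precedes the image URL in the page source.
    // Assumption: the attributes appear in exactly this order and spacing.
    private static final String MARKER = "property=\"og:image\" content=\"";

    /** Returns the og:image URL found in the HTML text, or null if it is absent. */
    static String extractOgImage(String html) {
        int markerPos = html.indexOf(MARKER);
        if (markerPos < 0) {
            return null; // no og:image tag in the expected form
        }
        int start = markerPos + MARKER.length(); // first character of the URL
        int end = html.indexOf('"', start);      // closing quote of the content attribute
        if (end < 0) {
            return null; // attribute value is not terminated
        }
        return html.substring(start, end);
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not real Discogs page source.
        String sample = "<head><meta property=\"og:image\" "
                + "content=\"https://example.com/cover.jpg\" /></head>";
        System.out.println(extractOgImage(sample)); // prints https://example.com/cover.jpg
    }
}
```

Computing the start offset from the marker's length, rather than hard-coding a number, avoids having to adjust the offset by hand when the marker text changes.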

Take this release as an example: https://www.discogs.com/release/8140515

Compare the page at https://www.discogs.com/release/8140515 with the extracted image URL below.

VC.One
  • **Note:** You might have to fine-tune those index position numbers. E.g. you might change from **+19** to **+21** in order to cut off the quotation marks etc. (**if needed** by your coding tool). You'll figure it out when testing... – VC.One Feb 20 '16 at 04:21
  • Trying to fetch images of many releases, won't Discogs block automatic access? – Collector Feb 20 '16 at 10:25
  • @Collector, I don't think so (unless you can show otherwise). Access was not blocked for any of my testing AS3 code or PHP code. Each loaded 5 images just to check paths are parsed correctly. – VC.One Feb 21 '16 at 16:25
  • Okay. The question was how to get images without the API, and I believe I showed a good, correct answer for that. As for 5000 pics, that's a new detail. I'm not a server expert. I can only suggest you pace it out to fly under the radar, because I can imagine 5000 requests from the same IP address **at once** will look suspicious and get the IP blocked. An "all day, every day" site user could access 5000 images spread over a week and won't be blocked, so y'know... pace it out. – VC.One Feb 23 '16 at 00:14
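The "pace it out" suggestion could be sketched in Java roughly as follows. This is only an illustration: the class `PacedFetcher`, the method `fetchAll`, and the 2-second delay are assumptions made for the example, and nothing in this thread says what request rate Discogs actually tolerates. The `fetch` function is a placeholder for whatever code downloads one page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class PacedFetcher {

    /**
     * Calls fetch once per URL, in order, sleeping delayMillis between
     * consecutive calls. If the thread is interrupted while waiting,
     * the results gathered so far are returned.
     */
    static List<String> fetchAll(List<String> urls,
                                 Function<String, String> fetch,
                                 long delayMillis) {
        List<String> results = new ArrayList<>();
        for (int i = 0; i < urls.size(); i++) {
            if (i > 0) {
                try {
                    Thread.sleep(delayMillis); // wait before every request except the first
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag set
                    break;
                }
            }
            results.add(fetch.apply(urls.get(i)));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> urls = List.of("https://www.discogs.com/release/8140515");
        // Stand-in fetcher for illustration; replace with real page-loading code.
        List<String> pages = fetchAll(urls, url -> "fetched " + url, 2000);
        pages.forEach(System.out::println);
    }
}
```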

Here is how to do it in Java with the Jsoup library:

  • fetch the HTML page of the release
  • parse the HTML and read the content value of <meta property="og:image" content=".." />
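
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class DiscogRelease {

    // URL of the release page, e.g. https://www.discogs.com/release/8140515
    private final String url;

    public DiscogRelease(String url) {
        this.url = url;
    }

    /** Returns the og:image URL of the release page, or null if it cannot be fetched or found. */
    public String getImageUrl() {
        try {
            // Download and parse the release page.
            Document doc = Jsoup.connect(this.url).get();
            // Select the <meta property="og:image" ...> tag in the document head.
            Elements metas = doc.head().select("meta[property=\"og:image\"]");
            if (!metas.isEmpty()) {
                Element element = metas.get(0);
                // The image URL is stored in the tag's content attribute.
                return element.attr("content");
            }
        } catch (IOException ex) {
            Logger.getLogger(DiscogRelease.class.getName()).log(Level.SEVERE, null, ex);
        }
        return null;
    }

}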
alexandre-rousseau