I want to get images of Discogs releases. Can I do it without the Discogs API? Their database dumps don't include links to the images.
2 Answers
To do this without the API, you would have to load a web page and extract the image from the HTML source code. You can find the relevant page by loading `https://www.discogs.com/release/xxxx`, where `xxxx` is the release number. Since HTML is just text, you can then extract the JPEG URL.
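As a rough illustration of the "load a web page" step, here is a minimal Java sketch using the standard `java.net.http` client (Java 11 or later). The class name `ReleasePageFetcher`, the method names, and the User-Agent value are made up for this example. It does not check whether Discogs will actually serve the page to a plain HTTP client.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of any Discogs library.
class ReleasePageFetcher {

    // URL pattern taken from the answer: https://www.discogs.com/release/xxxx
    static final String RELEASE_URL_PREFIX = "https://www.discogs.com/release/";

    /** Builds the release page URL for a numeric release ID. */
    static String releaseUrl(long releaseId) {
        return RELEASE_URL_PREFIX + releaseId;
    }

    /** Downloads the release page and returns its HTML source as one string. */
    static String fetchHtml(long releaseId) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(releaseUrl(releaseId)))
                // Assumption: a placeholder User-Agent; adjust to identify your own tool.
                .header("User-Agent", "example-image-fetcher/0.1")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

Calling `ReleasePageFetcher.fetchHtml(8140515L)` would then return the page source as a string, ready for the string searching described next.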
I don't know what your programming language is, but I'm sure it can handle string functions like `indexOf` and `substring`. You could extract the content of the HTML's `og:image` meta tag for the picture.
So taking an example: https://www.discogs.com/release/8140515
- Find the position with `.indexOf("og:image\" content=\"")` and save it as `startPos` in some integer.
- That search string is 19 chars long, so next do an `.indexOf(".jpg", startPos + 19)` and save the result as `endPos`.

This gets the first occurrence of `.jpg` at or after index `startPos + 19`. Now extract a substring from the HTML text, adding 4 to `endPos` so that the `.jpg` itself is included:
img_URL = myHtmlStr.substring(startPos + 19, endPos + 4);
You should end up with a string like the one below (the extracted URL):
https://img.discogs.com/_zHBK73yJ5oON197YTDXM7JoBjA=/fit-in/600x600/filters:strip_icc():format(jpeg):mode_rgb():quality(90)/discogs-images/R-8140515-1460073064-5890.jpeg.jpg

The process can be shortened to finding the `startPos` index of `https://img.`, then finding the first occurrence of `.jpg` when searching from that `startPos` index onward, and extracting the text within that range. This works because the only place `https://img.` appears in the HTML source is the image URL.
Compare the page at https://www.discogs.com/release/8140515 with the image at the extracted URL.
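The string-search steps above can be sketched in Java roughly as follows. The class and method names are invented for illustration, and the sketch assumes the page source still contains the tag in exactly the form `og:image" content="..."` with a URL ending in `.jpg`, as it did when this answer was written.

```java
// Hypothetical helper illustrating the indexOf/substring approach.
class OgImageExtractor {

    // Text that directly precedes the image URL in the page source (19 characters).
    private static final String MARKER = "og:image\" content=\"";
    private static final String EXTENSION = ".jpg";

    /** Returns the URL that follows the og:image marker, or null if it is not found. */
    static String extractImageUrl(String html) {
        int startPos = html.indexOf(MARKER);
        if (startPos < 0) {
            return null; // no og:image tag in the expected form
        }
        int urlStart = startPos + MARKER.length(); // same as startPos + 19
        int endPos = html.indexOf(EXTENSION, urlStart);
        if (endPos < 0) {
            return null; // no ".jpg" after the marker
        }
        // endPos points at the '.' of ".jpg", so add its length to keep the extension.
        return html.substring(urlStart, endPos + EXTENSION.length());
    }

    /** Shortened variant: search from "https://img." to the next ".jpg". */
    static String extractFirstImgUrl(String html) {
        int startPos = html.indexOf("https://img.");
        if (startPos < 0) {
            return null;
        }
        int endPos = html.indexOf(EXTENSION, startPos);
        if (endPos < 0) {
            return null;
        }
        return html.substring(startPos, endPos + EXTENSION.length());
    }
}
```

Both methods return `null` rather than throwing when the expected text is missing, which makes it easier to skip releases that have no image.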

- **note:** You might have to fine-tune those index Pos numbers, e.g. you might change from **+19** to **+21** in order to cut off the quotation marks etc. (**if needed** by your coding tool). You'll figure it out when testing... – VC.One Feb 20 '16 at 04:21
- Trying to fetch images of many releases, won't Discogs block automatic access? – Collector Feb 20 '16 at 10:25
- @Collector, I don't think so (unless you can show otherwise). Access was not blocked for any of my testing AS3 code or PHP code. Each loaded 5 images just to check paths are parsed correctly. – VC.One Feb 21 '16 at 16:25
- Okay. The question was to get images without the API. I believe I showed a good / correct answer for that. As for 5000 pics, that's a new detail. I'm not a server expert. I can only suggest you pace it out to fly under the radar, because I can imagine 5000 requests from the same IP address **at once** will look suspicious and be IP blocked. An "all day, everyday" site user could access 5000 images spread over a week and won't be blocked, so y'know... pace it out. – VC.One Feb 23 '16 at 00:14
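One way to "pace it out" is to sleep between requests. The Java sketch below is only an illustration: `PacedFetcher`, `fetchAll`, and the `fetchOne` callback are hypothetical names. The delay is a parameter because no safe request rate for Discogs is known here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

// Hypothetical helper: runs one fetch per release ID with a pause in between.
class PacedFetcher {

    /**
     * Calls fetchOne for each release ID in order, sleeping delayMillis
     * between consecutive calls. Stops early if the thread is interrupted.
     */
    static List<String> fetchAll(long[] releaseIds, long delayMillis,
                                 LongFunction<String> fetchOne) {
        List<String> results = new ArrayList<>();
        for (int i = 0; i < releaseIds.length; i++) {
            if (i > 0) {
                try {
                    Thread.sleep(delayMillis); // pause between requests, not before the first
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt(); // restore the interrupt flag
                    break; // return what has been collected so far
                }
            }
            results.add(fetchOne.apply(releaseIds[i]));
        }
        return results;
    }
}
```

The `fetchOne` callback would wrap whatever code loads one release page and extracts its image URL. A delay of several seconds or more is a guess at a polite rate, not a documented limit.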
This is how to do it with Java and the Jsoup library:
- get the HTML page of the release
- parse the HTML and find `<meta property="og:image" content=".." />` to get its `content` value
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class DiscogRelease {

    private final String url;

    public DiscogRelease(String url) {
        this.url = url;
    }

    // Returns the og:image URL of the release page, or null if it cannot be found.
    public String getImageUrl() {
        try {
            // Download and parse the release page.
            Document doc = Jsoup.connect(this.url).get();
            // The image URL is stored in <meta property="og:image" content="...">.
            Elements metas = doc.head().select("meta[property=\"og:image\"]");
            if (!metas.isEmpty()) {
                Element element = metas.get(0);
                return element.attr("content");
            }
        } catch (IOException ex) {
            Logger.getLogger(DiscogRelease.class.getName()).log(Level.SEVERE, null, ex);
        }
        return null;
    }
}
