
Sorry if this is a big question, but I'm just looking for someone to point me in the right direction to learn more, since I have no clue where to start. I have very basic knowledge of HTML and Java.

Someone in my family has to copy every product from a supplier into his own webshop. The problem is that he has to enter all the articles one by one, by hand. I'm looking for a way to replace that manual work with a program.

I already have the price calculation partly working; all I need now is the product info.

http://pastebin.com/WVCy55Dj

From line 1009 to around 1030: I need three separate strings from the three spans with the class "CatalogusListDetailTest".

From line 987 to around 1000: I need a way to get all of these images. They are on the website at www.flamingo.be/Images/Products/Large/"productID".jpg, where "productID" is our first string. Sometimes there is also an _A or _B variant, as you can see in this example, so I'm looking for a way to check whether those exist and get those images as well.

If I could get this far I'd be very thankful! I'll figure the rest out myself. Sorry for the long post; I wanted to give as much info as possible.

Boyen
  • What have you tried? Are you sure the catalog is not available in a more parseable format, like XML or JSON? – JB Nizet Mar 02 '13 at 12:26
  • You could use an [html parser](http://stackoverflow.com/questions/3152138/what-are-the-pros-and-cons-of-the-leading-java-html-parsers) and access the DOM elements by their class name, I think. – G. Bach Mar 02 '13 at 12:27
  • If you need JavaScript support (because the web site might use Ajax), have a look at HtmlUnit – MrSmith42 Mar 02 '13 at 12:31
  • @JBNizet http://pastebin.com/i2hvDLFU This is what I have after trying Jsoup. It works almost perfectly and is exactly what I need, except for one problem: if you look at the catalog web page, it also has a product code, but that is not in a span with a class. Any idea how I can get it? – Boyen Mar 02 '13 at 14:23

2 Answers


You can look at the HTML parser library Jsoup. Documentation: http://jsoup.org/cookbook/

EDIT: Code to get the product code:

    Elements classElements = document.getElementsByClass("CatalogusListDetailTextTitel");
    for (Element classElement : classElements) {
        // The code is plain text next to the "Productcode :" label, so read the parent's own text.
        if (classElement.text().contains("Productcode :")) {
            System.out.println(classElement.parent().ownText());
        }
    }

You may have to call this on a specific element instead of document to get consistent results. The code above prints every product code it finds.

vikasing
  • @Boyen if this answer helped, you can mark it as accepted by clicking the ✔ symbol :) – vikasing Mar 02 '13 at 15:00
  • Sorry if I'm asking for too much, but is there any way I can use Jsoup to automatically save images from a URL? The URL would be something like String url2 = "http://www.flamingo.be/Images/Products/Large/" + productcode + ".jpg". How can I get the jpg and save it to, for example, C:\Users\Boyen\Pictures\aaa? And yes, sorry, I'm new here; I've pressed the symbol now :) – Boyen Mar 02 '13 at 15:00
  • Not that I know of. Jsoup can help you get the image links by checking for img tags using Element.getElementsByTag("img"); after that you can download the images. Check out http://stackoverflow.com/questions/5882005/how-to-download-image-from-any-web-page-in-java – vikasing Mar 02 '13 at 15:04
  • @Boyen check my answer below for the download image – araknoid Mar 02 '13 at 15:12
  • @Vikasing I've been using your help a lot for many different things over the last few days and I really appreciate it! I have one more question, if I may: is it possible to get text from a div that is in another tab created by JavaScript? I'm sorry, but I'm hesitant to post the website's code because I'm not sure I'm allowed to, given the erotic content of the page. It contains, for example:
    INFORMATIE
    Is there any way I can get the text that comes when I click on it?
    – Boyen Mar 05 '13 at 22:11
  • To get a better answer you can ask a new question for this issue. AFAIK Jsoup may not be able to help with capturing the JavaScript event. – vikasing Mar 06 '13 at 13:55
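
For the _A and _B image variants mentioned in the question, one possible approach is to build the candidate URLs from the product code and ask the server which of them exist before downloading anything. The following is only a sketch using the standard library: the class name `ImageVariants`, the suffix list and the 5-second timeouts are assumptions for illustration, and it assumes the server answers HEAD requests with 200 for existing images, which has not been verified for this site.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

final class ImageVariants {
    // Base URL taken from the question.
    static final String BASE = "http://www.flamingo.be/Images/Products/Large/";
    // Assumption: only these suffixes are tried; extend the list if the site uses more.
    static final String[] SUFFIXES = {"", "_A", "_B", "_C"};

    /** Builds every URL that might exist for one product code (no network access). */
    static List<String> candidateUrls(String productCode) {
        List<String> urls = new ArrayList<>();
        for (String suffix : SUFFIXES) {
            urls.add(BASE + productCode + suffix + ".jpg");
        }
        return urls;
    }

    /** Returns true if the server answers a HEAD request with 200 OK. */
    static boolean exists(String imageUrl) {
        try {
            HttpURLConnection con = (HttpURLConnection) new URL(imageUrl).openConnection();
            con.setRequestMethod("HEAD");
            con.setConnectTimeout(5000);
            con.setReadTimeout(5000);
            try {
                return con.getResponseCode() == HttpURLConnection.HTTP_OK;
            } finally {
                con.disconnect();
            }
        } catch (IOException e) {
            // Treat network errors as "not available".
            return false;
        }
    }

    /** Filters the candidates down to the URLs the server reports as present. */
    static List<String> existingUrls(String productCode) {
        List<String> found = new ArrayList<>();
        for (String url : candidateUrls(productCode)) {
            if (exists(url)) {
                found.add(url);
            }
        }
        return found;
    }
}
```

Calling `ImageVariants.existingUrls(productcode)` would then give the list of URLs to pass to whatever download method is used. If the server rejects HEAD requests, trying a normal GET and checking the response code is the obvious fallback.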

You can use JTidy for what you need.

Code Example:

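    import java.io.BufferedInputStream;
    import java.io.IOException;
    import java.net.URL;
    import javax.xml.xpath.*;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;
    import org.w3c.tidy.Tidy;

    public void downloadSinglePage(String pageLink, String targetDir, String imageName) throws XPathExpressionException, IOException {
        URL url = new URL(pageLink);
        try (BufferedInputStream page = new BufferedInputStream(url.openStream())) {
            // Let JTidy clean up the HTML and build a DOM without printing warnings.
            Tidy tidy = new Tidy();
            tidy.setQuiet(true);
            tidy.setShowWarnings(false);
            Document response = tidy.parseDOM(page, null);

            // Select the image source attributes with XPath.
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.evaluate(IMAGE_PATTERN, response, XPathConstants.NODESET);
            if (nodes.getLength() == 0) {
                return; // no matching image on this page
            }

            // Only the first match is downloaded.
            String imageURL = nodes.item(0).getNodeValue();
            saveImageNIO(imageURL, targetDir, imageName);
        }
    }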

where

    private static final String IMAGE_PATTERN = "//a/img/@src";

but the pattern depends on how the image is nested in the page's HTML.

Method for saving Image using NIO:

    public void saveImageNIO(String imageURL, String targetDir, String imageName) throws IOException {
        URL url = new URL(imageURL);
        ReadableByteChannel rbc = Channels.newChannel(url.openStream());
        FileOutputStream fos = new FileOutputStream(targetDir + "/" + imageName + ".jpg");
        // Transfers at most 1 << 24 bytes (16 MB) from the URL into the file.
        fos.getChannel().transferFrom(rbc, 0, 1 << 24);
    }

araknoid
  • I used only your saveImageNIO method and it works great for what I need, thanks! One last problem: the saved image can't be used until I close my Java app, which I don't want to do. Is there any way to prevent this? – Boyen Mar 02 '13 at 15:46
  • @Boyen do you need the image for any particular operation? – araknoid Mar 02 '13 at 17:27
  • Yes, I want to use the image in a different web page that looks like this: http://pastebin.com/MPatCqk9 (CTRL+F "foto 2"). I want to put it into the browse function there. Any idea how? – Boyen Mar 02 '13 at 19:04
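
Regarding the comment above about the saved image not being usable until the application closes: a likely cause (an assumption, not confirmed in the thread) is that the output stream and channel in saveImageNIO are never closed, so the file handle stays open for the lifetime of the JVM. A minimal sketch that releases the file as soon as the copy finishes, using `Files.copy` and try-with-resources from Java 7, might look like this. The class name `ImageSaver` and the split into two methods are illustrative choices.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class ImageSaver {
    /** Copies the stream into targetDir/imageName.jpg and returns the written path. */
    static Path save(InputStream in, String targetDir, String imageName) throws IOException {
        Path target = Paths.get(targetDir, imageName + ".jpg");
        if (target.getParent() != null) {
            Files.createDirectories(target.getParent());
        }
        // Files.copy closes the target file itself once all bytes are written.
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    /** Downloads imageURL and saves it; the network stream is closed on exit. */
    static Path saveImage(String imageURL, String targetDir, String imageName) throws IOException {
        try (InputStream in = new URL(imageURL).openStream()) {
            return save(in, targetDir, imageName);
        }
    }
}
```

With this version nothing stays open after `saveImage` returns, so other programs should be able to read the file while the Java application keeps running. It also copies the whole stream rather than stopping at a fixed byte count.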