This is my first time trying to parse XML, so please be gentle with me :).

I downloaded the source of a web page that I want to extract certain information from: http://www.songlyrics.com/eminem/my-name-is-lyrics/.

I then copied and pasted the page source into Notepad and saved the file as XML - 1.

My code looks like this:

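import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class Program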
{
    public static void main(String[] args)
    {
        System.out.println("Program starts:");

        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        try
        {
            DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
            Document document = documentBuilder.parse(new File("C:/Users/volca_000/Desktop/XML - 1.txt"));
            NodeList paragraphsNodeList = document.getElementsByTagName("p");

            for (int i = 0; i < paragraphsNodeList.getLength(); i++)
            {
                Node paragraphNode = paragraphsNodeList.item(i);
                if (paragraphNode.getNodeType() == Node.TEXT_NODE)
                {
                    Element element = (Element)paragraphNode;
                    String node = element.getTextContent();
                    System.out.println(node);
                }
            }
        } 
        catch (ParserConfigurationException e)
        {
            e.printStackTrace();
        }
        catch (SAXException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    } // End of main
} // End of Program class

Even if I remove the following if statement, nothing is printed to the console:

if (paragraphNode.getNodeType() == Node.TEXT_NODE)

What am I doing wrong? Any suggestions would be very much appreciated.
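
For reference, here is a minimal sketch of the node-type check in isolation. It assumes a small, well-formed inline string instead of the saved page; the class name `ParagraphTextSketch`, the helper `paragraphTexts`, and the sample markup are all made up for illustration. `getElementsByTagName` returns element nodes, so a comparison against `Node.TEXT_NODE` never matches, while `Node.ELEMENT_NODE` does. This may not be the only reason nothing is printed.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ParagraphTextSketch {

    // Hypothetical helper: collects the text of every <p> element in a well-formed XML string.
    public static List<String> paragraphTexts(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList paragraphs = document.getElementsByTagName("p");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Node paragraph = paragraphs.item(i);
                // Nodes returned by getElementsByTagName are elements, never text nodes.
                if (paragraph.getNodeType() == Node.ELEMENT_NODE) {
                    // getTextContent() concatenates the text of all descendants,
                    // including text nested inside child elements such as <a>.
                    texts.add(paragraph.getTextContent());
                }
            }
            return texts;
        } catch (Exception e) {
            // Wrapped so callers do not have to handle the checked parser exceptions.
            throw new IllegalStateException("Input could not be parsed as XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample input, not taken from the real page.
        String sample = "<div><p>first <a href=\"#\">line</a></p><p>second line</p></div>";
        for (String text : paragraphTexts(sample)) {
            System.out.println(text);
        }
    }
}
```

With the sample string above, this should print the text of both paragraphs, including the text nested inside the link.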

God
  • A quick look at the source code of that page (assuming you can parse it as XML) suggests that the paragraph elements don't have text content, but rather contain other elements such as hyperlinks. Does the above code successfully find a sequence of paragraph elements? – Brian Agnew Feb 29 '16 at 15:03
  • @BrianAgnew Actually, I checked whether `paragraphsNodeList` (from `NodeList paragraphsNodeList = document.getElementsByTagName("p");`) is not null, and in either case it prints nothing. It's weird: some runs of the program catch exceptions and some do not. – God Feb 29 '16 at 15:05
  • @BrianAgnew And BTW, why do you assume I can't parse it as `XML`? If I can't, what is the right way to do what I'm trying to do (get the lyrics printed on my console)? – God Feb 29 '16 at 15:08
  • @sgpalit I will check that out. – God Feb 29 '16 at 15:27
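
On the question of whether the page can be parsed as XML at all: web pages are often not well-formed XML (unclosed tags such as `<br>`, undeclared entities such as `&nbsp;`), and a strict XML parser rejects such input with a `SAXException`. Whether this particular page is well-formed was not verified here. The sketch below is one hedged way to check a piece of markup; the class name `WellFormedCheck`, the helper `isWellFormedXml`, and the sample strings are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Hypothetical helper: true if the markup parses as XML, false if the parser rejects it.
    public static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without writing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Raised for input that is not well-formed, e.g. an unclosed <br>.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up samples: typical HTML with an unclosed <br>, then an XML-style equivalent.
        System.out.println(isWellFormedXml("<p>Hi<br>there</p>"));
        System.out.println(isWellFormedXml("<p>Hi<br/>there</p>"));
    }
}
```

If a check like this returns false for the saved file, the `parse` call in the question would throw before the loop ever runs, and an HTML parser rather than an XML parser would be the more suitable tool.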

0 Answers