I am writing a Java program that needs to search the web and display data. Like any sensible person, I figured the best place to look for information is Wikipedia.

I did some looking around and found MediaWiki, but I have no idea where to start. I'll explain what I need, and all help is appreciated!

Example: User input: Who is Ed Sheeran?
(Leave extracting the search term from the question to me; I know how to do that.)

In the background, the program searches Wikipedia for Ed Sheeran, extracts the first few sentences about him, and prints them back.

Once the program is finished, the output should look like this:

User input: Who is Ed Sheeran?
Output: Edward Christopher "Ed" Sheeran (born 17 February 1991) is an English singer-songwriter and occasional actor.

User input: Where is Bangalore?
Output: Bangalore /bæŋɡəˈlɔːr/, officially known as Bengaluru ([ˈbeŋɡəɭuːɾu]), is the capital of the Indian state of Karnataka.

All help will be appreciated. Thanks!

Aekansh Dixit
  • Have you checked Wikimedia's web [API](https://www.mediawiki.org/wiki/API:Main_page)? You can probably get the data from their API, then analyze the text in the results of your request. – mhfff32 Oct 08 '15 at 16:04
  • Okay, I am a beginner in this and have absolutely no clue whatsoever where to start. Can someone write an answer for a person who wants to start from scratch? – Aekansh Dixit Oct 08 '15 at 16:05
  • This is way too broad. This cannot be reasonably answered in this format. Please break this down into discrete chunks -- designing UI, analyzing/cleaning user input, making queries to Wikipedia, consuming and formatting received data, outputting data, etc. and start on each bit. Come back when you have a specific question. It seems that all you've done so far is define requirements and then just give them to us. – tnw Oct 08 '15 at 16:20
  • That's right. I have done the UI design and the analyzing/cleaning of user input, and I store the search term in a string. The part I need help with is making queries to Wikipedia and consuming and formatting the received data. Once I have the correct page, I want to display the first sentence of the summary text. – Aekansh Dixit Oct 08 '15 at 16:43
  • http://stackoverflow.com/questions/7185288/how-to-get-wikipedia-content-using-wikipedias-api – tnw Oct 08 '15 at 16:49
  • Unfortunately requests like "teach me how to make a query and consume the response" are off-topic/too broad for StackOverflow. – tnw Oct 08 '15 at 16:50

1 Answer

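This worked for me for that query. It fetches the raw wikitext of the page and keeps roughly the first 200 characters from lines that do not look like template or table markup: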

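import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

. . .

String subject = "Ed Sheeran";
// Page titles use underscores for spaces; percent-encode everything else.
String title = URLEncoder.encode(subject.replace(" ", "_"), "UTF-8");
URL url = new URL("https://en.wikipedia.org/w/index.php?action=raw&title=" + title);
StringBuilder text = new StringBuilder();
// Read the response as UTF-8 so non-ASCII characters are not mangled.
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(url.openConnection().getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = br.readLine()) != null) {
        line = line.trim();
        // Skip lines that look like template, table, or layout markup.
        if (!line.startsWith("|")
                && !line.startsWith("{")
                && !line.startsWith("}")
                && !line.startsWith("<center>")
                && !line.startsWith("---")) {
            text.append(line);
        }
        // Stop once there is enough text for the opening sentence.
        if (text.length() > 200) {
            break;
        }
    }
}
System.out.println("text = " + text);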

prints:

text = '''Edward Christopher''' "'''Ed'''" '''Sheeran''' (born 17 February 1991) is an English singer-songwriter and occasional actor. Born in [[Hebden Bridge]], West Yorkshire and raised in [[Framlingham]],

For other subjects you will probably need some trial and error to filter out additional wiki markup from the page content.
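As a starting point for that cleanup, here is a minimal sketch that strips the markup visible in the output above: bold/italic quote runs and `[[...]]` links, plus `<ref>` footnotes, which are common on other pages. The class name `WikiMarkupCleaner` and the method `clean` are hypothetical names made up for this example, and the regular expressions only cover these simple cases; nested templates and tables would need more work or a real wikitext parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes a few common wikitext constructs with regexes.
final class WikiMarkupCleaner {
    // <ref name="x" /> (self-closing) or <ref ...>...</ref>
    private static final Pattern REF =
            Pattern.compile("<ref\\b[^>]*/>|<ref\\b[^>]*>.*?</ref>", Pattern.DOTALL);
    // [[Target|label]] keeps only the label
    private static final Pattern PIPED_LINK =
            Pattern.compile("\\[\\[[^\\[\\]|]*\\|([^\\[\\]]*)\\]\\]");
    // [[Target]] keeps the target text
    private static final Pattern PLAIN_LINK =
            Pattern.compile("\\[\\[([^\\[\\]|]*)\\]\\]");
    // '' (italic) and ''' (bold) markers
    private static final Pattern QUOTES = Pattern.compile("'{2,}");

    static String clean(String wikitext) {
        String s = REF.matcher(wikitext).replaceAll("");
        s = PIPED_LINK.matcher(s).replaceAll("$1");
        s = PLAIN_LINK.matcher(s).replaceAll("$1");
        s = QUOTES.matcher(s).replaceAll("");
        return s.trim();
    }

    public static void main(String[] args) {
        // prints: Ed Sheeran was raised in Framlingham.
        System.out.println(clean("'''Ed Sheeran''' was raised in [[Framlingham]]."));
    }
}
```

Passing the `text` value from the snippet above through `clean` should leave plain prose, although the string is still cut off wherever the 200-character limit happened to fall.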

Update

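Here is an alternative that asks the MediaWiki API for a plain-text extract of the page, so no markup cleanup is needed, and parses the JSON response with the org.json library from here:
http://search.maven.org/#artifactdetails|org.json|json|20150729|jar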

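import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import org.json.JSONObject;

...

String subject = "Ed Sheeran";
// exintro limits the extract to the lead section, explaintext returns plain
// text instead of HTML, and exsentences=1 keeps only the first sentence.
URL url = new URL("https://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exsentences=1&exintro=&explaintext=&exsectionformat=plain&titles=" + URLEncoder.encode(subject, "UTF-8"));
StringBuilder text = new StringBuilder();
// Read the whole response as UTF-8 so non-ASCII characters are not mangled.
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(url.openConnection().getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = br.readLine()) != null) {
        text.append(line.trim());
    }
}

System.out.println("text = " + text);
JSONObject json = new JSONObject(text.toString());
JSONObject query = json.getJSONObject("query");
// "pages" is keyed by page ID, so iterate instead of looking up a fixed key.
JSONObject pages = query.getJSONObject("pages");
for (String key : pages.keySet()) {
    System.out.println("key = " + key);
    JSONObject page = pages.getJSONObject(key);
    // A missing page has no "extract" field; optString avoids a JSONException.
    String extract = page.optString("extract", "");
    System.out.println("extract = " + extract);
}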

Output (final line only; the raw JSON and the page key are printed before it):

extract = Edward Christopher "Ed" Sheeran (born 17 February 1991) is an English singer-songwriter and occasional actor.
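The query string in the JSON example packs several parameters into one long literal, so it may help to see it built piece by piece. The sketch below is only an illustration: `ExtractUrlBuilder` and `build` are hypothetical names, not part of any library, and it uses the same endpoint and parameters as above (minus `exsectionformat`, which should not matter for a one-sentence intro extract).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: assembles the extracts query URL one parameter at a time.
final class ExtractUrlBuilder {
    private static final String API = "https://en.wikipedia.org/w/api.php";

    static String build(String title, int sentences) {
        if (sentences < 1) {
            throw new IllegalArgumentException("sentences must be at least 1");
        }
        return API
                + "?action=query"                // run a query
                + "&prop=extracts"               // ask for page extracts
                + "&format=json"                 // respond in JSON
                + "&exsentences=" + sentences    // number of sentences to return
                + "&exintro="                    // only text before the first section
                + "&explaintext="                // plain text instead of HTML
                + "&titles=" + encode(title);    // page title, URL-encoded
    }

    private static String encode(String value) {
        try {
            // Spaces become '+', reserved characters become %XX escapes.
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("Ed Sheeran", 1));
    }
}
```

The returned string can be passed to `new URL(...)` in place of the hard-coded literal, and raising `sentences` should return a longer extract.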

WillShackleford