
I have a long string that contains many substrings, and I want to extract every substring that starts with "http".

Say my string is the following:

"artist":"Idina Menzel","track":"Let It Go","file":"http://madeupwebsite.com" ...

This pattern repeats about 20 more times, so there are about 20 more URLs that I want to extract.

In the end, the goal is to have an ArrayList containing all of the URLs.

I have been looking over some websites and I believe the best way to do this is with a regex, but I am not very familiar with dynamic string parsing.

QQPrinti

2 Answers


You can do something like below with regex:

 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 String line1 = "\"artist\":\"Idina Menzel\",\"track\":\"Let It Go\",\"file\":\"http://madeupwebsite.com\"";
 String line2 = "\"artist2\":\"Idina Menzel\",\"track\":\"Let It Go\",\"file\":\"http://madeupwebsite2.com\"";
 // Any string can be used as the source.

 // Group 1 captures everything after "http://" up to the next double quote.
 Pattern pattern = Pattern.compile("http://([^\"]*)");
 Matcher matcher = pattern.matcher(line1 + line2);
 while (matcher.find()) {
     System.out.println("list of sites: " + matcher.group(1));
 }

This will print:

list of sites: madeupwebsite.com
list of sites: madeupwebsite2.com

For more details, see "RegEx: Grabbing values between quotation marks".
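
Since the question asks for an ArrayList of the full URLs, here is a minimal sketch of how the same loop could collect them instead of printing. The class name `UrlExtractor` and the method `extractUrls` are made up for illustration, and the pattern assumes every URL starts with `http://` or `https://` and ends at the next double quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class UrlExtractor {
    // Assumption: a URL runs from "http://" or "https://" up to the next double quote.
    private static final Pattern URL_PATTERN = Pattern.compile("https?://[^\"]*");

    static List<String> extractUrls(String source) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(source);
        while (matcher.find()) {
            // group() returns the whole match, so the scheme is kept.
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String source = "\"artist\":\"Idina Menzel\",\"track\":\"Let It Go\",\"file\":\"http://madeupwebsite.com\","
                + "\"artist\":\"Idina Menzel\",\"track\":\"Let It Go\",\"file\":\"http://madeupwebsite2.com\"";
        // Prints: [http://madeupwebsite.com, http://madeupwebsite2.com]
        System.out.println(extractUrls(source));
    }
}
```

If the input is actually JSON, a JSON parser would probably be more robust than a regex, but for a flat string like the one in the question this should be enough.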

Community
Alireza Fattahi

To make the string easier to parse later, you could append a custom delimiter to the end of every substring as you read it in. Something like this:

import java.util.Scanner;

Scanner scan = new Scanner(System.in);
int numOfLines = scan.nextInt(); // the number of substrings you are going to enter
String S = "";
while (numOfLines > 0)
{
    // Append rather than overwrite; '|' is the custom delimiter.
    S += scan.next() + "|";
    numOfLines--;
}

This ensures that every substring is terminated by a '|'. Later you can use split() to break the whole string apart on that delimiter. Because split() takes a regular expression and '|' is the regex alternation operator, the delimiter has to be escaped:

String[] listString = S.split("\\|");

This creates an array of the substrings that were separated by '|'. The number of substrings is given by the array's length field:

int n = listString.length;
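
One thing to watch with split(): its argument is a regular expression, and an unescaped `|` means "or", so it does not act as a literal delimiter. A small sketch of splitting on a literal pipe, using `Pattern.quote` as one way to escape it (the class name `SplitDemo` and the sample string are just for illustration):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
class SplitDemo {
    static String[] splitOnPipe(String joined) {
        // Pattern.quote turns "|" into a literal instead of regex alternation.
        return joined.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String joined = "http://madeupwebsite.com|LetItGo|http://madeupwebsite2.com|";
        // Trailing empty strings are dropped by split(), so this yields 3 elements.
        System.out.println(Arrays.toString(splitOnPipe(joined)));
    }
}
```

Writing `"\\|"` as the split argument should behave the same way; `Pattern.quote` is just harder to get wrong.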

To check whether a substring is a URL, you can use Apache Commons Validator. Download the latest version, add it to your Java build path, then create a UrlValidator to validate each individual string.

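import java.util.ArrayList;
import org.apache.commons.validator.routines.UrlValidator;

UrlValidator url = new UrlValidator();
ArrayList<String> al = new ArrayList<String>();
// Walk the array from last to first, so the list ends up in reverse input order.
while (n > 0)
{
    String temp = listString[n - 1];
    if (url.isValid(temp)) // keep only the substrings that are valid URLs
    {
        al.add(temp);
    }
    n--;
}
for (String print : al) // for-each loop over the collected URLs
{
    System.out.println(print);
}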
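
If pulling in an extra library is not an option, a rough stand-in for the validation step can be sketched with `java.net.URI` from the standard library. This is not equivalent to UrlValidator; it only assumes that a "URL" is anything that parses as a URI with an `http` or `https` scheme and a host. The names `HttpUrlFilter`, `looksLikeHttpUrl` and `keepHttpUrls` are made up for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; a loose substitute for a real URL validator.
class HttpUrlFilter {
    // Rough check: parses as a URI, has an http/https scheme, and has a host.
    static boolean looksLikeHttpUrl(String candidate) {
        try {
            URI uri = new URI(candidate);
            String scheme = uri.getScheme();
            return uri.getHost() != null
                    && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        } catch (URISyntaxException e) {
            // Strings with spaces or other illegal characters end up here.
            return false;
        }
    }

    static List<String> keepHttpUrls(String[] candidates) {
        List<String> urls = new ArrayList<>();
        for (String candidate : candidates) {
            if (looksLikeHttpUrl(candidate)) {
                urls.add(candidate);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String[] candidates = {"Idina Menzel", "Let It Go", "http://madeupwebsite.com"};
        // Prints: [http://madeupwebsite.com]
        System.out.println(keepHttpUrls(candidates));
    }
}
```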

Hope this helps. :)

whiplash