
I have the following huge text (http://freetexthost.com/15nbm0dhob) and I need to get all the image URLs from the standard_resolution entries:

"standard_resolution": {
"url": "http://distilleryimage3.s3.amazonaws.com/59d6984092a211e392db12e25f465f4f_8.jpg",
"width": 640,
"height": 640
}

For example, from this I would like to get: http://distilleryimage3.s3.amazonaws.com/59d6984092a211e392db12e25f465f4f_8.jpg

In the end, I would like a List<string> with all the standard_resolution URLs. I'm writing a C# app.

3 Answers


I think you can use this pattern: ^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$

Here is an example:

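using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
// Split the text on double quotes, then keep only the pieces that are complete URLs.
var pattern = @"^http\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/\S*)?$";
var result = File.ReadAllText("filepath")
            .Split(new[] {'"'}, StringSplitOptions.RemoveEmptyEntries)
            .Where(line => Regex.IsMatch(line, pattern))
            .ToList();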

I tested it; result contains 25 URLs for your input.

Selman Genç

Try:

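using System.Collections.Generic;

List<string> urls = new List<string>();
string txt = "standard_resolution...."; // Your main text
const string key = "url\": \"";         // the text that precedes each URL value
int pos = 0;
while ((pos = txt.IndexOf(key, pos)) >= 0)
{
    int start = pos + key.Length;
    int end = txt.IndexOf(".jpg", start);
    if (end < 0) break;                  // no ".jpg" after this key: stop instead of throwing
    urls.Add(txt.Substring(start, end + ".jpg".Length - start));
    pos = end;                           // continue searching after this URL
}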
mnshahab

Selman22: your answer will get all the URLs, whereas he only wants the standard_resolution URLs.

Here's a quick and dirty regex I put together. You may want to tweak it to cover corner cases in the structure of the JSON that I haven't thought of, in case it comes back slightly different from the sample you posted.

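using System.Linq;
using System.Text.RegularExpressions;

const string input = @"
  ""standard_resolution"": {
  ""url"": ""http://distilleryimage3.s3.amazonaws.com/59d6984092a211e392db12e25f465f4f_8.jpg"",
  ""width"": 640,
  ""height"": 640
  }";

// Find "standard_resolution", then lazily match up to the first "url": "..." after it.
var pattern = @"\""standard_resolution\"".*?\""url\""\:\s\""(?<url>.*?)\""";

// '.' does not match newlines by default, so the CRLF line breaks are stripped first.
var urls = Regex.Matches(input.Replace("\r\n", string.Empty), pattern)
    .Cast<Match>()
    .Select(each => each.Groups["url"].Value)
    .ToList();

var count = urls.Count;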

Another alternative, outside the direct scope of your question, is to use a JSON parser (see the question "Parsing JSON using Json.net").
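A minimal sketch of the parser approach, assuming .NET Core 3.0 or later so that the built-in System.Text.Json is available. Note that this swaps in System.Text.Json instead of Json.NET, and the class and method names are hypothetical. It walks the whole document and collects the "url" of every "standard_resolution" object, wherever it is nested:

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper; uses System.Text.Json rather than Json.NET.
public static class StandardResolutionJson
{
    // Returns the "url" of every "standard_resolution" object found anywhere in the JSON.
    public static List<string> ExtractUrls(string json)
    {
        var urls = new List<string>();
        using JsonDocument doc = JsonDocument.Parse(json);
        Collect(doc.RootElement, urls);
        return urls;
    }

    // Recursively visits objects and arrays looking for "standard_resolution".
    private static void Collect(JsonElement element, List<string> urls)
    {
        switch (element.ValueKind)
        {
            case JsonValueKind.Object:
                foreach (JsonProperty prop in element.EnumerateObject())
                {
                    if (prop.Name == "standard_resolution"
                        && prop.Value.ValueKind == JsonValueKind.Object
                        && prop.Value.TryGetProperty("url", out JsonElement url)
                        && url.ValueKind == JsonValueKind.String)
                    {
                        // GetString() decodes JSON escapes, so "http:\/\/..." becomes "http://...".
                        urls.Add(url.GetString() ?? string.Empty);
                    }
                    else
                    {
                        Collect(prop.Value, urls);
                    }
                }
                break;
            case JsonValueKind.Array:
                foreach (JsonElement item in element.EnumerateArray())
                {
                    Collect(item, urls);
                }
                break;
        }
    }
}
```

For example, `List<string> urls = StandardResolutionJson.ExtractUrls(json);` where `json` is the string returned by `DownloadString`. Because the parser handles whitespace and escaping, the same call works for both the indented and the single-line format.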

Jon Barker
  • It only works if it's a const string, but I'm getting the JSON using this: var json = cliente.DownloadString("https://api.instagram.com/v1/tags/thenight2/media/recent?access_token=**TOKEN**"); – user3295041 Feb 11 '14 at 00:13
  • Are you removing the newlines in the input, as my code does with input.Replace("\r\n", string.Empty)? Depending on the format coming from the web server, you may need to remove just \r or just \n instead. – Jon Barker Feb 11 '14 at 00:21
  • Oh, I got it. What is the pattern for JSON like this: "standard_resolution":{"url":"http:\/\/distilleryimage9.s3.amazonaws.com\/382b566491f211e3ae050a2150c32a45_8.jpg","width":640,"height":640}},"users_in_photo":[],"caption":{"created_time":"1391995536","text":"#thenight2","from":{"username":"thenight2","profile_picture":"http:\/\/images.ak.instagram.com\/profiles\/anonymousUser.jpg","id":"1082107741","full_name":"The Night Party 2"},"id":"652428307931519401"},"user_has_liked":false,"id":"652428307235264996_1082107741","user":{"username":"thenight2" – user3295041 Feb 11 '14 at 03:26
  • The JSON comes in a single-line format. – user3295041 Feb 11 '14 at 03:29
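For single-line JSON like the fragment in the comments, the pattern has to treat the whitespace around ':' and '{' as optional, and the escaped slashes (\/) have to be decoded afterwards. A minimal sketch, assuming "url" is always the first key inside "standard_resolution" as in both samples; the class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for both the indented and the single-line JSON.
public static class StandardResolutionRegex
{
    // "\s*" allows any whitespace (including line breaks) or none at all;
    // assumes "url" is the first key inside "standard_resolution".
    private static readonly Regex Pattern = new Regex(
        @"""standard_resolution""\s*:\s*\{\s*""url""\s*:\s*""(?<url>[^""]*)""",
        RegexOptions.Compiled);

    public static List<string> ExtractUrls(string json)
    {
        return Pattern.Matches(json)
            .Cast<Match>()
            // The API escapes '/' as "\/"; undo that so the URLs are usable.
            .Select(m => m.Groups["url"].Value.Replace(@"\/", "/"))
            .ToList();
    }
}
```

Capturing `[^"]*` instead of `.*?` keeps the match inside the quoted value, and because `\s*` also covers line breaks, no newline stripping is needed before matching.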