-1

I need to extract certain data from a website.

I have watched this YouTube video, https://www.youtube.com/watch?v=rru3G7PLVjw, and have a rough sense of how to code it.

Basically, what I want to do is extract the radio button text (Very easy!, Pretty easy and Not easy)

from the page source of https://docs.google.com/forms/d/1Mout_ImbF9N16EuCiYOxCrL6MbkUVkIEzijO1PAUQ68/viewform?key=pqbhTz7PIHum_4qKEdbUWVg

store it in a list, and then print out the elements of the list.

The following is the C# code which I have written based on the YouTube video.

using System.Net;
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace ExtractDataFromWebsite
{
    class Program
    {
        static void Main(string[] args)
        {
            List<string> radioOptions = new List<string>();
            WebClient web = new WebClient();

            // download html from certain website
            string html = web.DownloadString("https://docs.google.com/forms/d/1Mout_ImbF9N16EuCiYOxCrL6MbkUVkIEzijO1PAUQ68/viewform?key=pqbhTz7PIHum_4qKEdbUWVg");

            MatchCollection m1 = Regex.Matches(html, @"<input\stype=/"radio"\sname=/"entry.2362106 / "\svalue="(.+)\sid =/ "group_2362106_"
                , RegexOptions.Singleline);
            foreach (Match m in m1)
            {
                    string radioOption = m.Groups[1].Value;
                    radioOptions.Add(radioOption);
            }
            for (int i=0; i< radioOptions.Count;i++)
                Console.WriteLine(""+ radioOptions[i]);

            Console.ReadKey();
        }
    }
}

However, the line MatchCollection m1 = Regex.Matches(...) has a problem which I do not know how to fix.

I hope someone can provide a hint or help me solve the above problem. Thank you very much.

Alan Moore
xiaoxin
I suggest you read this [question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – Yurii N. Jun 30 '16 at 16:06

2 Answers

0

Look into HtmlAgilityPack. You can load the HTML returned by your WebClient call into a new HtmlDocument and traverse it fairly easily from there.
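A minimal sketch of that approach, assuming the HtmlAgilityPack NuGet package is installed and that the options are plain `<input type="radio">` elements carrying the text in their `value` attribute (the actual Google Forms markup may differ). The class and method names `RadioOptionParser` and `GetRadioValues` are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // external NuGet package, not part of the .NET base library

public static class RadioOptionParser
{
    // Hypothetical helper: returns the value attribute of every radio input in the HTML.
    public static List<string> GetRadioValues(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//input[@type='radio']");
        if (nodes == null)
        {
            return values;
        }

        foreach (HtmlNode node in nodes)
        {
            // The second argument is the fallback used when the attribute is missing.
            values.Add(node.GetAttributeValue("value", string.Empty));
        }
        return values;
    }
}
```

In the program from the question, the string returned by `web.DownloadString(...)` could be passed to `GetRadioValues`, and the resulting list printed with the existing loop.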

0

Try this regex to extract the values:

MatchCollection m1 = Regex.Matches(html, "<input type=\"radio\".+?value=\"(.+?)\".+?\">"
            , RegexOptions.Singleline);
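For comparison, here is a self-contained sketch of the same idea that can be tried without a network call. One thing worth noting about the line in the question: inside a verbatim string (`@"..."`), a double quote is written as `""`, not `/"`, so the original pattern does not compile. The sample HTML below is an assumption about what the form markup looks like, and `RadioOptionExtractor` and `ExtractRadioValues` are hypothetical names. The pattern also assumes `type` is the first attribute and that attributes use double quotes, so it is fragile compared with a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RadioOptionExtractor
{
    // Hypothetical helper: collects the value attribute of each radio <input> tag.
    public static List<string> ExtractRadioValues(string html)
    {
        var values = new List<string>();

        // In a verbatim string, "" stands for one literal double quote.
        // [^>]*? keeps the match inside a single tag; [^""]* captures up to the closing quote.
        string pattern = @"<input\s+type=""radio""[^>]*?\bvalue=""([^""]*)""";

        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            values.Add(m.Groups[1].Value);
        }
        return values;
    }

    public static void Main()
    {
        // Assumed sample markup; the real page source may be structured differently.
        string sampleHtml =
            "<input type=\"radio\" name=\"entry.2362106\" value=\"Very easy!\" id=\"group_2362106_1\">" +
            "<input type=\"radio\" name=\"entry.2362106\" value=\"Pretty easy\" id=\"group_2362106_2\">" +
            "<input type=\"radio\" name=\"entry.2362106\" value=\"Not easy\" id=\"group_2362106_3\">";

        foreach (string option in ExtractRadioValues(sampleHtml))
        {
            Console.WriteLine(option);
        }
    }
}
```

With the sample markup above, this prints Very easy!, Pretty easy and Not easy on separate lines. Using `[^"]*` instead of `.+` for the captured group avoids the greedy match in the original pattern, which would otherwise run past the closing quote.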
Mavi Domates