I am working on an application for Google Scholar analysis.

The basic requirement is to parse each paper's title, author(s), publication venue, and so on.

I parse the HTML in C# and then store the values in a SQL Server database.

Here is the problem: when I parse the author(s) field, it returns something like this:

R Feynman, L Krauss.

or

R Feynman

or

R Feynman, L Krauss, C Sagan

I have read the answers about comma-separated values in SQL Server. However, in my case the number of values in the author(s) column varies from row to row. What I want is to store the author names in another table, so that the resulting query returns something like this:

R Feynman, L Krauss, NULL, NULL

R Feynman, NULL, NULL, NULL

R Feynman, L Krauss, C Sagan, NULL
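To make the desired shape concrete, here is a minimal sketch of how the padding could be done on the C# side before anything reaches the database. `AuthorParser`, `SplitAuthors`, and the fixed count of four slots are hypothetical names and choices for illustration, not part of my current code, and the sketch assumes a comma never appears inside a single author's name.

```csharp
using System;
using System.Linq;

public static class AuthorParser
{
    // Splits a raw author string such as "R Feynman, L Krauss." into exactly
    // `slots` entries. Missing authors are left as null; authors beyond
    // `slots` are dropped (an assumption of this sketch).
    public static string[] SplitAuthors(string raw, int slots = 4)
    {
        string[] names = (raw ?? string.Empty)
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(n => n.Trim().TrimEnd('.'))   // drop padding and a trailing period
            .Where(n => n.Length > 0)
            .Take(slots)
            .ToArray();

        var result = new string[slots];           // elements default to null
        Array.Copy(names, result, names.Length);
        return result;
    }
}
```

For example, `AuthorParser.SplitAuthors("R Feynman, L Krauss.")` returns an array holding `"R Feynman"`, `"L Krauss"`, `null`, `null`. When passing a null entry to a `SqlCommand` parameter, it would need to be replaced with `DBNull.Value`.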

If you have any suggestions, please let me know.

Thank you!

Added after Edit:

I download part of the HTML from Google Scholar and show the related fields to the user in several ListBox controls:

    WebClient web = new WebClient();
    string URL = URLTextBox.Text;
    string page = web.DownloadString(URL);

    string publicationName = "class=\"gsc_a_at\">(.*?)</a><div class=\"gs_gray\">(.*?)</div><div class=\"gs_gray\">(.*?)<span class=\"gs_oph\">, (.*?)</span></div></td><td class=\"gsc_a_c\"><a href=\"(.*?)\" class=\"gsc_a_ac\">(.*?)</a></td><td class=\"gsc_a_y\"><span class=\"gsc_a_h\">(.*?) </span></td></tr><tr class=\"gsc_a_tr\"><td class=\"gsc_a_t\"><a href=\"(.*?)\"";

    foreach (Match match in Regex.Matches(page, publicationName))
    {
        listBox1.Items.Add(match.Groups[1].Value);
        listBox2.Items.Add(match.Groups[2].Value);
        listBox3.Items.Add(match.Groups[3].Value);
        listBox4.Items.Add(match.Groups[4].Value);
        listBox5.Items.Add(match.Groups[5].Value);
        listBox6.Items.Add(match.Groups[6].Value);
    }

Then I copy the values from the list boxes into the database:

    // One batch of six inserts, executed once per parsed row.
    using (SqlCommand command = new SqlCommand(
        "insert into tbl_Titl values(@val); " +
        "insert into tbl_Auth values(@val2); " +
        "insert into tbl_Publ values(@val3); " +
        "insert into tbl_Year values(@val4); " +
        "insert into tbl_Link values(@val5); " +
        "insert into tbl_Cita values(@val6)", connection))
    {
        for (int i = 0; i < listBox1.Items.Count; i++)
        {
            command.Parameters.Clear();
            // Every parameter named in the batch must be supplied before executing.
            command.Parameters.AddWithValue("@val", listBox1.Items[i]);
            command.Parameters.AddWithValue("@val2", listBox2.Items[i]);
            command.Parameters.AddWithValue("@val3", listBox3.Items[i]);
            command.Parameters.AddWithValue("@val4", listBox4.Items[i]);
            command.Parameters.AddWithValue("@val5", listBox5.Items[i]);
            command.Parameters.AddWithValue("@val6", listBox6.Items[i]);
            command.ExecuteNonQuery();
        }
    }

After these steps, I join the tables in the database so that a single query returns all the results.
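For a join to return one author per column, the author table would probably need one row per author rather than the whole comma-separated string. A minimal sketch of building those rows on the client, assuming each paper has some integer key: `AuthorRows`, `ToRows`, and `paperId` are hypothetical names that do not exist in my code, and where the key comes from (for example an identity column on the title table) is left open.

```csharp
using System;
using System.Collections.Generic;

public static class AuthorRows
{
    // Returns one (paperId, position, name) tuple per author, in the order
    // the names appear in the raw string. Each tuple maps to one INSERT.
    public static List<Tuple<int, int, string>> ToRows(int paperId, string raw)
    {
        var rows = new List<Tuple<int, int, string>>();
        int position = 1;
        foreach (string part in (raw ?? string.Empty).Split(','))
        {
            string name = part.Trim().TrimEnd('.');   // drop padding and a trailing period
            if (name.Length == 0) continue;           // skip empty fragments
            rows.Add(Tuple.Create(paperId, position++, name));
        }
        return rows;
    }
}
```

Storing the position alongside the name keeps the author order, so a query can later place position 1 in the first column, position 2 in the second, and leave NULL where a paper has fewer authors.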

Edited at the request of DJ KRAZE.

  • Can you show how you are `Splitting / Parsing` the data? Are you familiar with the Split function? It sounds like you can still split this on `,`. Can you provide more information about your code? – MethodMan Nov 21 '14 at 17:45
  • I found http://stackoverflow.com/a/16083088/4279558, applied it to my SQL code, and it works. I am sorry for taking your time. Have a great day. –  Nov 21 '14 at 17:54
  • Why not parse the authors in the client (C#) and then insert the names into the table? That strikes me as being an easier approach. – Tim Nov 21 '14 at 17:56
  • Once again, not showing us what you currently have, as opposed to sharing a link to something you have tried, is like shooting in the dark, because we cannot see what you actually have or have done. Please edit your question and provide more information than in your original post. – MethodMan Nov 21 '14 at 18:02
  • `class=\"gs_gray\">(.*?)...` I am using `(.*?)` to capture the whole value between the tags, and I store the value immediately after that. I will work on what you say, though. –  Nov 21 '14 at 18:04

0 Answers