2

Is there any way to extract underlying metadata of various excel formulas. I am building this framework in C# that uses excel formula, and interested in knowing each formula input arguments, their data types and return type. This will help me to build a wizard screen based on this supplied meta-data.

Thanks in advance for your help.

Community
  • 1
  • 1
GammaVega
  • 779
  • 1
  • 7
  • 23
  • 1
    Sorry, I don't think is it possible to do it. You could try to parse the online documentation, or the local help files, but it ain't going to be easy. – John Alexiou May 01 '12 at 20:58
  • I found the `url` with the Worksheet functions: http://office.microsoft.com/client/helppreview.aspx?AssetId=HV805551279990&lcid=1033&NS=EXCEL%2EDEV&Version=12&pid=CH080555125&CTT=4 – John Alexiou May 01 '12 at 21:04
  • + 1 to both :) Good question and good suggestion. `but it ain't going to be easy` @ja72: It is very easy :) – Siddharth Rout May 02 '12 at 09:33

2 Answers2

2

Building up on what @ja72 suggested, it is very easy to parse data from the link which he mentioned. I am not too good with C# so here is a vb.net code which you can convert to C#

Having said that, there are many ways you can look at this problem from C#

WAY 1

Navigate to the URL at runtime and parse the values using the below code.

Disadvantage:

1) You have to have an internet connection

2) This process is slow

WAY 2

Create a separate program to navigate to the URL and parse the values using the below code. Use the below code to generate the output to a text file or a csv file and embed the file in your resource so that you can work with it anytime you want

I would suggest WAY 2 but the decision is finally yours. :)

CODE

Public Class Form1
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        TextBox1.Clear()
        GetFormulas()
        MessageBox.Show ("Done!")
    End Sub

    Sub GetFormulas()
        Dim wc As New Net.WebClient
        '~~> first get links
        Dim mainPage As String = wc.DownloadString("http://office.microsoft.com/client/helppreview.aspx?AssetId=HV805551279990&lcid=1033&NS=EXCEL.DEV&Version=12&pid=CH080555125")
        Dim doc As mshtml.IHTMLDocument2 = New mshtml.HTMLDocument
        doc.write (mainPage)
        doc.close()
        Dim table As mshtml.IHTMLElement = DirectCast(doc.getElementById("vstable"), mshtml.IHTMLElement2).getElementsByTagName("table")(0)
        Dim links As mshtml.IHTMLElementCollection = table.getElementsByTagName("A")
        For Each link In links
            Dim childPage As String = wc.DownloadString(link.getAttribute("href"))
            Dim doc2 As mshtml.IHTMLDocument2 = New mshtml.HTMLDocument
            doc2.write (childPage)
            doc2.close()
            Dim div2 As mshtml.IHTMLElement2 = doc2.getElementById("m_article")
            For Each elem As mshtml.IHTMLElement In div2.getElementsByTagName("P")
                If elem.getAttribute("className") = "signature" Then
                    Dim formulaString As String = elem.innerText
                    TextBox1.AppendText (link.innerText & vbTab & formulaString & vbCrLf)
                End If
            Next
        Next
    End Sub
End Class

SNAPSHOT

enter image description here

NOTE: The above is just an example on how to scrape the above link given by ja72. In case you plan to go with any other link then you will have to change the code accordingly. Also note that the formulas mentioned in the above link are for Excel 2007 onwards. For Excel 2003 you will have to go in for another link. I have not included a STOP button in the above example so once the program starts running, it cannot be stopped till it finishes. I am sure that you can add one more button to terminate the extraction.

Siddharth Rout
  • 147,039
  • 17
  • 206
  • 250
  • + 1 Nicely Done. Reminds me of this thread http://stackoverflow.com/questions/8798260/html-parsing-of-cricinfo-scorecards/8846791#8846791 You really love scrapping lol – Pradeep Kumar May 02 '12 at 09:56
  • @PradeepKumar: LOL No :) It is just that I found this question very interesting... – Siddharth Rout May 02 '12 at 09:57
  • +1 Thanks for filling in that gaps on my suggestion. Parsing html is not my bread and butter and I am glad the community rose up to the challenge and provided the answer I couldn't. – John Alexiou May 02 '12 at 12:30
1

All credit to @SiddharthRout. Here is the C# conversion of the code that Sid posted

You really have to mess with a lots of casts and converts when using C#. But then that's the way C# works :P

using System;
using System.Windows.Forms;

Namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        Public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            textBox1.Clear();
            GetFormulas();
            MessageBox.Show("Done!");
        }

        public void GetFormulas()
        {
            mshtml.HTMLDocument doc = NewHtmlDoc("http://office.microsoft.com/client/helppreview.aspx?AssetId=HV805551279990&lcid=1033&NS=EXCEL.DEV&Version=12&pid=CH080555125");
            mshtml.IHTMLElement2 table = (mshtml.IHTMLElement2)(mshtml.IHTMLElement2)((mshtml.IHTMLElement2)doc.getElementById("vstable")).getElementsByTagName("table").item(null, 0);
            mshtml.IHTMLElementCollection links = table.getElementsByTagName("A");
            foreach (mshtml.IHTMLElement link in links)
            {
                mshtml.HTMLDocument doc2 = NewHtmlDoc(link.getAttribute("href",0).ToString());
                mshtml.IHTMLElement div2 = doc2.getElementById("m_article");
                foreach (mshtml.IHTMLElement elem in ((mshtml.IHTMLElement2)div2).getElementsByTagName("P"))
                {
                    if (elem.getAttribute("className",0) != null && elem.getAttribute("className",0).ToString() == "signature")
                    {
                        string formulaString = elem.innerText;
                        textBox1.AppendText(link.innerText + "\t\t" + formulaString + "\n");
                    }
                }
            }
        }

        private mshtml.HTMLDocument NewHtmlDoc(string url)
        {
            System.Net.WebClient wc = new System.Net.WebClient();
            string page = wc.DownloadString(url);
            mshtml.IHTMLDocument2 doc = (mshtml.IHTMLDocument2)(new mshtml.HTMLDocument());
            doc.write(page);
            doc.close();
            return (mshtml.HTMLDocument)doc;
        }

    }
}
Pradeep Kumar
  • 6,836
  • 4
  • 21
  • 47
  • 1
    + 1 :) Guess I need to experiment more with C# But on 2nd thoughts... Nah! :) – Siddharth Rout May 02 '12 at 11:04
  • 1
    BTW, I added a Stop button ;) You might want to include that? ;) http://siddharthrout.wordpress.com/2012/05/02/vb-net-to-retrieve-the-names-and-arguments-of-all-excel-formulas/ – Siddharth Rout May 02 '12 at 12:15