2

My question may sound weird but that's the scenario i'm at. I need to parse an Excel formula (getting it from Office.Interop.Excel) and get the parameters from the formula (# of parameter can vary) There are multiple cases to consider. E.g.:

1. myformula("param1", "param2")
2. myformula("param1", A2)
3. myformula("param1", , "param3")
4. myformula(A1, , "param3")
5. myformula("param1, has commas in it", "param2")

Is there a nicer way to parse this as opposed to having multiple if branches (especially considering when string array is mixed, i.e. some parameters have quotations and other parameters are reference parameters)?

Using string split(',') doesn't seem to be too helpful as I can have commas in the parameter itself.

I've also tried

string[] paramArray = new string[]{(parameters)}

where parameters = "\"param1\", \"param2\"";

but that didn't seem to work either (it won't work at all if I have an empty parameter but that's another case).

My outcome should be some sort of an array (or list or any other collection) that would contain all of the parameters, and if a parameter didn't have quotation marks in it then I would need to evaluate it.

Any help would be appreciated.

Nir
  • 21
  • 2
  • Split != parse, unless the text you want to "parse" is quite constrained... – jtolle Feb 11 '11 at 03:29
  • So what do you want to do with more complex formulas like this ? =FooBar($zz$4 & INDIRECT(fred) & UPPER("abcd , "),"parm2") – Charles Williams Feb 11 '11 at 15:33
  • @jtolle By Split I did mean parse as I'm splitting the string and thus parsing the parameter list – Nir Feb 11 '11 at 16:12
  • @Charles Williams - This is still fine although a lot more complicated, but as long as I can get the parameters one by one (whatever other formulas each might have) I could process them accordingly – Nir Feb 11 '11 at 16:15
  • You may want to check out this starting point on parsing Excel formulae: http://ewbi.blogs.com/develops/2004/12/excel_formula_p.html – Charles Williams Feb 11 '11 at 19:10
  • Excellent! This article is exactly what I needed. Thanks a lot! – Nir Feb 11 '11 at 20:10
  • @Nir, I was being a little glib...sorry. The longer version is that String.Split isn't very good for parsing anything remotely complex. Excel formulas in general are too complex for mere string-splitting. Charles pointed you to a good source. You might also see the comments on this question: http://stackoverflow.com/questions/1897906/is-this-the-regex-for-matching-any-cell-reference-in-an-excel-formula . Now, if you have enough constraints on the text you want to parse, maybe something simple will work just fine for you... – jtolle Feb 11 '11 at 21:38

1 Answers1

0

I would process the string looking for " or , if " found search fwd looking for the closing " ignoring any , along the way and taking into account the possibility of impeded "'s in the parameter. When you get the closing " resume looking for " or , and repeat.

Another option would be to use split but then check the results for elements beginning with " . Merge these with the next array element to for the complete parameter (may be that > 2 consecutive array element need to be merged) Also watch out for strings like "zzz"",""xxx" (a single string that = zzz","xxx)

chris neilsen
  • 52,446
  • 10
  • 84
  • 123
  • Your first suggestion is kinda what I was thinking, but the second one sounds interseting. I'll give it a try. Thanks! – Nir Feb 11 '11 at 16:17