I have the following text in a Word document:

This is a paragraph:

1) This is first bullet

2) This is second bullet

I am trying to get the text 1) and 2), but I am unsuccessful:

foreach (var items in para)
{
    int id = items.ParagraphProperties.NumberingProperties.NumberingId.Val;
    int refval = items.ParagraphProperties.NumberingProperties.NumberingLevelReference.Val;
    var runs = items.Descendants<Run>();
    foreach (var run in runs)
    {
        var txts = run.Descendants<Text>();

        foreach (var txt in txts)
        {

        }
    }
}

Accessing these values gives me the following for both bullets:

claims.ParagraphProperties.NumberingProperties.NumberingId.Val
-> 2

claims.ParagraphProperties.NumberingProperties.NumberingLevelReference.Val
-> 0
Dirk Vollmar
Expert Novice

2 Answers

I think I just got nerd-sniped by Dirk Vollmar, so now I had to try to implement a way of computing the "text" of an ordered list in Word.

This assumes that the English version of Word behaves about the same way as my Danish version. After testing a bit, I figured out that there are 3 different levels of indention.

The first level is a number, the second level is a letter and the third level is a Roman numeral. After that, the levels repeat, so the fourth level is a number again, and so forth.

This means that to calculate which text belongs where in the list, we just need to know the position of the paragraph within its indention level.

Here is my solution. I'm using this document for testing it: test word document

After that, I've written an extension method for Paragraph. There isn't really any error handling; it assumes you are passing a Paragraph that actually is in a list.

// Requires: using System.Linq; using DocumentFormat.OpenXml.Wordprocessing;
// Must be declared inside a static class (it is an extension method).
public static string GetIndentionTextFromParagraph(this Paragraph paragraph)
{
    int numberingId = paragraph.ParagraphProperties.NumberingProperties.NumberingId.Val;
    int numberingLevel = paragraph.ParagraphProperties.NumberingProperties.NumberingLevelReference.Val;
    // Isolate the paragraphs with the same numbering id and indention level.
    // Note: this counts every paragraph at this level in the whole list;
    // numbering that restarts under a new parent item is not handled.
    var paragraphsInList = paragraph.Parent.Descendants<Paragraph>().Where(p =>
        p.ParagraphProperties?.NumberingProperties?.NumberingId?.Val?.Value == numberingId &&
        p.ParagraphProperties.NumberingProperties.NumberingLevelReference?.Val?.Value == numberingLevel
        ).ToList();
    // Find the zero-based position of the paragraph within that level.
    int paragraphPositionInLevelOfList = paragraphsInList.IndexOf(paragraph);
    // The formats repeat every three levels, so reduce the level to 0, 1 or 2.
    switch (numberingLevel % 3)
    {
        case 0:
            // Return a number.
            return (paragraphPositionInLevelOfList + 1).ToString();
        case 1:
            // Return a letter; past "z" the letter is repeated (aa, bb, ...)
            // instead of indexing outside the alphabet.
            return new string((char)('a' + paragraphPositionInLevelOfList % 26),
                paragraphPositionInLevelOfList / 26 + 1);
        default:
            // Return a Roman numeral.
            return ToRoman(paragraphPositionInLevelOfList + 1);
    }
}

Now there is only the matter of testing if it works. How you want to isolate your paragraphs is up to you. For testing it, I just isolate them with some unique text.

// The document is only read, so it is opened read-only (isEditable: false).
using (var wordDoc = WordprocessingDocument.Open(@"C:\test\qtest\test.docx", false))
{
    MainDocumentPart mainPart = wordDoc.MainDocumentPart;
    var document = mainPart.Document;

    // Pick one paragraph per indention level by its unique text.
    Paragraph firstIndention = document.Descendants<Paragraph>().First(i => i.InnerText.Contains("my number bullet 1"));
    Paragraph secondIndention = document.Descendants<Paragraph>().First(i => i.InnerText.Contains("letter bullet 2"));
    Paragraph thirdIndention = document.Descendants<Paragraph>().First(i => i.InnerText.Contains("third indention 2"));
    Paragraph fourthIndention = document.Descendants<Paragraph>().First(i => i.InnerText.Contains("And we are back to numbering, so we know the rules now"));

    Console.WriteLine(firstIndention.GetIndentionTextFromParagraph());
    Console.WriteLine(secondIndention.GetIndentionTextFromParagraph());
    Console.WriteLine(thirdIndention.GetIndentionTextFromParagraph());
    Console.WriteLine(fourthIndention.GetIndentionTextFromParagraph());
}

This will output: 1, b, II and 1.

Hope this helps.

I copied the "ToRoman" function from "Converting integers to roman numerals".

static string ToRoman(int number)
{
    // The first constructor argument is the parameter name, the second is the message.
    // 0 is accepted because the recursion ends there with an empty string.
    if ((number < 0) || (number > 3999)) throw new ArgumentOutOfRangeException(nameof(number), "value must be between 0 and 3999");
    if (number < 1) return string.Empty;
    if (number >= 1000) return "M" + ToRoman(number - 1000);
    if (number >= 900) return "CM" + ToRoman(number - 900);
    if (number >= 500) return "D" + ToRoman(number - 500);
    if (number >= 400) return "CD" + ToRoman(number - 400);
    if (number >= 100) return "C" + ToRoman(number - 100);
    if (number >= 90) return "XC" + ToRoman(number - 90);
    if (number >= 50) return "L" + ToRoman(number - 50);
    if (number >= 40) return "XL" + ToRoman(number - 40);
    if (number >= 10) return "X" + ToRoman(number - 10);
    if (number >= 9) return "IX" + ToRoman(number - 9);
    if (number >= 5) return "V" + ToRoman(number - 5);
    if (number >= 4) return "IV" + ToRoman(number - 4);
    if (number >= 1) return "I" + ToRoman(number - 1);
    // Never reached; only here to satisfy the compiler.
    throw new ArgumentOutOfRangeException(nameof(number), "something bad happened");
}
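
One limitation worth noting: the extension method counts every paragraph at a given level across the whole list, so a sub-list that restarts under a new parent item keeps counting instead of starting over. Here is a minimal sketch of per-level counters that do restart. It assumes every level starts at 1 and ignores w:start, w:lvlRestart and level overrides from numbering.xml. ListCounters and CountersFor are hypothetical names, and the input is simply the sequence of level values (ilvl) of one list in document order:

```csharp
using System;
using System.Collections.Generic;

public static class ListCounters
{
    // For each paragraph level (in document order), returns the counter
    // that paragraph gets at its own level.
    // Assumption: levels are 0-based and smaller than maxLevels.
    public static List<int> CountersFor(IEnumerable<int> levels, int maxLevels = 9)
    {
        var counters = new int[maxLevels];
        var result = new List<int>();
        foreach (int level in levels)
        {
            counters[level]++;
            // A new item at this level restarts all deeper levels.
            for (int deeper = level + 1; deeper < maxLevels; deeper++)
            {
                counters[deeper] = 0;
            }
            result.Add(counters[level]);
        }
        return result;
    }
}
```

For the level sequence 0, 1, 1, 0, 1 this gives the counters 1, 1, 2, 2, 1: the second-level counter starts over after the second top-level item.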
Kaspar Kjeldsen
From your tags I assume that you are trying to get the list item text using the Open XML SDK (and not using Word interop).

If you unzip the document package and have a look at the document.xml, you will see that the list item text is not stored in the document. It is a value computed by the application when the document is opened. So unfortunately, there is no easy way to get the value using the Open XML SDK.

If you want to know the list item text, you basically have two options:

  1. Use Word interop which will give you the value computed by Word (e.g. using Range.ListFormat.ListString)
  2. Compute the value yourself based on Open XML. This will be a bit of work, but you can find the algorithm documented in an MSDN article: Algorithm to Assemble List Item Text.
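
For the second option, the label format lives in numbering.xml: each level has a w:numFmt (for example decimal or lowerLetter) and a w:lvlText template such as %1), where %1 stands for the counter of the first level. The following is only a rough sketch of the final assembly step, not the full documented algorithm. It assumes you have already worked out the counter and format for each level, it handles just a handful of numFmt values, and the names ListLabel, FormatCounter and Build are made up for illustration:

```csharp
using System;
using System.Text;

public static class ListLabel
{
    // Formats one counter (>= 1) according to a w:numFmt value.
    // Only a few common formats are handled in this sketch.
    public static string FormatCounter(int value, string numFmt)
    {
        switch (numFmt)
        {
            case "decimal": return value.ToString();
            case "lowerLetter": return Letters(value);
            case "upperLetter": return Letters(value).ToUpperInvariant();
            case "lowerRoman": return Roman(value).ToLowerInvariant();
            case "upperRoman": return Roman(value);
            default: throw new NotSupportedException(numFmt);
        }
    }

    // 1 -> a, 26 -> z, 27 -> aa, 28 -> bb (the letter is repeated).
    static string Letters(int value)
    {
        char letter = (char)('a' + (value - 1) % 26);
        return new string(letter, (value - 1) / 26 + 1);
    }

    static string Roman(int value)
    {
        int[] values = { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
        string[] symbols = { "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I" };
        var sb = new StringBuilder();
        for (int i = 0; i < values.Length; i++)
        {
            while (value >= values[i])
            {
                sb.Append(symbols[i]);
                value -= values[i];
            }
        }
        return sb.ToString();
    }

    // Replaces %1..%9 in the level text with the formatted counter of that level.
    // counters[0] and formats[0] belong to the first level, and so on.
    public static string Build(string levelText, int[] counters, string[] formats)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < levelText.Length; i++)
        {
            bool isPlaceholder = levelText[i] == '%' && i + 1 < levelText.Length
                && levelText[i + 1] >= '1' && levelText[i + 1] <= '9';
            if (isPlaceholder)
            {
                int level = levelText[i + 1] - '1';
                sb.Append(FormatCounter(counters[level], formats[level]));
                i++; // skip the digit after '%'
            }
            else
            {
                sb.Append(levelText[i]);
            }
        }
        return sb.ToString();
    }
}
```

For the list in the question, and assuming its level text is %1), calling ListLabel.Build("%1)", new[] { 2 }, new[] { "decimal" }) gives 2) for the second item.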
Dirk Vollmar