Bullet Lists in python-docx

Question

I am trying to create a bullet list with nested sub-items in python-docx.

I can get a plain bullet list using this:

from docx import Document
doc = Document()
p = doc.add_paragraph()
p.style = 'List Bullet'

r = p.add_run()
r.add_text("Item 1")
# Something's gotta come here to get the Sub-Item 1
r = p.add_run()
r.add_text("Item 2")    
# Something's gotta come here to get the Sub-Item 2

I figure adding another paragraph in the middle won't help, because that would essentially mean making another List Bullet with the same formatting as its parent rather than the child-like formatting I want. Adding another run to the same paragraph doesn't help either (I tried this, and it messes up the whole thing). Is there any way to do it?

Mad Physicist · Accepted Answer · 2022-06-29T22:16:32.643

There is a way to do it, but it involves a bit of extra work on your part. There is currently no "native" interface in python-docx for doing this. Each bulleted item must be an individual paragraph. Runs apply only to the text characters.

The idea is that list bulleting or numbering is controlled by a concrete bullet or number style, which refers to an abstract style. The abstract style determines the styling of the affected paragraph, while the concrete numbering determines the number/bullet within the abstract sequence. This means that you can have paragraphs without bullets and numbering interspersed among the bulleted paragraphs. At the same time, you can restart the numbering/bulleting sequence at any point by creating a new concrete style.
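The relationship between abstract and concrete numbering can be seen in the underlying XML. Here is a minimal sketch using only the standard library, assuming a hand-written, heavily trimmed stand-in for word/numbering.xml (real files carry many more elements and attributes). The name concrete_to_abstract is hypothetical and not part of python-docx:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Assumption: a trimmed, hand-written example, not copied from a real document.
# Two concrete numberings (w:num) both point at the same abstract definition.
NUMBERING_XML = """
<w:numbering xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:abstractNum w:abstractNumId="0">
    <w:lvl w:ilvl="0"><w:numFmt w:val="bullet"/></w:lvl>
    <w:lvl w:ilvl="1"><w:numFmt w:val="bullet"/></w:lvl>
  </w:abstractNum>
  <w:num w:numId="1"><w:abstractNumId w:val="0"/></w:num>
  <w:num w:numId="2"><w:abstractNumId w:val="0"/></w:num>
</w:numbering>
"""


def concrete_to_abstract(xml_text):
    """Map each concrete numId to the abstractNumId it refers to."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for num in root.findall("w:num", NS):
        num_id = int(num.get("{%s}numId" % W))
        abstract = num.find("w:abstractNumId", NS)
        mapping[num_id] = int(abstract.get("{%s}val" % W))
    return mapping


print(concrete_to_abstract(NUMBERING_XML))  # {1: 0, 2: 0}
```

Paragraphs that share a numId belong to the same sequence. A second numId pointing at the same abstract definition gives a list that looks the same but counts independently.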

All this information is hashed out (in detail but unsuccessfully) in Issue #25. I don't have the time or resources to lay this to rest right now, but I did write a function that I left in a comment in the discussion thread. This function will look up an abstract style based on the level of indentation and paragraph style you want. It will then create or retrieve a concrete style based on that abstract style and assign it to your paragraph object:

def list_number(doc, par, prev=None, level=None, num=True):
    """
    Makes a paragraph into a list item with a specific level and
    optional restart.

    An attempt will be made to retrieve an abstract numbering style that
    corresponds to the style of the paragraph. If that is not possible,
    the default numbering or bullet style will be used based on the
    ``num`` parameter.

    Parameters
    ----------
    doc : docx.document.Document
        The document to add the list into.
    par : docx.paragraph.Paragraph
        The paragraph to turn into a list item.
    prev : docx.paragraph.Paragraph or None
        The previous paragraph in the list. If specified, the numbering
        and styles will be taken as a continuation of this paragraph.
        If omitted, a new numbering scheme will be started.
    level : int or None
        The level of the paragraph within the outline. If ``prev`` is
        set, defaults to the same level as in ``prev``. Otherwise,
        defaults to zero.
    num : bool
        If ``prev`` is :py:obj:`None` and the style of the paragraph
        does not correspond to an existing numbering style, this will
        determine whether or not the list will be numbered or bulleted.
        The result is not guaranteed, but is fairly safe for most Word
        templates.
    """
    xpath_options = {
        True: {'single': 'count(w:lvl)=1 and ', 'level': 0},
        False: {'single': '', 'level': level},
    }

    def style_xpath(prefer_single=True):
        """
        The style comes from the outer-scope variable ``par.style.style_id``.
        """
        style = par.style.style_id
        return (
            'w:abstractNum['
                '{single}w:lvl[@w:ilvl="{level}"]/w:pStyle[@w:val="{style}"]'
            ']/@w:abstractNumId'
        ).format(style=style, **xpath_options[prefer_single])

    def type_xpath(prefer_single=True):
        """
        The type is from the outer-scope variable ``num``.
        """
        type = 'decimal' if num else 'bullet'
        return (
            'w:abstractNum['
                '{single}w:lvl[@w:ilvl="{level}"]/w:numFmt[@w:val="{type}"]'
            ']/@w:abstractNumId'
        ).format(type=type, **xpath_options[prefer_single])

    def get_abstract_id():
        """
        Select as follows:

            1. Match single-level by style (get min ID)
            2. Match exact style and level (get min ID)
            3. Match single-level decimal/bullet types (get min ID)
            4. Match decimal/bullet in requested level (get min ID)
            5. 0
        """
        for fn in (style_xpath, type_xpath):
            for prefer_single in (True, False):
                xpath = fn(prefer_single)
                ids = numbering.xpath(xpath)
                if ids:
                    return min(int(x) for x in ids)
        return 0

    if (prev is None or
            prev._p.pPr is None or
            prev._p.pPr.numPr is None or
            prev._p.pPr.numPr.numId is None):
        if level is None:
            level = 0
        numbering = doc.part.numbering_part.numbering_definitions._numbering
        # Compute the abstract ID first by style, then by num
        anum = get_abstract_id()
        # Set the concrete numbering based on the abstract numbering ID
        num = numbering.add_num(anum)
        # Make sure to override the abstract continuation property
        num.add_lvlOverride(ilvl=level).add_startOverride(1)
        # Extract the newly-allocated concrete numbering ID
        num = num.numId
    else:
        if level is None:
            level = prev._p.pPr.numPr.ilvl.val
        # Get the previous concrete numbering ID
        num = prev._p.pPr.numPr.numId.val
    par._p.get_or_add_pPr().get_or_add_numPr().get_or_add_numId().val = num
    par._p.get_or_add_pPr().get_or_add_numPr().get_or_add_ilvl().val = level

Using the styles in the default built-in document stub, you can do something like this:

import docx

d = docx.Document()
p0 = d.add_paragraph('Item 1', style='List Bullet')
list_number(d, p0, level=0, num=False)
p1 = d.add_paragraph('Item A', style='List Bullet 2')
list_number(d, p1, p0, level=1)
p2 = d.add_paragraph('Item 2', style='List Bullet')
list_number(d, p2, p1, level=0)
p3 = d.add_paragraph('Item B', style='List Bullet 2')
list_number(d, p3, p2, level=1)

The style will not only affect the tab stops and other display characteristics of the paragraph, but will also help look up the appropriate abstract numbering scheme. When you implicitly set prev=None in the call for p0, the function creates a new concrete numbering scheme. All the remaining paragraphs will inherit the same scheme because they get a prev parameter. The calls to list_number don't have to be interleaved with the calls to add_paragraph like that, as long as the numbering for the paragraph used as prev is set before the call.
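To illustrate the point about not interleaving the calls, here is a minimal sketch of a second pass over paragraphs that were all created beforehand. The helper name number_paragraphs is hypothetical, and list_number is passed in as number_fn only so that the sketch stays self-contained:

```python
def number_paragraphs(doc, paragraphs, levels, number_fn, num=False):
    """Number already-created paragraphs in document order.

    ``number_fn`` is assumed to have the same signature as the
    ``list_number`` function shown above.
    """
    prev = None
    for par, level in zip(paragraphs, levels):
        # The first call has prev=None and so starts a new concrete
        # numbering; each later call continues from the paragraph that
        # was numbered just before it.
        number_fn(doc, par, prev=prev, level=level, num=num)
        prev = par
```

With the example above, this would be called as number_paragraphs(d, [p0, p1, p2, p3], [0, 1, 0, 1], list_number) after all four add_paragraph calls.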

You can find an implementation of this function in a library I maintain, called haggis, available on GitHub and PyPI: haggis.files.docx.list_number.

+1 Thanks @Mad Physicist for your time and effort to draft that answer.. I am gonna need some time to go thru it..Will get back to you as soon as I can with follow-up questions/comments — Vizag, Aug 14 '18 at 07:31
@JesseKnight. I assure you I wouldn't be using MS anything if I didn't have to for money. Even then I struggle, but money is money :) — Mad Physicist, Dec 21 '19 at 01:26
@MadPhysicist I tried this out with the latest version of python-docx available at PyPI and it doesn't work for me. All four items appear at the same level instead of the expected indentation. Any idea why this could be the case? — Rohit Gavval, Nov 20 '20 at 09:06

Lee Thomas · Answer 2 · 2022-06-28T21:34:03.637

I found that @Mad Physicist's answer didn't work for me with indented bulleted lists. I modified it to set the value for numId only if the boolean num was True, but that exposed that the function reuses "num" as a local variable for the concrete numbering ID. So I renamed that variable to "numbr" throughout the function, and added a boolean check before the next-to-last line:

if num:
    par._p.get_or_add_pPr().get_or_add_numPr().get_or_add_numId().val = numbr

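So for me, here's the modified part of the function, from get_abstract_id() onward (shown dedented; everything before it is unchanged from the answer above):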

def get_abstract_id():
    """
    Select as follows:

        1. Match single-level by style (get min ID)
        2. Match exact style and level (get min ID)
        3. Match single-level decimal/bullet types (get min ID)
        4. Match decimal/bullet in requested level (get min ID)
        5. 0
    """
    for fn in (style_xpath, type_xpath):
        for prefer_single in (True, False):
            xpath = fn(prefer_single)
            ids = numbering.xpath(xpath)
            if ids:
                return min(int(x) for x in ids)
    return 0

if (prev is None or
        prev._p.pPr is None or
        prev._p.pPr.numPr is None or
        prev._p.pPr.numPr.numId is None):
    if level is None:
        level = 0
    numbering = doc.part.numbering_part.numbering_definitions._numbering
    # Compute the abstract ID first by style, then by num
    anum = get_abstract_id()
    # Set the concrete numbering based on the abstract numbering ID
    numbr = numbering.add_num(anum)
    # Make sure to override the abstract continuation property
    numbr.add_lvlOverride(ilvl=level).add_startOverride(1)
    # Extract the newly-allocated concrete numbering ID
    numbr = numbr.numId
else:
    if level is None:
        level = prev._p.pPr.numPr.ilvl.val
    # Get the previous concrete numbering ID
    numbr = prev._p.pPr.numPr.numId.val
if num:
    par._p.get_or_add_pPr().get_or_add_numPr().get_or_add_numId().val = numbr
par._p.get_or_add_pPr().get_or_add_numPr().get_or_add_ilvl().val = level

With profound gratitude to Mad Physicist, scanny, and everyone else who has worked so hard on python-docx; you've been a tremendous help!

EDIT: I should add that I also made use of scanny's suggestion to start from a document with the bullet styles I wanted, rather than from a blank document. In my template I was able to correct some issues with the bullets (some of which were set to numbers, incorrectly). I then save the result to my desired filename, and all works very well.

I've gone ahead and incorporated your change into the development branch of [haggis](https://haggis.readthedocs.io/en/latest/api.html#haggis.files.docx.list_number), and added a reference to it in my answer as well. Thanks for the fix! — Mad Physicist, Jun 29 '22 at 22:11

Pierre Bezukhov · Answer 3 · 2022-10-25T06:03:17.503

The bullet character has Unicode code point 9679 (U+25CF), so the easiest way to do that is:

r.add_text("\n " + chr(9679) + " Item 1")
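For reference, chr(9679) is U+25CF (BLACK CIRCLE). Below is a minimal sketch of building such lines as plain strings. The helper name bullet_line and the four-space indent are assumptions made for illustration. Note that the result is ordinary text that happens to contain a bullet character, not a Word list item, so list styles and automatic indentation do not apply to it:

```python
BULLET = chr(9679)  # U+25CF BLACK CIRCLE


def bullet_line(text, level=0, indent="    "):
    """Prefix ``text`` with a literal bullet, indented once per level."""
    return indent * level + BULLET + " " + text


print(bullet_line("Item 1"))
print(bullet_line("Sub-item 1", level=1))
```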

score 0 · Answer 4 · answered Apr 12 '23 at 16:50

Here is a quick but imperfect solution that may be useful to some people. You can simply give the sub-list items a different style:

from docx import Document

document = Document()

document.add_paragraph('Item 1', style='List Bullet')
document.add_paragraph('Sub-item 1', style='List Bullet 2')
document.add_paragraph('Item 2', style='List Bullet')
document.add_paragraph('Sub-item 2', style='List Bullet 2')

document.save('my_document.docx')

This gives a two-level list, with each sub-item indented under its parent item.
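The style-per-level idea can be wrapped in a small helper. This is a minimal sketch, assuming the document's template defines 'List Bullet', 'List Bullet 2' and 'List Bullet 3' (as the default python-docx template is expected to); the name bullet_style is hypothetical:

```python
def bullet_style(level):
    """Return the bullet style name for a 0-based nesting level.

    Assumption: only three levels of bullet styles are available.
    """
    if not 0 <= level <= 2:
        raise ValueError("only levels 0 to 2 are covered by this sketch")
    if level == 0:
        return 'List Bullet'
    return 'List Bullet {}'.format(level + 1)
```

It would be used as document.add_paragraph('Sub-item 1', style=bullet_style(1)).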

score 0 · Answer 5 · answered May 20 '23 at 17:10

I added a function on top of Mad Physicist's list_number function (so everything below depends on his function):

def add_list(document, itemized_list):
    """ Add itemized list

    - supports one level of nesting

    Arguments
    ---------
    document (docx.Document)
    itemized_list (list)
        List of strings
        A nested list of strings holds second-level items
    """
    # Paragraphs in list
    p = []

    # First item in list
    p.append(document.add_paragraph(itemized_list[0], style='List Paragraph'))
    list_number(document, p[0], level=0, num=False)

    # Loop over remaining list items
    # First level
    for level1_item in itemized_list[1:]:
        if isinstance(level1_item, str):
            p.append(document.add_paragraph(level1_item, style='List Paragraph'))
            list_number(document, p[-1], prev=p[-2], level=0)
        elif isinstance(level1_item, list):
            # Go to second level
            for level2_item in level1_item:
                p.append(document.add_paragraph(level2_item, style='List Paragraph'))
                list_number(document, p[-1], prev=p[-2], level=1)

Usage:

itemized_list = ['item 1',
                 ['item 1, subitem 1'],
                 'item 2',
                 'item 3',
                 ['item 3, subitem 1',
                  'item 3, subitem 2']]
add_list(document, itemized_list)

Result:

item 1
- item 1, subitem 1
item 2
item 3
- item 3, subitem 1
- item 3, subitem 2

Can easily be extended to more levels if needed.
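One way to make that extension is to flatten the nested list recursively into (level, text) pairs first. This is a minimal sketch, where the helper name flatten_items is hypothetical:

```python
def flatten_items(items, level=0):
    """Yield (level, text) pairs from an arbitrarily nested list of strings."""
    for item in items:
        if isinstance(item, str):
            yield level, item
        else:
            # A nested list holds the sub-items of the preceding entry,
            # so its contents sit one level deeper.
            yield from flatten_items(item, level + 1)


nested = ['item 1', ['item 1, subitem 1'], 'item 2', ['a', ['b']]]
print(list(flatten_items(nested)))
```

Each pair can then be passed to document.add_paragraph and list_number in turn, using the previously added paragraph as prev for every item after the first.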
