Figured it out! The trick is not to read the Word doc directly, but to convert it to a format that Python can process more easily:
- Open the .docx, click "Save As...", and select .txt as the format.
- This didn't work on its own; I also had to select the sub-option for UTF-8 encoding. See pic here, and note I'm doing this from a Mac so your save screen may look different.

Once that's done, you can read the new file like any normal .txt file, custom numbering included, and then do whatever text processing you want on it to isolate the numbers. Here's my (overly simple) code:
    # Create a list called 'doc' where each item is one line from the Word doc.
    # encoding='utf-8' matches the encoding chosen in the "Save As" step.
    with open('TEST_WORD.txt', 'r', encoding='utf-8') as f:
        doc = f.readlines()

    # Return just the first word of a line if it starts with a number or a bracket.
    # Lines that don't match fall through and return None.
    def return_first_number(string):
        if string[0] in ['1', '2', '3', '4', '5', '6', '7', '8', '9', '[']:
            return string.split()[0]

    # Use the function to build a list of first words from lines that start with a number.
    cleaned = [return_first_number(item) for item in doc]

    # Get rid of all the "None"s.
    cleaned = [item for item in cleaned if item]

    # Print the final list.
    print(cleaned)
out: ['[0001]', '1960s', '[0002]', '1960s', '[0003]', '1960s', '[0004]', '1960s', '1.', '2.']
This is by no means complete for what you need. It has some false positives (e.g. a line that starts with "1960s", referring to the decade), and I only have a sample of the Word doc, so there are likely other edge cases too. But this should get you on the right path.
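One way to cut down on those false positives, as a rough sketch only: match the whole first token against a pattern instead of checking just the first character. The pattern below assumes the numbering comes in only the two shapes that show up in my output, bracketed like [0001] or a number followed by a period like 1. The name extract_label, the pattern, and the sample lines are all made up for illustration, so adjust them to your actual doc.

```python
import re

# Assumption: labels are either "[0001]"-style or "1."-style, followed by
# whitespace or the end of the line. Anything else is ignored.
LABEL = re.compile(r"^\s*(\[\d+\]|\d+\.)(?=\s|$)")

def extract_label(line):
    """Return the leading numbering label of a line, or None if there is none."""
    match = LABEL.match(line)
    return match.group(1) if match else None

# Made-up sample lines standing in for the contents of 'doc'.
sample = [
    "[0001] In the 1960s ...\n",
    "1960s were a time of change\n",
    "1. A method comprising\n",
    "\n",
]

labels = [label for label in map(extract_label, sample) if label]
print(labels)
```

With this pattern a line starting with "1960s" is skipped because the digits aren't followed by a period, and something like "1.5 volts" is skipped because the period isn't followed by whitespace. It will still miss any numbering style I haven't seen in the sample.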
Good luck!