
I need to pull the Table of Contents from a README file in a GitHub repository. I used Python's 'requests' module to fetch the README text, and now I'm trying to match the Table of Contents with a regular expression. Here's the code leading up to my question:

import requests
import os
import sys
import re

# Get readme page info via GitHub API.
rm_pg_url = 'https://api.github.com/repos/PillarOfSand/Projects/readme'
rm_pg = requests.get(rm_pg_url, timeout=10)
rm_pg_content = rm_pg.json()

# Isolate download URL. Get actual text from readme file.
download_url = rm_pg_content['download_url']
real_rm = requests.get(download_url, timeout=10)
all_text = real_rm.text

toc_regex = re.compile(r'(?s)^## Table of Contents.*security\)$')
table_of_contents = toc_regex.search(all_text)

The last two lines are the ones I'm asking about. The table_of_contents variable is None, so the regular expression isn't matching anything. The text I'm searching can be found at the following URL:

ReadME Text

So, my actual question is, where am I going wrong? How does my regular expression need to be adjusted to match the entire table of contents?
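
For reference, one likely cause: without re.MULTILINE, ^ and $ anchor only to the start and end of the whole string, so the pattern can match only if the heading is the very first line of the README and the "security)" entry is the very last. Below is a minimal sketch of that behaviour and a possible adjustment. The sample_text is a hypothetical stand-in, since the real README contents are assumed rather than known, and the lazy .*? is my own choice so the match stops at the first line ending in "security)".

```python
import re

# Hypothetical stand-in for the README text (assumed layout, not the real file).
sample_text = (
    "# Projects\n"
    "\n"
    "Some intro text.\n"
    "\n"
    "## Table of Contents\n"
    "- [Networking](#networking)\n"
    "- [Security](#security)\n"
    "\n"
    "## Networking\n"
    "More text.\n"
)

# Original pattern: only DOTALL is set, so ^ and $ anchor to the whole string.
original = re.compile(r'(?s)^## Table of Contents.*security\)$')

# Adjusted pattern: MULTILINE lets ^ and $ match at line boundaries;
# the lazy .*? stops at the first line that ends in "security)".
adjusted = re.compile(r'(?ms)^## Table of Contents.*?security\)$')

no_match = original.search(sample_text)
match = adjusted.search(sample_text)
toc = match.group(0) if match else None

print(no_match)
print(toc)
```

If the file turns out to use Windows line endings, $ will not match before the "\r", so ending the pattern with \r?$ may also be needed.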

Thanks.

Sean