0

I'm automatically looking for various properties of README.md files.

import requests
from typing import Final

FILE_NOT_FOUND: Final[int] = 404
GITHUB_URL: Final[str] = "https://github.com/"

repo_name_1 = "FasterXML/jackson-databind" # <--- main_name = 2.14
repo_name_2 = "Cacti/cacti"                # <--- main_name = develop
repo_name_3 = "FFmpeg/FFmpeg"              # <--- main_name = master

main_name = "develop"
repo_name = repo_name_2

url = f"{GITHUB_URL}/{repo_name}/blob/{main_name}/README.md"
if requests.head(url).status_code != FILE_NOT_FOUND:
    print(f"README.md of {repo_name} exists")
    # do something with README.md ...

I need to somehow put a regex instead of main_name, is that doable?

EDIT:

Put in other words, can one use requests to do the equivalent of find:

find https://github.com/FasterXML/jackson-databind -name "README.md"
OrenIshShalom
  • 5,974
  • 9
  • 37
  • 87
  • What do you mean by put a regex here? You are constructing a string here, not extracting parts of it. What would be this regex pattern and what it would be matched against? – matszwecja May 09 '22 at 14:50
  • I guess any sequence without `/` would be good for me - is it possible to do that? – OrenIshShalom May 09 '22 at 14:54
  • There is infinitely many such sequences. – matszwecja May 09 '22 at 14:55
  • 1
    When you perform the `requests.head()` method you have to tell it what URL to go after. You can't tell it to go after any URL that takes a particular pattern. It would be like telling your web browser that you want to go to `stackover[a-zA-Z0-9]low.com`. What would it do? Visit 62 different websites and bring them all back in different tabs? What would that even mean? If you need to check if N combinations of a particular URL exist, you will have to ping N URLs. There's no getting around that. – JNevill May 09 '22 at 14:56
  • Furthemore, this is an odd backwards usecase for regex. Regex is a pattern that defines a set of strings, but it doesn't generate that set of strings, instead it is used to test an already defined string to see if it matches the pattern. In your case you want to [generate](https://stackoverflow.com/questions/4627464/generate-a-string-that-matches-a-regex-in-python) a set of strings from the pattern. Clearly folks have done this, but it's odd and feels like an anti-pattern as many regex patterns would yield an infinite set. – JNevill May 09 '22 at 15:02
  • If all you want to do is remove slash use `replaced_text = text.replace('/', '')` –  May 09 '22 at 15:14

1 Answers1

0

After sorting XY problem of regex being completely irrelevant: What you are looking for is a GitHub API, which offers, among others, exactly this information. If you check out response for this request: https://api.github.com/repos/FasterXML/jackson-databind you can see that in the returned JSON one of the parameters is "default_branch" containing exactly what you need.

matszwecja
  • 6,357
  • 2
  • 10
  • 17