
I have a working script that uses the GitHub REST API (v3) to (a) submit a search query and get results, and then (b) iterate through the search results to download individual source code files for further processing.

The original script uses Python + PyGithub, and I am trying to port it to Go + go-github.

Here is a snippet of the Python:

import re
import requests
from github import Github

g = Github(base_url=GITHUB_ENTERPRISE_URL, login_or_token=token)

def search_github(keywords):
    result = g.search_code(keywords)
    # Each search result is a file; download_url points at its raw contents.
    for repo in result:
        response = requests.get(repo.download_url)
        matched = re.findall(regular_expression, response.text)
        for match in matched:
            print(match)

The go-github equivalent of g.search_code(keywords) works well, but I cannot find anything equivalent to repo.download_url. Here is what go-github provides for a code search result; there is no "download URL" field:

// CodeResult represents a single search result.
type CodeResult struct {
    Name        *string      `json:"name,omitempty"`
    Path        *string      `json:"path,omitempty"`
    SHA         *string      `json:"sha,omitempty"`
    HTMLURL     *string      `json:"html_url,omitempty"`
    Repository  *Repository  `json:"repository,omitempty"`
    TextMatches []*TextMatch `json:"text_matches,omitempty"`
}

I would be surprised if the go-github library were an incomplete implementation of the GitHub REST API, but I cannot find how to get the "download URL" associated with a search result so that I can download the actual source code file.

Even after a lot of googling, I could not find any example Go code that uses go-github and actually downloads the files returned by a search.

I am stuck. Any pointers appreciated.

go-github: https://github.com/google/go-github

PyGithub: https://github.com/PyGithub/PyGithub

GitHub REST API: https://docs.github.com/en/rest

David Jones

1 Answer


A GitHub API code search is supposed to return an "html_url" field for each item found, like:

"html_url": "https://github.com/jquery/jquery/blob/825ac3773694e0cd23ee74895fd5aeb535b27da4/src/attributes/classes.js",

That alone should be enough to download a found item.
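
As a rough illustration of how such an "html_url" could be turned into a raw download link, here is a minimal sketch using only the Go standard library. It assumes a public repository on github.com, where raw files are served from raw.githubusercontent.com, and a URL of the form /{owner}/{repo}/blob/{ref}/{path} in which the ref contains no slashes (as with the commit SHA in the example above). The function name rawURLFromHTMLURL is hypothetical and is not part of go-github. Private repositories and GitHub Enterprise hosts are likely to need authentication and a different URL layout, so this is not a general solution.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// rawURLFromHTMLURL (hypothetical helper) converts a github.com "blob" page
// URL into the matching raw.githubusercontent.com URL.
// Assumption: the ref segment (commit SHA or branch) contains no "/".
func rawURLFromHTMLURL(htmlURL string) (string, error) {
	u, err := url.Parse(htmlURL)
	if err != nil {
		return "", err
	}
	if u.Host != "github.com" {
		return "", fmt.Errorf("unsupported host %q", u.Host)
	}
	// Expected path layout: /{owner}/{repo}/blob/{ref}/{file path...}
	parts := strings.SplitN(strings.TrimPrefix(u.EscapedPath(), "/"), "/", 5)
	if len(parts) < 5 || parts[2] != "blob" || parts[4] == "" {
		return "", fmt.Errorf("not a blob URL: %s", htmlURL)
	}
	owner, repo, ref, path := parts[0], parts[1], parts[3], parts[4]
	return "https://raw.githubusercontent.com/" + owner + "/" + repo + "/" + ref + "/" + path, nil
}

func main() {
	raw, err := rawURLFromHTMLURL("https://github.com/jquery/jquery/blob/825ac3773694e0cd23ee74895fd5aeb535b27da4/src/attributes/classes.js")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(raw)
}
```

With go-github, the input would be the HTMLURL field of each CodeResult, and the resulting URL could then be fetched with a plain http.Get.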

VonC
  • This `html_url` is intended to be viewed by a human in a browser. It shows the typical GitHub UI. The `download_url` provided by PyGithub makes it possible to download the raw file (only) and is suitable for an automated script. I cannot find the equivalent in `go-github`. – David Jones Jul 19 '21 at 14:56
  • @DavidJones I don't find it either: you might need to post-process `html_url` in order to transform it into a `download_url`-type raw download link. – VonC Jul 19 '21 at 15:19
  • The `download_url` includes a token, so there is no way to post-process the `html_url` to generate a valid token that GitHub will accept. – David Jones Dec 21 '21 at 03:11
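
For cases like the one in the last comment, where the raw URL carries a token, one possible alternative is to skip the download URL entirely and request the file through the REST API's repository contents endpoint (GET /repos/{owner}/{repo}/contents/{path}) with a raw media type and the same API token used for the search. The sketch below uses only the Go standard library and builds the request without sending it. The helper name newRawContentRequest and the argument values are hypothetical, the `application/vnd.github.v3.raw` media type is assumed to be supported by the target server, and for GitHub Enterprise the API base is assumed to be something like https://HOSTNAME/api/v3. go-github appears to wrap the same endpoint as Repositories.GetContents, which may be a more natural fit if the rest of the script already uses that client.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newRawContentRequest (hypothetical helper) builds a GET request for
// {apiBase}/repos/{owner}/{repo}/contents/{path} that asks for the raw file
// body instead of the default JSON description. ref and token are optional.
func newRawContentRequest(apiBase, owner, repo, path, ref, token string) (*http.Request, error) {
	// Escape each path segment separately so the "/" separators are kept.
	segments := strings.Split(strings.Trim(path, "/"), "/")
	for i, s := range segments {
		segments[i] = url.PathEscape(s)
	}
	endpoint := strings.TrimRight(apiBase, "/") +
		"/repos/" + url.PathEscape(owner) + "/" + url.PathEscape(repo) +
		"/contents/" + strings.Join(segments, "/")
	if ref != "" {
		endpoint += "?ref=" + url.QueryEscape(ref)
	}

	req, err := http.NewRequest(http.MethodGet, endpoint, nil)
	if err != nil {
		return nil, err
	}
	// Assumption: this media type makes the server return the file contents as-is.
	req.Header.Set("Accept", "application/vnd.github.v3.raw")
	if token != "" {
		req.Header.Set("Authorization", "token "+token)
	}
	return req, nil
}

func main() {
	req, err := newRawContentRequest(
		"https://api.github.com", "jquery", "jquery",
		"src/attributes/classes.js",
		"825ac3773694e0cd23ee74895fd5aeb535b27da4",
		"", // no token needed for a public repository
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// To fetch the file: resp, err := http.DefaultClient.Do(req), then read resp.Body.
}
```

The owner, repository name, and path would come from the Repository and Path fields of each CodeResult; the ref can be left empty to use the repository's default branch.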