
I am doing some large-scale analysis of Java projects on GitHub; this involves cloning each project via git (nothing special here), then doing a static read of the code for analysis, and so on.

My question: is there a way to programmatically recover the GitHub URL for each source file in the cloned repository? I want this so I can link back to GitHub and see the original source.

For example, the following URL (what I want): https://github.com/elastic/elasticsearch/blob/master/libs/elasticsearch-nio/src/main/java/org/elasticsearch/nio/BytesWriteOperation.java#L35 points to a particular function. Of course, the directory structure in the cloned version doesn't match the URL, e.g., the /blob/master/ part seems particular to this project and the structure of its repository.

kazimir.r
  • You may want to look at the `raw` version of the particular file. It should start with `https://raw.githubusercontent.com/something` – Ru Chern Chong Feb 12 '18 at 09:40
  • You can get the GitHub URL for a cloned repository: https://stackoverflow.com/a/4089452/1562662 – Chacko Feb 12 '18 at 09:40

2 Answers


So, if I'm understanding correctly, you want to create a link to a human-readable code view with a specific line highlighted.

Essentially, the normal GitHub link breaks down into:

  1. the URL of the user's repo, e.g. https://github.com/elastic/elasticsearch (aka the remote origin)
  2. /blob/
  3. the branch name, e.g. master
  4. the path to the file within the repo, which should be identical to the path in the local repo you just cloned.

Each of these components is pretty simple to get.

1. Remote Origin
(aka <remote-origin>)

When you clone a repo, the local copy you create stores a reference to the remote origin URL.

$ git config --get remote.origin.url

https://github.com/elastic/elasticsearch

or, for more detail:

$ git remote show origin
* remote origin
  Fetch URL: https://github.com/elastic/elasticsearch
  Push  URL: https://github.com/elastic/elasticsearch
  HEAD branch: develop
  Remote branches:
    1.4                    new (next fetch will store in remotes/origin)
    2.0                    tracked
    add-code-of-conduct-1  tracked
    build_test             tracked
    ...

and so on
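The remote URL is not always in the https form shown above; a clone made over SSH typically reports something like git@github.com:elastic/elasticsearch.git. A minimal Python sketch for normalising it is below. The function name normalize_github_remote is hypothetical, and it assumes the remote points at github.com rather than a GitHub Enterprise host.

```python
import re


def normalize_github_remote(remote_url: str) -> str:
    """Turn a GitHub remote URL into its https://github.com/<user>/<repo> form.

    Hypothetical helper: only remotes hosted on github.com are recognised.
    """
    url = remote_url.strip()
    # scp-like SSH form: git@github.com:user/repo.git
    match = re.match(r"^git@github\.com:(?P<path>.+)$", url)
    if match is None:
        # ssh://git@github.com/user/repo.git or https://github.com/user/repo(.git)
        match = re.match(
            r"^(?:ssh://git@|https?://(?:[^@/]+@)?)github\.com/(?P<path>.+)$", url
        )
    if match is None:
        raise ValueError(f"not a recognised GitHub remote: {remote_url!r}")
    path = match.group("path").rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"https://github.com/{path}"
```

Feeding it the output of git config --get remote.origin.url should give a base URL that can be used directly as <remote-origin>.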

2. Branch
(aka <branch>)

You're no doubt aware of how to get the repo's branches:

$ git branch
  develop
* master

but you can also use git rev-parse --abbrev-ref HEAD to return just the current branch.

$ git rev-parse --abbrev-ref HEAD
master
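For a batch job, the same lookup can be driven from a script. The sketch below is one possible Python wrapper, not the only way to do it: git_output and current_ref are hypothetical names, and it assumes git is on the PATH. When HEAD is detached, git rev-parse --abbrev-ref HEAD prints HEAD rather than a branch name, so the sketch falls back to the full commit hash in that case.

```python
import subprocess


def git_output(repo_dir: str, *args: str) -> str:
    """Run `git <args>` inside repo_dir and return its stripped stdout."""
    result = subprocess.run(
        ["git", *args],
        cwd=repo_dir,
        check=True,          # raise CalledProcessError if git fails
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def current_ref(repo_dir: str, run=git_output) -> str:
    """Return the checked-out branch name, or the commit hash if HEAD is detached.

    `run` is injectable so the logic can be exercised without a real repository.
    """
    branch = run(repo_dir, "rev-parse", "--abbrev-ref", "HEAD")
    if branch == "HEAD":  # detached HEAD: there is no branch name to report
        return run(repo_dir, "rev-parse", "HEAD")
    return branch
```

A commit hash can stand in for the branch in a /blob/ URL, and a link built that way keeps pointing at the same file contents even after the branch moves on, which may matter if line numbers are part of the analysis.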

From those pieces of information and the file's path within the local copy of the repo, you can build the link:

<remote-origin>/blob/<branch>/<path-to-file-in-repo>#L<line_number>
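As a rough illustration, the template above can be filled in with a small Python function. The name github_blob_url and its parameters are made up for this sketch; it assumes the remote origin is already in https://github.com/<user>/<repo> form and that the file path is given relative to the repository root.

```python
from typing import Optional
from urllib.parse import quote


def github_blob_url(remote_origin: str, ref: str, path_in_repo: str,
                    line: Optional[int] = None) -> str:
    """Build <remote-origin>/blob/<ref>/<path-to-file-in-repo>#L<line_number>."""
    base = remote_origin.rstrip("/")
    if base.endswith(".git"):
        base = base[: -len(".git")]
    # Use forward slashes even if the path came from a Windows checkout.
    rel_path = path_in_repo.replace("\\", "/").lstrip("/")
    # quote() keeps "/" intact and percent-encodes characters such as spaces.
    url = f"{base}/blob/{quote(ref)}/{quote(rel_path)}"
    if line is not None:
        url += f"#L{line}"
    return url


print(github_blob_url(
    "https://github.com/elastic/elasticsearch",
    "master",
    "libs/elasticsearch-nio/src/main/java/org/elasticsearch/nio/BytesWriteOperation.java",
    line=35,
))
```

With the values from the question, this prints the URL the question asks for.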
Craig

As suggested, you can use raw.githubusercontent.com. The syntax should be:

https://raw.githubusercontent.com/<user>/<repo>/<branch>/<file>
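The raw URL and the regular GitHub file view differ only in the host and the /blob/ segment, so one can be rewritten into the other. A minimal Python sketch, with hypothetical function names raw_to_blob_url and blob_to_raw_url, assuming the URLs follow exactly the two layouts shown in this thread:

```python
from urllib.parse import urlsplit


def raw_to_blob_url(raw_url: str) -> str:
    """Rewrite a raw.githubusercontent.com URL into the github.com /blob/ view."""
    parts = urlsplit(raw_url)
    if parts.netloc != "raw.githubusercontent.com":
        raise ValueError(f"not a raw GitHub URL: {raw_url!r}")
    # Path layout: /<user>/<repo>/<branch>/<file>; keep <branch>/<file> together.
    user, repo, rest = parts.path.lstrip("/").split("/", 2)
    return f"https://github.com/{user}/{repo}/blob/{rest}"


def blob_to_raw_url(blob_url: str) -> str:
    """Rewrite a github.com /blob/ URL into its raw.githubusercontent.com form."""
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url!r}")
    # Path layout: /<user>/<repo>/blob/<branch>/<file>
    user, repo, marker, rest = parts.path.lstrip("/").split("/", 3)
    if marker != "blob":
        raise ValueError(f"not a /blob/ URL: {blob_url!r}")
    return f"https://raw.githubusercontent.com/{user}/{repo}/{rest}"
```

Any #L<line_number> fragment is dropped in both directions, since it only has meaning in the rendered /blob/ view and would need to be appended again afterwards.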
ErniBrown
  • Thanks for the pointer! That's quite useful as a first step, but is there a way to then link back to the non-raw file on GitHub? – kazimir.r Feb 12 '18 at 10:26