I do NOT want to check if the remote repository exists. I just want to test a string and return true if the string is in the valid format for a git repo.

I'm writing a groovy script and wish to do a quick check if a string represents a valid possible git repo.

For instance if the following strings are entered the test should return true:

http://example.com/my-project.git
file:///absolute/path/to/my-project.git
ssh:user@example.com:my-project
my-project

The following strings should fail the test and cause false to be returned:

fil://example.com/my-project.git
ssh:user|example.com:my-project

I'm hoping there is a git command that can do this quick test for me and I can call git from the groovy script. I say this because I'd like to use whatever is compiled into git to do the test as opposed to re-implementing the regular expression (or parser) that already exists in git. If I try the latter then inevitably I'll miss something.

Jason
  • Is there a reason attempting to clone the repo and handling the error if it fails wouldn't work? It seems like trying to figure out how to write a regex pattern to match all the possible forms of git repos is unnecessary and overly complex. – Douglas Adams Jun 01 '14 at 02:02
  • It does not look like git does strict url validation on entry. fil:// returns error that it can't find remote helper for "fil" and "ssh:user|example..." works just fine until error with "could not read...". In your case I'd search for URL details in git documentation and try parsing them yourself. For http url: https://github.com/git/git/blob/master/Documentation/technical/http-protocol.txt – mrówa Jun 01 '14 at 04:25
  • @DouglasAdams, I agree! It wasn't my decision to require this check. The folks over at Prezi decided that checking the URL would enable a feature in their tool called "Pride". But they did not support all possible URLs, and I want to use a URL they don't support. So I'm looking to modify their code to support all valid git repo URLs. The solution I finally used is in my answer below and will have to do. – Jason Jun 01 '14 at 20:54
  • Yes, there's a reason not to attempt cloning the repo and handling the error if it doesn't exist. The string might be user submitted and be something like `https://foo/bar ; rm -rf /etc ; #` so if you execute the command `git ls $url` then your system is dead. Or so on. – Edward Ned Harvey Jan 09 '18 at 18:48
  • Related or duplicate: [Regular expression for git repository](https://stackoverflow.com/q/2514859/3744182). – dbc Jan 31 '20 at 18:11
  • ℹ️ For actually checking that the repository exists: https://stackoverflow.com/questions/23914896/check-that-git-repository-exists/69303006#69303006 – Alberto Salvia Novella Sep 23 '21 at 15:36

4 Answers

It seems there is no way to accomplish this with git alone. The solution I finally used is:

git ls-remote the-url-to-test

This returns zero on success and non-zero otherwise. It doesn't really answer my original question, since I didn't want to check whether the repo exists, but it does confirm that the URL is valid whenever the repo exists and is reachable.

This will have to do for my current script.
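
For anyone calling this from a script, here is a minimal sketch in Python of the same check (the Groovy equivalent would follow the same shape). The function name `repo_is_reachable`, the `run` parameter, and the 30-second timeout are my own illustrative choices, not part of git. The arguments are passed as a list, so no shell ever interprets the string, and strings starting with `-` are rejected up front so they cannot be read as options.

```python
import subprocess


def repo_is_reachable(url, run=subprocess.run, timeout=30):
    """Return True if `git ls-remote <url>` exits with status 0.

    `run` is injectable only so the function can be exercised without
    network access; by default it is subprocess.run.
    """
    # Assumption: treat empty strings and option-like strings as invalid.
    if not url or url.startswith("-"):
        return False
    try:
        result = run(
            ["git", "ls-remote", url],   # list form: no shell involved
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # git is not installed, or the remote took too long to answer
        return False
    return result.returncode == 0
```

As noted above, this still tests reachability rather than format, so a well-formed URL for a private or missing repo returns False.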

Jason
  • Note that this tests that a repository is accessible, which means (1) it exists and (2) you have permissions to read it. This method would give false negatives if either of those conditions is not true, which makes it a poor choice for URI validation. It's also an expensive operation since it does a network request. – Dennis Mar 09 '16 at 18:54
  • @Dennis it may be a little more useful to use --exit-code option. – Nazım Gediz Aydındoğmuş Aug 31 '20 at 09:18

This gets close to "Using a regular expression to validate an email address", in that you'll be able to detect some errors but not all. Since you'll have to cope with failure anyway (wrong hostnames, bad SSH credentials), you're basically putting a few heuristics in place to catch common errors. I don't believe there's any validation code you can borrow from Git for this, although the list of URL formats in the Git documentation ("Git URLs") should be helpful in implementing this yourself.
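
To make the "few heuristics" idea concrete, here is a rough sketch in Python. It is not Git's own logic: the function name `looks_like_git_url`, the character whitelist, and the three-way split (scheme URL, scp-like `[user@]host:path`, plain local path) are my assumptions, loosely based on the URL forms listed in the Git documentation. It is deliberately stricter than Git, which, as a comment on the question notes, does not reject a string like `ssh:user|example.com:my-project` up front.

```python
import re

# Transport schemes listed in the Git URL documentation.
KNOWN_SCHEMES = {"ssh", "git", "http", "https", "ftp", "ftps", "file"}

# Assumption: a conservative character whitelist, stricter than Git itself.
_SAFE = r"[A-Za-z0-9._~/@:+%-]"
_URL_RE = re.compile(r"([A-Za-z][A-Za-z0-9+.-]*)://(" + _SAFE + r"*)")
_SCP_RE = re.compile(r"(?:[A-Za-z0-9._-]+@)?[A-Za-z0-9.-]+:" + _SAFE + r"+")
_PATH_RE = re.compile(r"[A-Za-z0-9._~/+-]+")


def looks_like_git_url(text):
    """Heuristic format check only; never touches the network."""
    if not text or text.startswith("-"):
        return False
    m = _URL_RE.fullmatch(text)
    if m:
        scheme, rest = m.group(1).lower(), m.group(2)
        if scheme not in KNOWN_SCHEMES:
            return False
        if scheme == "file":
            return rest.startswith("/")          # file:///absolute/path
        return bool(rest) and not rest.startswith("/")  # needs a host
    if "://" in text:
        return False                             # scheme URL with odd characters
    if _SCP_RE.fullmatch(text):
        return True                              # [user@]host:path
    return _PATH_RE.fullmatch(text) is not None  # plain local path


for s in ["http://example.com/my-project.git", "my-project",
          "fil://example.com/my-project.git"]:
    print(s, looks_like_git_url(s))
```

The whitelist also rejects spaces, `;` and `|`, so a string such as `https://foo/bar ; rm -rf /etc ; #` fails the check. Windows-style paths and paths containing spaces are not handled, so treat this as a starting point rather than a complete validator.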

Joe
  • Yet, there is https://github.com/git/git/blob/master/test-urlmatch-normalization.c and its associated https://github.com/git/git/blob/master/t/t0110-urlmatch-normalization.sh – VonC Jun 01 '14 at 11:32

Try this regex:

/^([A-Za-z0-9]+@|http(|s)\:\/\/)([A-Za-z0-9.]+(:\d+)?)(?::|\/)([\d\/\w.-]+?)(\.git)?$/i
4b0
  • That's an OK regex; however, it doesn't work for one use case (a Bitbucket link) and treats the `.git` suffix as optional. I think `.git` should be required exactly once. I have posted an answer with a regex improved in this respect. – P D Aug 06 '20 at 12:01

I've improved the answer by Љубиша Ивановић and created a regex that:

  • works for URLs valid for Bitbucket, with https and @, such as https://username@bitbucket.org/otherusername/reponame.git
  • requires the URL to end with the suffix .git (it was optional in the original regex).

The regex:

/^(([A-Za-z0-9]+@|http(|s)\:\/\/)|(http(|s)\:\/\/[A-Za-z0-9]+@))([A-Za-z0-9.]+(:\d+)?)(?::|\/)([\d\/\w.-]+?)(\.git){1}$/i

Here you may see how the regex works, with test cases: https://regexr.com/59nrk
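
For a quick local check without the online tester, here is a small sketch in Python that applies the same pattern. The helper name `matches_improved_regex` and the sample `git@github.com:user/repo.git` are mine, added for illustration; the other samples come from the question and from this answer.

```python
import re

# Same pattern as above, with the /i flag expressed as re.IGNORECASE.
IMPROVED_RE = re.compile(
    r"^(([A-Za-z0-9]+@|http(|s)\:\/\/)|(http(|s)\:\/\/[A-Za-z0-9]+@))"
    r"([A-Za-z0-9.]+(:\d+)?)(?::|\/)([\d\/\w.-]+?)(\.git){1}$",
    re.IGNORECASE,
)


def matches_improved_regex(text):
    """Return True if the whole string matches the regex from this answer."""
    return IMPROVED_RE.match(text) is not None


samples = [
    "https://username@bitbucket.org/otherusername/reponame.git",
    "git@github.com:user/repo.git",             # illustrative sample
    "http://example.com/my-project.git",
    "file:///absolute/path/to/my-project.git",  # from the question
    "my-project",                               # from the question
]
for s in samples:
    print(s, matches_improved_regex(s))
```

Note that the pattern only covers http(s) and `user@host:` forms, so the `file://` URL and the bare `my-project` path from the question are not matched by it.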

P D