52

Here is what I'm trying to do: given a Google document URL, I want to get the document ID to create a copy on Google Drive. I know I can achieve that by some regex or replacing on the URL, but as there are several different forms to represent the same document in a URL, I wanted to find a generic solution.

Currently, that's the best I could think:

function getFileIdFromUrl(url) {
  try {
    return getDocIdFromUrl(url);
  } catch (e) {
    return getSpreadsheetIdFromUrl(url);
  }
}

function getDocIdFromUrl(url) {
  var doc = null;
  try {
    doc = DocumentApp.openByUrl(url);
  } catch (e) {
    doc = DocumentApp.openByUrl(url + "/edit");
  }
  return doc.getId();
}

function getSpreadsheetIdFromUrl(url) {
  var spreadsheet = null;
  try {
    spreadsheet = SpreadsheetApp.openByUrl(url);
  } catch (e) {
    spreadsheet = SpreadsheetApp.openByUrl(url + "/edit");
  }
  return spreadsheet.getId();
}

function copy(url) { // may throw an exception if the URL is invalid or private
   var id = getFileIdFromUrl(url);
   var file = DriveApp.getFileById(id);
   file.makeCopy().setSharing(DriveApp.Access.ANYONE_WITH_LINK, DriveApp.Permission.VIEW);
}

The problem is that my solution only covers documents and spreadsheets, I would like to do the same with any uploaded file, for example:

https://docs.google.com/file/d/0B-FYu_D7D7x4REdtRVEzVH0eU0/edit

In short, I wanted something like that:

DriveApp.getFileByUrl(url).makeCopy();

Does anyone know if it's possible?

Any safe solution to extract the file ID from the file URL would fit as well for me.

Thanks

Rubén
  • 34,714
  • 9
  • 70
  • 166
Renato
  • 685
  • 1
  • 6
  • 11

12 Answers12

134

DriveApp is indeed missing a getFileByUrl (and also folder for that matter). You may want to open an enhancement request on Apps Script issue tracker.

But what I do on my scripts (since these openByUrl functions are somewhat new), is to get the id using a regex. Like this.

function getIdFromUrl(url) { return url.match(/[-\w]{25,}/); }

This regex works for any google url I've tried: Drive url for folders and files, Fusion Tables, Spreadsheets, Docs, Presentations, etc. It just looks for anything in a string that "looks like" a Google key. That is, any big enough string that has only (google key) valid characters in it.

Also, it works even if it receives the ID directly, instead of the URL. Which is useful when you're asking the link from the user, as some may paste the id directly instead of the url and it still works.

--edit

There are some other answers and comments that address some edge cases that I never encountered myself but might happen, like trying to get a folder-id on a nested folder URL, or when you have G-Suite domain that is 25+ characters long. For those cases, you might want to use a more strict regex.

From a quick look at the suggestions below I recommend the following /[-\w]{25,}(?!.*[-\w]{25,})/ because it is still very simple and should address these cases.

Henrique G. Abreu
  • 17,406
  • 3
  • 56
  • 65
  • I did some tests and it looks great! Thank you. Just a question: this number (25) is really the minimum length for google docs ids? – Renato May 31 '13 at 20:11
  • This is not documented anywhere. I just took a bunch of URL from various files, look at the shortest one and gave a discount just in case :) Also checked that there wasn't another part of the url remotely close to 25 straight chars (without a dot in between, etc), so it won't confuse with another part. – Henrique G. Abreu Jun 03 '13 at 11:53
  • 2
    Using [the regex suggested by aquadeep](http://stackoverflow.com/a/31201136/1404066) is less likely to make an unwanted match. Then use a (group) to extract only the ID portion of the regex: url.match(/\/d\/(.{25,})\//)[1]; – Kenigmatic Feb 11 '16 at 07:38
  • 1
    @Kenigmatic, an unwanted match seems very unlikely, and I more willing to trust 25 characters as a base length than I am that the ID will always start with a number. I also think we can trust that the ID will always be made up of digits, letters, and hyphens. Why use a dot? – Josh Jun 14 '19 at 23:49
  • Just discovered that your edit added an unwanted '$' in the last paragraph—not sure that's supposed to be there? And guess which one I copied and struggled with :). – dsl101 Apr 26 '21 at 13:32
  • Indeed, the idea for the $ was to get the last match of 25+ matches in the string, but that'll break if the url has extra things after the id, which is entirely possible. I've edited and replaced that with a lookahead – Henrique G. Abreu Oct 07 '21 at 15:43
  • SonarCloud warns about using regex with backtracking, which may lead to a performance issue called catastrophic backtracking. I'm no expert in regex so @HenriqueG.Abreu, is this regex prone to that issue? – Duc Nguyen Jan 07 '22 at 08:18
  • I guess it could be, if the negative lookahead is not optimized (on the last suggested regex in the end of my answer). I haven't tested that on Apps Script. If you're worried about that, I imagine limiting the input URL to some reasonable length should prevent the execution even if bad. It should be an easy test – Henrique G. Abreu Jan 09 '22 at 01:31
8

The url is something like this and file id is present in this pattern "/d/XXXXXXXX/" for almost all GoogleDrive/Docs links:
https://drive.google.com/file/d/0B3tB9BU9FRnpcTJmS2FoaktsQzA/view

Using below function, we can get the '/d/fileid/' and then truncate '/d/' from begining and '/' from end.

public static string getIdFromUrl(string url)
{
    Regex r = new Regex(@"\/d\/(.+)\/", RegexOptions.IgnoreCase);
    Match m = r.Match(url);
    return m.ToString().TrimStart('/', 'd').Trim('/');
}
aquadeep
  • 107
  • 1
  • 3
8

I don't have enough reputation to comment on the accepted answer, but the accepted answer from Henrique G. Abreu fails when a Drive URL contains the domain name and the domain name exceeds 25 characters (just found this out the hard way :)

It has been very reliable otherwise and I think is the most elegant and robust among those provided here.

So, expanding on the accepted answer, the following regex will get the last occurrence of a string of word characters or hyphens that is at least 25 characters long, that is immediately preceded by a character that isn't a non word character or a hyphen, and followed optionally by the same type of character, and whatever other junk might come at the end:

/.*[^-\w]([-\w]{25,})[^-\w]?.*/

This LOSES the characteristic of the accepted answer that it will work when passed an ID only, however that's not a use case I require. It works for all different types of Drive, Docs, Sheets URLs for both documents and folders that I've tested.

Iain Dooley
  • 308
  • 2
  • 9
5

An openByUrl method is now available in Google Apps Script.

See ref docs here for Sheets, here for Docs, here for Slides and here for Forms.

Because you wrote:

I want to get the document ID to create a copy on Google Drive

...am assuming you don't need the ID per se. After getting the sheet/doc/slide/form by URL, you can make a copy of it.

ADW
  • 4,177
  • 1
  • 14
  • 22
4

There are some more URL extensions not covered above which can contain IDs.

https://drive.google.com/drive/folders/ and https://drive.google.com/open?id= and https://drive.google.com/a/domain.edu.vn/folderview?id=

I thought I'd add my solution which builds on this idea, and covers the above two extensions, as well as the ones using /d/

function getIdFrom(url) {
  var id = "";
  var parts = url.split(/^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/);
  if (url.indexOf('?id=') >= 0){
     id = (parts[6].split("=")[1]).replace("&usp","");
     return id;
   } else {
   id = parts[5].split("/");
   //Using sort to get the id as it is the longest element. 
   var sortArr = id.sort(function(a,b){return b.length - a.length});
   id = sortArr[0];
   return id;
   }
 }
Community
  • 1
  • 1
Jason Allshorn
  • 1,625
  • 1
  • 18
  • 27
3

I just wanted to add the function I created based on the two given answers, because neither was quite what I was looking for.

function templateIdFrom(url) {
  var parts = url.match(/\/d\/(.+)\//);
  if (parts == null || parts.length < 2) {
    return url;
  } else {
    return parts[1];
  }
}

This gets the part after /d/ and up until the next /, which is how the document URLs always contain their IDs. If no match is found for this, then we simply return the original param, which is assumed to be the ID.

Daniel Centore
  • 3,220
  • 1
  • 18
  • 39
3

The solution suggested by Henrique may not cover the scenario when a Google Drive file is shared by a Google Workspace user where the domain may be part of the file URL. If the domain name is long, the domain name gets captured instead of the file URL.

https://drive.google.com/a/thisisaverylongdomainname.org/file/d/1djf7XfuKx4Px55x7ahvMa5uznp3Ibe5vd7Y/view?usp=sharing

The File IDs generated by Google Drive do not contain a period (.) so this modified RegEx can prevent the capturing of domain names.

function getFileIdFromDriveUrl(url) {
  var match = url.match(/([a-z0-9_-]{25,})[$/&?]/i);
  return match ? match[1] : null;
}
Amit Agarwal
  • 10,910
  • 1
  • 32
  • 43
2

For Python:

For fixed lenght google drive IDs you can use this:

regex = "([\w-]){33}|([\w-]){19}"
match = re.search(regex,url)

Google drive use 33 chars for normal drives and 19 chars for team drives

Another approach without using fixed lenght but instead preceding patterns:

regex = "(?<=/folders/)([\w-]+)|(?<=%2Ffolders%2F)([\w-]+)|(?<=/file/d/)([\w-]+)|(?<=%2Ffile%2Fd%2F)([\w-]+)|(?<=id=)([\w-]+)|(?<=id%3D)([\w-]+)"

match = re.search(regex,url)
2

I have a sheet where I put a URL into a cell and have it pulled in the App Script to do something else (like creating docs inside of a folder with content from the sheet).

I just used simple str.split("/") to pull the ID out of it. Sometimes, if the URL includes /u/0/, I just move down the index a bit:

if (sheet.getRange("D2").getValue().split("/")[4] === "u") {
  folderId = sheet.getRange("D2").getValue().split("/")[7];
} else {
  folderId = sheet.getRange("D2").getValue().split("/")[5];
}

though it only works with the two given Google Drive URL formats. I haven't encountered many others but drive.google.com/drive/folders/#folderId and drive.google.com/drive/u/o/folders/#folderId.

Julia
  • 1,950
  • 1
  • 9
  • 22
lycheelichi
  • 185
  • 1
  • 2
  • 9
1

To extract id from url spreadsheets I use the code below. It works with google spreadsheet and Excel in Drive. Maybe works with other docs too.

function getIdSheetFromUrl_(url)
{
    var id = url.split('id=')[1];
    if(!id)
    {
        id = url.split('/d/')[1];
        id = id.split('/edit')[0]; // here we have the id
    }
    return DriveApp.getFileById(id);
}
1

How about this for Google Apps Script to convert google drive URL to ID?

function testConverter(){
 getFileIdFromUrl("https://drive.google.com/file/d/1NqU78123456IxsnEOMISfjoA3Rgz5gZS/view?usp=share_link");

}

function getFileIdFromUrl(strUrl){

var secondpart = strUrl.split("/d/")[1];
return secondpart.split("/")[0];

}
rgchandar
  • 21
  • 1
0

If you have the URL to the Google Drive file in a cell in your spreadsheet I believe you can use the following formula to extract the fileID:

=mid({cellAddress},33,33)

Example:

=mid(A2,33,33)

slothinker
  • 19
  • 1