5

I need to extract the directory and file name from URLs that users may enter in several different forms.

Some examples would include:

What I really need is the directory (TOP_PROD_IMAGE) and the file name (WS-25612-BK_IMRO_1.jpg).

I need to account for users who enter http://, https://, or just www. I tried string.split('/'), but the index of each segment shifts depending on whether the scheme is present, so that doesn't work in all cases. Is there something that would give me a consistent array despite the double // when the user enters http? Thanks!

C.OG
rec0nstr
  • I'd use path-to-regexp for this. It's used internally by Express and can be quite robust. https://www.npmjs.com/package/path-to-regexp If this really is just a one-off use case, though, you could do it directly with a regex. – Brad Dec 23 '19 at 23:08
  • `([^/]+)\/([^/]+)$` as a regexp? – vsh Dec 23 '19 at 23:09
  • This question reminded me of a similar one [here](https://stackoverflow.com/a/45075028/4003419) But this one should be a lot easier. – LukStorms Dec 23 '19 at 23:11
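
The pattern suggested in the comments can be tried roughly as follows. This is only a sketch: the helper name `extract` is hypothetical, and it assumes the input has no query string or trailing slash.

```javascript
// Pattern from the comment above: the last two "/"-separated segments
// anchored at the end of the string.
const lastTwo = /([^/]+)\/([^/]+)$/;

// Hypothetical helper: returns the captured folder and file, or null on no match.
function extract(url) {
  const m = lastTwo.exec(url);
  return m ? { folder: m[1], file: m[2] } : null;
}

console.log(extract('http://192.168.12.44:8090/TOP_PROD_IMAGE/R3CRDT-HZWT_IMRO_1.jpg'));
console.log(extract('no-slashes-here'));
```

Because the pattern is anchored only at the end, the scheme and host are never inspected, so the same call works whether or not the user typed http://.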

4 Answers

6

Consider:

const [file, folder] = url.split('/').reverse();

Because you only read the last two segments, the scheme (http://, https://) and the empty segment produced by // never come into play.
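
A minimal sketch of this approach, run against URLs in the three forms mentioned in the question. The helper name `lastTwoSegments` is hypothetical, and the sketch assumes the input has no query string, fragment, or trailing slash.

```javascript
// Hypothetical helper: returns the last two "/"-separated segments of a URL string.
// Assumes no query string, fragment, or trailing slash in the input.
function lastTwoSegments(url) {
  // After reversing, the file name is first and its parent folder is second.
  const [file, folder] = url.split('/').reverse();
  return { folder, file };
}

const samples = [
  'https://foo/s3.amazonaws.com/TOP_PROD_IMAGE/WS-25612-BK_IMRO_1.jpg',
  'http://192.168.12.44:8090/TOP_PROD_IMAGE/R3CRDT-HZWT_IMRO_1.jpg',
  'www.foobar-images.s3.amazonaws.com/TOP_PROD_IMAGE/WS-25612-BK_IMRO_1.jpg',
];

for (const s of samples) {
  console.log(lastTwoSegments(s));
}
```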

C.OG
4

How about:

const url = new URL('https://foo/s3.amazonaws.com/TOP_PROD_IMAGE/WS-25612-BK_IMRO_1.jpg')
const urlParams = url.pathname.split('/') // you'll get array here, so inspect it and get last two items

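The pathname contains exactly the part you need. Note that new URL() throws on input without a scheme, such as a bare www. address, so a scheme has to be prepended to that kind of input first.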
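
A sketch of how the URL-based approach could handle all three input forms. The helper name `folderAndFile`, the choice of `https://` as the default scheme, and the `?size=large` query string in the second call are assumptions made for illustration.

```javascript
// Hypothetical helper: parses user input with the URL API,
// adding a scheme when the input does not already have one.
function folderAndFile(input) {
  // new URL() throws on strings without a scheme, so prepend one for "www..." input.
  const withScheme = /^https?:\/\//i.test(input) ? input : `https://${input}`;
  // pathname excludes the host, query string, and fragment;
  // filter(Boolean) drops the empty segments around leading or trailing slashes.
  const segments = new URL(withScheme).pathname.split('/').filter(Boolean);
  const [folder, file] = segments.slice(-2);
  return { folder, file };
}

console.log(folderAndFile('www.foobar-images.s3.amazonaws.com/TOP_PROD_IMAGE/WS-25612-BK_IMRO_1.jpg'));
console.log(folderAndFile('http://192.168.12.44:8090/TOP_PROD_IMAGE/R3CRDT-HZWT_IMRO_1.jpg?size=large'));
```

Compared with splitting the raw string, this ignores any query string or fragment the user pastes along with the URL, at the cost of having to normalize the scheme first.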

dvlden
0

If the URLs have to start with either http (optionally followed by s) or www., you could also use a pattern with two capturing groups: one for the segment before the last slash and one for the part after it.

^(?:https?:\/\/|www\.)\S+\/([^/]+)\/(\S+)$


const urls = [
  "https://foo/s3.amazonaws.com/TOP_PROD_IMAGE/WS-25612-BK_IMRO_1.jpg",
  "http://192.168.12.44:8090/TOP_PROD_IMAGE/R3CRDT-HZWT_IMRO_1.jpg",
  "www.foobar-images.s3.amazonaws.com/TOP_PROD_IMAGE/WS-25612-BK_IMRO_1.jpg"
];
const pattern = /^(?:https?:\/\/|www\.)\S+\/([^/]+)\/(\S+)$/;

urls.forEach(s => {
  const m = s.match(pattern);
  // match() returns null when the string does not fit the pattern
  if (m) {
    console.log(m[1]); // directory
    console.log(m[2]); // file name
    console.log("\n");
  }
});
The fourth bird
0

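You can use negative look-aheads to match only the final URI segments. Note that the bracketed parts are character classes rather than literal strings, so the pattern works for the sample URLs below but is fragile: the match can only start at a character that is not a digit and not one of h, t, p, s, ?, :, /, w or ., and nothing from that point on may contain a lowercase c, o or m (a file named image.jpg, for example, would be cut off).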

/(?!([https?:\/\/]|[www.]))(?!([\d]))(?!(.*[com])).*/

const re = /(?!([https?:\/\/]|[www.]))(?!([\d]))(?!(.*[com])).*/
const arr = [
  "https://foo/s3.amazonaws.com/TOP_PROD_IMAGE/WS-25612-BK_IMRO_1.jpg",
  "http://192.168.12.44:8090/TOP_PROD_IMAGE/R3CRDT-HZWT_IMRO_1.jpg",
  "www.foobar-images.s3.amazonaws.com/TOP_PROD_IMAGE/WS-25612-BK_IMRO_1.jpg"
]

const res = arr.map(str => re.exec(str)[0].split("/"))

console.log(res)
symlink