I would recommend using the existing Uri class, which provides easy access to the parts of a URI. Some of the URLs in your sample list don't have a scheme, so you need to add it manually:
Uri uri = new Uri(url.StartsWith("http") ? url : "http://" + url);
Now you can use Uri.Host to get the host of the URI. For your sample input the hosts will be:
"domain.com"
"domain.com"
"www.domain.com"
"www.domain.com"
"domain.com"
"domain.com"
"domain.com"
"domain.com"
"domain.com"
You can do a simple string replace to get rid of the www. part:
uri.Host.Replace("www.", "")
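One thing to keep in mind: Replace removes every occurrence of "www." in the host, not only a leading one, so a host such as "awww.example.com" would become "aexample.com". If that matters for your data, a minimal sketch of a prefix-only variant could look like this (HostHelper and StripWww are hypothetical names, not part of the framework):

```csharp
using System;

static class HostHelper
{
    // Hypothetical helper: removes "www." only when it is the prefix of the host.
    public static string StripWww(string host)
    {
        const string prefix = "www.";
        return host.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
            ? host.Substring(prefix.Length)
            : host;
    }
}
```

You would call it as HostHelper.StripWww(uri.Host) instead of uri.Host.Replace("www.", "").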
Next come the query parameters. You can get them from Uri.Query. In your sample input only one URL has query parameters. The returned value will be
?arg=123&arg2=abc
Again, it's easy to get rid of the leading ?:
uri.Query.TrimStart('?') // arg=123&arg2=abc
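If you need the individual parameters rather than the raw string, you can split it yourself. The following is only a sketch under a few assumptions: QueryHelper and ParseQuery are hypothetical names, a repeated key overwrites the earlier value, and a + in a value is not turned into a space.

```csharp
using System;
using System.Collections.Generic;

static class QueryHelper
{
    // Hypothetical helper: turns "?arg=123&arg2=abc" into key/value pairs.
    // Assumption: when a key repeats, the last value wins.
    public static Dictionary<string, string> ParseQuery(Uri uri)
    {
        var result = new Dictionary<string, string>();
        string query = uri.Query.TrimStart('?');

        foreach (string pair in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split on the first '=' only, so values may contain '='.
            string[] parts = pair.Split(new[] { '=' }, 2);
            string key = Uri.UnescapeDataString(parts[0]);
            string value = parts.Length > 1 ? Uri.UnescapeDataString(parts[1]) : "";
            result[key] = value;
        }

        return result;
    }
}
```

For a URL without a query string, Uri.Query is empty and the dictionary simply stays empty.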
Uri also has a Segments property, which returns an array of the path segments. You can check whether the last segment contains a . to get the next result:
uri.Segments.Last().Contains('.') ? uri.Segments.Last() : ""
When the check is true, the last segment is the page name (page.html in your sample). Output:
""
""
""
""
""
""
"page.html"
"page.html"
"page.html"
You can also use a simple String.Join to concatenate the other segments into a string. Or you can do a string replace on Uri.LocalPath:
uri.Segments.Last().Contains('.') ?
uri.LocalPath.Replace(uri.Segments.Last(), "") : uri.LocalPath;
Output:
""
""
""
""
""
"/catalog/nextcatalog/"
"/catalog/nextcatalog/"
"/"
"/"
All that is left is to call TrimStart('/') to get rid of the leading slash.
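Putting the steps above together, a rough sketch might look like the following. UrlParts and its method names are hypothetical, and the URL in the usage note is made up for illustration since I am only guessing at the shape of your input. Unlike the Replace on LocalPath shown above, this version cuts the page name off the end of the path only, so a folder with the same name is left alone.

```csharp
using System;
using System.Linq;

static class UrlParts
{
    // Adds a scheme when the input has none, as in the first step above.
    public static Uri ToUri(string url) =>
        new Uri(url.StartsWith("http") ? url : "http://" + url);

    public static string Host(Uri uri) => uri.Host.Replace("www.", "");

    public static string Query(Uri uri) => uri.Query.TrimStart('?');

    // Last segment is treated as a page only if it contains a dot.
    public static string Page(Uri uri)
    {
        string last = uri.Segments.Last();
        return last.Contains('.') ? last : "";
    }

    // Path without the page name and without the leading slash.
    public static string Folder(Uri uri)
    {
        string page = Page(uri);
        string path = uri.LocalPath;
        if (page.Length > 0 && path.EndsWith(page, StringComparison.Ordinal))
            path = path.Substring(0, path.Length - page.Length);
        return path.TrimStart('/');
    }
}
```

For example, with var uri = UrlParts.ToUri("www.domain.com/catalog/nextcatalog/page.html?arg=123&arg2=abc"), the four methods return "domain.com", "arg=123&arg2=abc", "page.html" and "catalog/nextcatalog/" respectively.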