Is there an API in Windows that can crack a URL into parts?

Background

The format of a URL is:

stackoverflow://iboyd:password01@mail.stackoverflow.com:12386/questions/SubmitQuestion.aspx?useLiveData=1&internal=0#nose
\___________/   \___/ \________/ \____________________/ \___/ \___________________________/\_______________________/ \__/
     |            |       |               |               |                |                          |                |
   scheme     username password        hostname          port             path                      query           fragment

Is there a function in the (native) Win32 API that can crack a URL into parts:

  • Scheme: stackoverflow
  • Username: iboyd
  • Password: password01
  • Host name: mail.stackoverflow.com
  • Port: 12386
  • Path: questions/SubmitQuestion.aspx
  • Query: ?useLiveData=1&internal=0
  • Fragment: nose
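For reference, the layout in the diagram above can be reproduced with a short portable sketch. This is not a Windows API and not a validating parser; it only illustrates where each part sits. It uses the regular expression from RFC 3986, Appendix B for the generic split, then divides the authority as [username[:password]@]host[:port]. The names UrlParts and crack_url are hypothetical, and the sketch assumes the host is not a bracketed IPv6 literal.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the parts named in the diagram.
struct UrlParts {
    std::string scheme, username, password, host, port, path, query, fragment;
};

// Illustrative splitter only: performs no validation and no percent-decoding.
// Assumption: the host is not a bracketed IPv6 literal such as [::1].
inline bool crack_url(const std::string& url, UrlParts& out) {
    // Regular expression from RFC 3986, Appendix B.
    static const std::regex generic(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    std::smatch m;
    if (!std::regex_match(url, m, generic)) return false;

    out = UrlParts{};
    out.scheme   = m[2].str();
    out.path     = m[5].str();   // keeps its leading '/'
    out.query    = m[7].str();   // without the leading '?'
    out.fragment = m[9].str();   // without the leading '#'

    // Authority: [username[:password]@]host[:port]
    std::string authority = m[4].str();
    const auto at = authority.rfind('@');
    if (at != std::string::npos) {
        const std::string userinfo = authority.substr(0, at);
        authority.erase(0, at + 1);
        const auto colon = userinfo.find(':');
        out.username = userinfo.substr(0, colon);
        if (colon != std::string::npos) out.password = userinfo.substr(colon + 1);
    }
    const auto colon = authority.rfind(':');
    out.host = authority.substr(0, colon);
    if (colon != std::string::npos) out.port = authority.substr(colon + 1);
    return true;
}
```

Note that this sketch reports the path with its leading slash and the query and fragment without their "?" and "#" delimiters, which differs slightly from the list above.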

Some functions don't work

There are some functions in WinApi, but they fail to do the job because they don't understand schemes except the ones that WinHttp can use:

Both fail to understand URLs such as:

  • ws://stackoverflow.com (web-socket)
  • wss://stackoverflow.com (web-socket secure)
  • sftp://fincen.gov/submit (SSH file transfer)
  • magnet:?xt=urn:btih:c4244b6d0901f71add9a1f9e88013a2fa51a9900
  • stratum+udp://blockchain.info
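All of the schemes in this list are syntactically valid under the generic URI grammar, so rejecting them is a policy choice of the API rather than a parsing necessity. As a rough illustration, the check below follows the scheme rule from RFC 3986, section 3.1 (a letter followed by letters, digits, "+", "-", or "."). The function name is_valid_scheme is hypothetical and is not part of any Windows API.

```cpp
#include <string>

// RFC 3986, section 3.1:
//   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
// Hypothetical helper; ASCII-only by design, so it does not depend on the locale.
inline bool is_valid_scheme(const std::string& scheme) {
    auto is_alpha = [](char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    };
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };

    // The first character must be a letter.
    if (scheme.empty() || !is_alpha(scheme[0])) return false;

    // Every remaining character must be a letter, digit, '+', '-' or '.'.
    for (char c : scheme) {
        if (!is_alpha(c) && !is_digit(c) && c != '+' && c != '-' && c != '.')
            return false;
    }
    return true;
}
```

Under this rule "ws", "wss", "sftp", "magnet" and "stratum+udp" are all accepted.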

WinHttpCrackUrl actively refuses to crack URLs with any other scheme. Its documentation states:

If the Internet protocol of the URL passed in for pwszUrl is not HTTP or HTTPS, then WinHttpCrackUrl returns FALSE and GetLastError indicates ERROR_WINHTTP_UNRECOGNIZED_SCHEME.

Is there another native API in Windows that can get the parts of a URL?

Bonus Chatter

Here's how you do it in the CLR (e.g. C#):

using System;

public class Program
{
    public static void Main()
    {
        var uri = new Uri("stackoverflow://iboyd:password01@mail.stackoverflow.com:12386/questions/SubmitQuestion.aspx?useLiveData=1&internal=0#nose");

        Console.WriteLine("Uri.Scheme: "+uri.Scheme);
        Console.WriteLine("Uri.UserInfo: "+uri.UserInfo);
        Console.WriteLine("Uri.Host: "+uri.Host);
        Console.WriteLine("Uri.Port: "+uri.Port);
        Console.WriteLine("Uri.AbsolutePath: "+uri.AbsolutePath);
        Console.WriteLine("Uri.Query: "+uri.Query);
        Console.WriteLine("Uri.Fragment: "+uri.Fragment);
    }
}

Outputs

Uri.Scheme: stackoverflow
Uri.UserInfo: iboyd:password01
Uri.Host: mail.stackoverflow.com
Uri.Port: 12386
Uri.AbsolutePath: /questions/SubmitQuestion.aspx
Uri.Query: ?useLiveData=1&internal=0
Uri.Fragment: #nose
Ian Boyd

2 Answers

There are a number of functions available to native Windows developers:

Of these, InternetCrackUrl works.

// Pseudocode. Zero-initialize the structure: every lpsz* pointer must start out NULL.
URL_COMPONENTS components = {};
components.dwStructSize      = sizeof(URL_COMPONENTS);

// A NULL pointer combined with a non-zero length asks InternetCrackUrl to
// return a pointer into the original url string, plus the component's length.
components.dwSchemeLength    = DWORD(-1);
components.dwHostNameLength  = DWORD(-1);
components.dwUserNameLength  = DWORD(-1);
components.dwPasswordLength  = DWORD(-1);
components.dwUrlPathLength   = DWORD(-1);
components.dwExtraInfoLength = DWORD(-1);

if (!InternetCrackUrl(url, url.Length, 0, ref components))
    RaiseLastOSError();

// The returned pointers point into url and are not terminated at the end of
// each component, so copy exactly dw*Length characters.
String scheme    = StrLCopy(components.lpszScheme, components.dwSchemeLength);
String username  = StrLCopy(components.lpszUserName, components.dwUserNameLength);
String password  = StrLCopy(components.lpszPassword, components.dwPasswordLength);
String host      = StrLCopy(components.lpszHostName, components.dwHostNameLength);
Int32  port      = components.nPort;
String path      = StrLCopy(components.lpszUrlPath, components.dwUrlPathLength);
String extraInfo = StrLCopy(components.lpszExtraInfo, components.dwExtraInfoLength);

This means that

stackoverflow://iboyd:password01@mail.stackoverflow.com:12386/questions/SubmitQuestion.aspx?useLiveData=1&internal=0#nose

is parsed into:

  • Scheme: stackoverflow
  • Username: iboyd
  • Password: password01
  • Host: mail.stackoverflow.com
  • Port: 12386
  • Path: /questions/SubmitQuestion.aspx
  • ExtraInfo: ?useLiveData=1&internal=0#nose

Parsing ExtraInfo into query and fragment

It sucks that InternetCrackUrl doesn't make a distinction between:

?query#fragment

and just mashes them together as ExtraInfo:

  • ExtraInfo: ?useLiveData=1&internal=0#nose
    • Query: ?useLiveData=1&internal=0
    • Fragment: #nose

So we have to do some splitting if we want the ?query or the #fragment:

/*
   InternetCrackUrl returns ?query#fragment in a single combined extraInfo field.
   Split that into separate
      ?query
      #fragment
*/
String query = extraInfo;
String fragment = "";

Int32 n = StrPos("#", extraInfo);
if (n >= 1) //one-based string indexes
{
   query = extraInfo.SubString(1, n-1);
   fragment = extraInfo.SubString(n, MaxInt);
}
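The same split can be written in portable C++. This is only a sketch of the step described above: the name split_extra_info is hypothetical, and it uses std::string for brevity (a std::wstring version for the wide-character API would look the same). Both parts keep their leading "?" or "#", matching the pseudocode.

```cpp
#include <string>
#include <utility>

// Splits the combined "?query#fragment" text reported as ExtraInfo.
// Returns {query, fragment}; each part keeps its leading '?' or '#'.
// Hypothetical helper, shown for illustration only.
inline std::pair<std::string, std::string>
split_extra_info(const std::string& extra_info) {
    const std::string::size_type hash = extra_info.find('#');
    if (hash == std::string::npos) {
        // No fragment present: everything is the query.
        return {extra_info, std::string()};
    }
    // Query is everything before '#'; fragment starts at '#'.
    return {extra_info.substr(0, hash), extra_info.substr(hash)};
}
```

For the example URL, split_extra_info("?useLiveData=1&internal=0#nose") yields the pair "?useLiveData=1&internal=0" and "#nose".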

Giving us the final desired:

  • Scheme: stackoverflow
  • Username: iboyd
  • Password: password01
  • Host: mail.stackoverflow.com
  • Port: 12386
  • Path: /questions/SubmitQuestion.aspx
  • ExtraInfo: ?useLiveData=1&internal=0#nose
    • Query: ?useLiveData=1&internal=0
    • Fragment: #nose
Ian Boyd
  • Again, why not use the [Uri](https://learn.microsoft.com/en-us/uwp/api/windows.foundation.uri) class? Apparently, you don't care as much about a Windows API as you care about being available to a native Windows developer. That one is, although it is Windows 10 only. On the upside, it parses *query* and *fragment* into separate properties. – IInspectable Jun 11 '18 at 21:19
  • @IInspectable: Windows 10-only is still possibly a pretty major problem. (For instance, I still use Windows 7.) – Andreas Rejbrand Jun 11 '18 at 21:23
  • @AndreasRejbrand: Yes, possibly. And possibly it isn't. That's why I'm asking for clarification. – IInspectable Jun 11 '18 at 21:25

Is there an API in Windows that can crack a URL into parts?

There is in Windows 10. The Uri class in the Windows Runtime is capable of decomposing a URI into its individual parts. This is not strictly part of the Windows API, but it is consumable by any Windows API application.

The following code illustrates its usage. It is written using the C++/WinRT language projection, requiring a C++17 compiler. If you cannot switch to a C++17 compiler, you can use the Windows Runtime C++ Template Library (WRL) instead to consume the Windows Runtime APIs.

#include <iostream>
#include <string>
#include <winrt/Windows.Foundation.h>

#pragma comment(lib, "WindowsApp.lib")

using namespace winrt;
using namespace Windows::Foundation;

int wmain(int argc, wchar_t* wargv[])
{
    if (argc != 2)
    {
        std::wcout << L"Usage:\n  UrlCracker <url>" << std::endl;
        return 1;
    }

    init_apartment();

    Uri const uri{ wargv[1] };
    std::wcout << L"Scheme: " << uri.SchemeName().c_str() << std::endl;
    std::wcout << L"Username: " << uri.UserName().c_str() << std::endl;
    std::wcout << L"Password: " << uri.Password().c_str() << std::endl;
    std::wcout << L"Host: " << uri.Host().c_str() << std::endl;
    std::wcout << L"Port: " << std::to_wstring(uri.Port()) << std::endl;
    std::wcout << L"Path: " << uri.Path().c_str() << std::endl;
    std::wcout << L"Query: " << uri.Query().c_str() << std::endl;
    std::wcout << L"Fragment: " << uri.Fragment().c_str() << std::endl;
}

This program digests any URI spelled out in the question. Using the input

stackoverflow://iboyd:password01@mail.stackoverflow.com:12386/questions/SubmitQuestion.aspx?useLiveData=1&internal=0#nose

produces the following output:

Scheme: stackoverflow
Username: iboyd
Password: password01
Host: mail.stackoverflow.com
Port: 12386
Path: /questions/SubmitQuestion.aspx
Query: ?useLiveData=1&internal=0
Fragment: #nose

Error handling has been omitted. If the Uri constructor is passed an invalid string, it throws an exception of type winrt::hresult_error. If you cannot use exceptions in your code, you can activate the type manually (e.g. using the WRL) and inspect the HRESULT return values instead.

IInspectable