
I have an application server built as UNICODE running on Windows only. The application accepts lots of clients using multithreading.

Part of the application is responsible for logging connections. To log IPv4 addresses, I convert them from ANSI to UNICODE using inet_ntoa + MultiByteToWideChar.

My frustration is that there is no UNICODE version of inet_ntoa and I am forced to convert each IPv4 to UNICODE. Although MultiByteToWideChar is relatively fast, it is still a performance loss and prone to failure, especially when you have 100,000 connections coming in.

Researching a bit, I found functions like RtlIpv4AddressToStringExW. However, I have no experience with them, and using them in a real-life application without proper testing could lead to fatal failures.

Investigating the above function I have seen that there is no call to MultiByteToWideChar, but instead to RtlIpv4AddressToStringW and swprintf_s, which tells me I really don't need to convert anything (?).

Please advise what is the best way to obtain the readable UNICODE version of IN_ADDR.

Any advice is much appreciated.

Update:

  • I investigated inet_ntoa and there is quite a lot of code there, retrieving data from TLS among other things. Overkill in my opinion.

  • Also investigated WSAAddressToString as proposed by @Alex Guteniev. Again a ton of code, for what purpose...

  • Also investigated RtlIpv4AddressToStringExW, and my findings again make me worry about how Windows APIs are designed and offered.

Windows Implementation:

LONG __stdcall RtlIpv4AddressToStringExW(const struct in_addr *Address, USHORT Port, PWSTR AddressString, PULONG AddressStringLength)
{
  PULONG v4; // rdi
  PWSTR v5; // rbp
  USHORT v6; // si
  wchar_t *v7; // rax
  signed __int64 v8; // rbx
  ULONG v9; // ebx
  LONG result; // eax
  WCHAR S; // [rsp+20h] [rbp-68h]
  char v12[4]; // [rsp+4Ch] [rbp-3Ch]

  v4 = AddressStringLength;
  v5 = AddressString;
  v6 = Port;
  if ( Address && AddressStringLength && (AddressString || !*AddressStringLength) )
  {
    v7 = RtlIpv4AddressToStringW(Address, &S);
    v8 = (signed __int64)v7;
    if ( v6 )
      v8 = (signed __int64)&v7[swprintf_s(v7, (v12 - (char *)v7) >> 1, L":%u", (unsigned __int16)__ROR2__(v6, 8))];
    v9 = (unsigned __int64)((v8 - (signed __int64)&S) >> 1) + 1;
    if ( *v4 >= v9 )
    {
      memmove(v5, &S, 2i64 * v9);
      result = 0;
      *v4 = v9;
      return result;
    }
    *v4 = v9;
  }
  return -1073741811;
}

PWSTR __stdcall RtlIpv4AddressToStringW(const struct in_addr *Addr, PWSTR S)
{
  __int64 v3; // [rsp+20h] [rbp-28h]
  __int64 v4; // [rsp+28h] [rbp-20h]
  __int64 v5; // [rsp+30h] [rbp-18h]

  LODWORD(v5) = Addr->S_un.S_un_b.s_b4;
  LODWORD(v4) = Addr->S_un.S_un_b.s_b3;
  LODWORD(v3) = Addr->S_un.S_un_b.s_b2;
  return &S[swprintf_s(S, 0x10ui64, L"%u.%u.%u.%u", Addr->S_un.S_un_b.s_b1, v3, v4, v5)];
}

ReactOS Implementation:

 NTSTATUS
 NTAPI
 RtlIpv4AddressToStringExW(
     _In_ const struct in_addr *Address,
     _In_ USHORT Port,
     _Out_writes_to_(*AddressStringLength, *AddressStringLength) PWCHAR AddressString,
     _Inout_ PULONG AddressStringLength)
 {
     WCHAR Buffer[IPV4_ADDR_STRING_MAX_LEN + IPV4_PORT_STRING_MAX_LEN];
     NTSTATUS Status;
     ULONG Length;
     PWSTR End;
 
     if (!Address || !AddressString || !AddressStringLength)
         return STATUS_INVALID_PARAMETER;
 
     Status = RtlStringCchPrintfExW(Buffer,
                                    RTL_NUMBER_OF(Buffer),
                                    &End,
                                    NULL,
                                    0,
                                    Port ? L"%u.%u.%u.%u:%u"
                                         : L"%u.%u.%u.%u",
                                    Address->S_un.S_un_b.s_b1,
                                    Address->S_un.S_un_b.s_b2,
                                    Address->S_un.S_un_b.s_b3,
                                    Address->S_un.S_un_b.s_b4,
                                    WN2H(Port));
     ASSERT(Status == STATUS_SUCCESS);
     if (!NT_SUCCESS(Status))
         return STATUS_INVALID_PARAMETER;
 
     Length = End - AddressString;
     if (*AddressStringLength > Length)
     {
         Status = RtlStringCchCopyW(AddressString,
                                    *AddressStringLength,
                                    Buffer);
         ASSERT(Status == STATUS_SUCCESS);
         *AddressStringLength = Length + 1;
         return STATUS_SUCCESS;
     }
 
     *AddressStringLength = Length + 1;
     return STATUS_INVALID_PARAMETER;
 }

My opinion here is that the ReactOS implementation is FAR better than the one in Windows. I believe RtlStringCchPrintfExW is much safer than swprintf_s.

In the end, is it safe for me (and anyone else) to create their own implementation of formatting?
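As a point of reference for that question, here is a minimal sketch of what a hand-rolled formatter could look like. It is not taken from Windows or ReactOS. The names `put_octet`, `ipv4_to_wide`, and `IPV4_WSTR_MAX` are hypothetical, and the address is assumed to arrive as four bytes in network order, which is how IN_ADDR stores them. No format string is parsed and no ANSI-to-UNICODE conversion takes place.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Longest result is "255.255.255.255": 15 characters plus the terminator. */
#define IPV4_WSTR_MAX 16

/* Writes one octet (0..255) as decimal digits and returns the new end pointer. */
static wchar_t *put_octet(wchar_t *out, uint8_t v)
{
    if (v >= 100) *out++ = (wchar_t)(L'0' + v / 100);
    if (v >= 10)  *out++ = (wchar_t)(L'0' + (v / 10) % 10);
    *out++ = (wchar_t)(L'0' + v % 10);
    return out;
}

/* Hypothetical helper: formats four address bytes (network order, as stored
   in IN_ADDR) into dst. Returns the character count without the terminator,
   or 0 if dst is NULL or smaller than IPV4_WSTR_MAX. */
size_t ipv4_to_wide(const uint8_t octets[4], wchar_t *dst, size_t dst_len)
{
    if (octets == NULL || dst == NULL || dst_len < IPV4_WSTR_MAX)
        return 0;

    wchar_t *p = dst;
    for (int i = 0; i < 4; ++i) {
        if (i > 0)
            *p++ = L'.';
        p = put_octet(p, octets[i]);
    }
    *p = L'\0';
    return (size_t)(p - dst);
}
```

On Windows the four bytes would come from the s_b1 through s_b4 fields that both listings above read. Requiring a buffer of at least IPV4_WSTR_MAX characters up front is a design choice that keeps bounds checks out of the inner loop.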

Mecanik
  • This is a [solved problem](https://en.cppreference.com/w/cpp/string/basic_string/to_wstring), unless I'm missing what you're trying to accomplish. Indeed, that solved problem just [got solved](https://en.cppreference.com/w/cpp/utility/format/format) again. – IInspectable Sep 12 '21 at 13:23
  • @IInspectable I just can't figure out if you are being sarcastic or just funny. – Mecanik Sep 12 '21 at 16:31
  • I cannot figure out what sort of research would need to go into solving the trivial problem of converting 4 numbers in the range 0..255 to their wide character string representation and joining them with a dot character. – IInspectable Sep 12 '21 at 20:15
  • How is MultiByteToWideChar prone to failure on a simple ASCII string? And it is of course overkill, just use your own little loop to copy from char to WCHAR if you don't feel like converting the numbers to a string yourself. – Anders Sep 13 '21 at 00:03
  • @IInspectable If you are unwilling to answer the question or provide any valuable feedback or advice, please do not leave sarcastic comments. – Mecanik Sep 13 '21 at 05:21
  • @Anders This is my point: the documentation is not clear, and whilst there are good and simple functions to do this, the first thing that comes up when searching is inet_ntoa or WSAAddressToStringW (which internally uses MultiByteToWideChar, btw). I ended up using just RtlIpv4AddressToStringW since it's very simple and offers much better performance than converting. The reason I have not posted an answer yet is to see if someone has feedback, advice, or perhaps something better. – Mecanik Sep 13 '21 at 05:23
  • Clearly, you aren't interested in feedback, if that feedback suggests that your problem has a trivial solution. – IInspectable Sep 13 '21 at 06:11
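
The char-to-WCHAR copy loop suggested in the comments above could be sketched as follows. This is only an illustration under stated assumptions: `widen_ascii` is a hypothetical name, and the input is assumed to be 7-bit ASCII (digits and dots, such as the output of inet_ntoa), whose character values are the same in UTF-16.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copies a NUL-terminated ASCII string into a wide
   buffer one character at a time. Returns the number of characters written
   (without the terminator), or 0 if the input is not 7-bit ASCII or does
   not fit. An empty input also yields 0. */
size_t widen_ascii(const char *src, wchar_t *dst, size_t dst_len)
{
    size_t i = 0;

    if (src == NULL || dst == NULL || dst_len == 0)
        return 0;

    for (; src[i] != '\0'; ++i) {
        unsigned char c = (unsigned char)src[i];
        /* Reject non-ASCII input and keep one slot free for the terminator. */
        if (c > 0x7F || i + 1 >= dst_len) {
            dst[0] = L'\0';
            return 0;
        }
        dst[i] = (wchar_t)c;
    }
    dst[i] = L'\0';
    return i;
}
```

For a dotted-quad string of at most 15 characters, a 16-element destination buffer is sufficient.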

1 Answer


Use WSAAddressToStringW if you want a generic function to convert addresses.

These Windows APIs are not necessarily optimized for throughput; they are probably written to handle several different address types.

If you are limited to specific address types, like IPv4 only, you can probably create your own implementation, if performance matters that much.

Consider SIMD (SSE, AVX) if you need top performance. If you just need a simple implementation, consider using swprintf (or calling _itow a few times if you mind the overhead of parsing the format specification).

Alex Guteniev
  • Thanks, after investigating this is a pretty huge function too. Similar to inet_ntoa. I will update my question with some of my findings. – Mecanik Sep 11 '21 at 18:04
  • @Norbert, I think if performance matters, you should just roll your own implementation using SSE/AVX, or find such an implementation. Here's an example of the reverse task: https://stackoverflow.com/a/31683632/2945027 – Alex Guteniev Sep 11 '21 at 18:10
  • Thank you - using SSE/AVX is not ideal for me. All I want is a simple, fast and reliable way of obtaining the UNICODE string representation of an IPv4. Please see my updated post and tell me your thoughts - appreciate the feedback! – Mecanik Sep 11 '21 at 18:15
  • As I said, I think if you know that your addresses are always IPv4 addresses, having your own implementation is safe. Using the API means safety from corner cases, but what corner cases can there be if you always have 4 bytes for the address, separated by dots, and optionally a colon and a port? – Alex Guteniev Sep 11 '21 at 18:30
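
The swprintf route mentioned in the answer, including the optional colon and port, might look like the sketch below. It assumes the standard C swprintf that takes a buffer size. `ipv4_port_to_wide` and `IPV4_PORT_WSTR_MAX` are hypothetical names, and the port is assumed to be passed in network byte order with 0 meaning "no port", mirroring the Port parameter in the listings quoted in the question.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* "255.255.255.255:65535" is 21 characters; 22 with the terminator. */
#define IPV4_PORT_WSTR_MAX 22

/* Hypothetical helper: octets are the four IN_ADDR bytes, port_net is the
   port in network byte order (0 omits it). Returns the number of characters
   written without the terminator, or a negative value on failure. */
int ipv4_port_to_wide(const uint8_t octets[4], uint16_t port_net,
                      wchar_t *dst, size_t dst_len)
{
    if (octets == NULL || dst == NULL || dst_len == 0)
        return -1;

    if (port_net != 0) {
        /* Network order stores the high byte first in memory, so reading
           the two bytes individually works on any host endianness. */
        const uint8_t *pb = (const uint8_t *)&port_net;
        unsigned port = ((unsigned)pb[0] << 8) | (unsigned)pb[1];

        return swprintf(dst, dst_len, L"%u.%u.%u.%u:%u",
                        (unsigned)octets[0], (unsigned)octets[1],
                        (unsigned)octets[2], (unsigned)octets[3], port);
    }

    return swprintf(dst, dst_len, L"%u.%u.%u.%u",
                    (unsigned)octets[0], (unsigned)octets[1],
                    (unsigned)octets[2], (unsigned)octets[3]);
}
```

With a destination of IPV4_PORT_WSTR_MAX characters the call cannot run out of space, so a negative return would only be expected when a caller passes a smaller buffer.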