
I've tried to recreate the Windows Reg.exe utility in C++, specifically the functionality behind the following REG.exe command:

REG QUERY "HKLM\Software" /s

However, I seem to get a severe bottleneck when I output the results to screen.

If I comment out std::wcout << fPath.c_str(), the program completes within 1.45 minutes, as opposed to over 20 minutes with it left in.

I want to know why that is the case and how I can resolve it.

inline void show(HKEY aHkey, std::wstring aHkeyPath, std::wstring aSubKey, RegValue &aValueData, bool aDisplayPath, bool aDisplayValue, bool aLastItem)
{
    if (aDisplayValue)
    {
        // Show registry value name
        std::wstring lValueName = (aValueData.lValueName);
        if (lValueName == TEXT("")) lValueName = TEXT("(Default)");

        // Recover from non ascii characters
        if (!std::wcout.good())
        {
            std::wcout.clear();
        }

        // Show registry type
        std::wstring lDataType = convertToWstr(getDataTypeStringName(aValueData.lRegType));
        std::wstring lDataValue;

        // Show registry data
        if (lDataType == L"REG_DWORD" ||
            lDataType == L"REG_QWORD" ||
            lDataType == L"REG_DWORD_LITTLE_ENDIAN" ||
            lDataType == L"REG_QWORD_LITTLE_ENDIAN" ||
            lDataType == L"REG_DWORD_BIG_ENDIAN")
        {
            lDataValue = L"0x0" + (aValueData.lDataValue);
        }
        else
        {
            lDataValue = (aValueData.lDataValue);
        }


        std::wstring fPath = lValueName + L"    " + lDataType + L"    " + lDataValue;

        std::wcout << fPath.c_str();
    }
    std::cout << "\n";
}

Reg Value Structure

   struct RegValue
   {
       std::wstring lValueName;
       unsigned int lRegType;
       std::wstring lDataValue;
   };

Main body calling function

RegValue lRegValueData;
std::wstring lCurrentSubKey = L"";
std::wstring lCurrentValue = L"";
unsigned int lMatchTotal = 0;

// HACK: Makes sure that the lSubKeyList loop will be entered even if there are no sub-keys.
// Additionally, allows for the last sub-key within the deque to be shown.
lSubKeyList.push_back(L"End");

while (lSubKeyList.size())
{
    //show(hkey, lHkeyPath, lCurrentSubKey, lRegValueData, true, false, false);

    while (lValueList.size())
    {
        lCurrentValue = lValueList.front();
        lValueList.pop();
        lResult = getValueData(hkey, lHkeyPath, lCurrentSubKey, lCurrentValue, lRegValueData);

        if (lResult != ERROR_SUCCESS)
        {
            error(lResult);
            return false;
        }

        if (lFind || lFilterDataType)
        {
            bool lMatch = isMatch(lRegValueData, aVal.lDataType, lSearchParam);

            if (lMatch)
            {
                show(hkey, lHkeyPath, lCurrentSubKey, lRegValueData, false, true, false);
                lMatchTotal++;
            }

        }
        else
        {
            show(hkey, lHkeyPath, lCurrentSubKey, lRegValueData, false, true, false);

        }

        //Create a space after last data value
        if (lValueList.size() == 0 && !lSearchRecursive)
        {
            std::cout << "\n";
        }

    }

    // Remove visited sub-key from deque
    lCurrentSubKey = lSubKeyList.front();
    lSubKeyList.pop_front();

    if (lSearchRecursive && lCurrentSubKey != L"End")
    {
        //Create a space after the first key.
        std::cout << "\n";

        // If the parent key contains sub-keys, add them to lSubKeyList, 
        // in the order they were retrieved from RegEnumEx.
        std::deque<std::wstring> lTemp;
        lResult = getSubKeyList(hkey, lHkeyPath, lCurrentSubKey, lTemp);

        while (lTemp.size())
        {
            lSubKeyList.push_front(lTemp.back());
            lTemp.pop_back();
        }

        // A non-error. Error: "Invalid Handle"
        // This error code just states that the current key being accessed 
        // does not contain any sub-keys.
        if (lResult == 6) lResult = 0;

        if (lResult != ERROR_SUCCESS)
        {
            error(lResult);
            return false;
        }

        // Get the current key's value names
        lResult = getValueNameList(hkey, lHkeyPath, lCurrentSubKey, lValueList);

        // Ignore key values that cannot be accessed. Error "Access Denied" 
        // Keys cannot be accessed in regedit (admin), so this is a UAC issue.
        if (lResult == 5) lResult = 0;

        if (lResult != ERROR_SUCCESS)
        {
            error(lResult);
            return false;
        }
    }
}

Disassembly Code (std::wcout << fPath.c_str())

        std::wcout << fPath.c_str();
00007FF67E5759E3  lea         rdx,[rbp]  
00007FF67E5759E7  cmp         qword ptr [rbp+18h],8  
00007FF67E5759EC  cmovae      rdx,qword ptr [rbp]  
00007FF67E5759F1  mov         rcx,qword ptr [__imp_std::wcout (07FF67E57D140h)]  
00007FF67E5759F8  call        std::operator<<<wchar_t,std::char_traits<wchar_t> > (07FF67E577CE0h)  
00007FF67E5759FD  nop  
    }
00007FF67E5759FE  mov         rax,qword ptr [rbp+18h]  
00007FF67E575A02  cmp         rax,8  
00007FF67E575A06  jb          show+4B8h (07FF67E575A68h)  
00007FF67E575A08  inc         rax  
00007FF67E575A0B  mov         rcx,qword ptr [rbp]  
00007FF67E575A0F  cmp         rax,r13  
00007FF67E575A12  jbe         show+46Bh (07FF67E575A1Bh)  
00007FF67E575A14  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575A1A  int         3  
00007FF67E575A1B  add         rax,rax  
00007FF67E575A1E  cmp         rax,1000h  
00007FF67E575A24  jb          show+4B3h (07FF67E575A63h)  
00007FF67E575A26  test        byte ptr [rbp],1Fh  
00007FF67E575A2A  je          show+483h (07FF67E575A33h)  
00007FF67E575A2C  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575A32  int         3  
00007FF67E575A33  mov         rax,qword ptr [rcx-8]  
00007FF67E575A37  cmp         rax,rcx  
00007FF67E575A3A  jb          show+493h (07FF67E575A43h)  
00007FF67E575A3C  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575A42  int         3  
00007FF67E575A43  sub         rcx,rax  
00007FF67E575A46  cmp         rcx,8  
00007FF67E575A4A  jae         show+4A3h (07FF67E575A53h)  
00007FF67E575A4C  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575A52  int         3  
00007FF67E575A53  cmp         rcx,27h  
00007FF67E575A57  jbe         show+4B0h (07FF67E575A60h)  
00007FF67E575A59  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575A5F  int         3  
00007FF67E575A60  mov         rcx,rax  
00007FF67E575A63  call        operator delete (07FF67E57B430h)  
00007FF67E575A68  mov         qword ptr [rbp+18h],7  
00007FF67E575A70  mov         qword ptr [rbp+10h],r12  
00007FF67E575A74  mov         word ptr [rbp],r12w  
00007FF67E575A79  mov         rax,qword ptr [rbp-28h]  
00007FF67E575A7D  cmp         rax,8  
00007FF67E575A81  jb          show+533h (07FF67E575AE3h)  
00007FF67E575A83  inc         rax  
00007FF67E575A86  mov         rcx,qword ptr [rbp-40h]  
00007FF67E575A8A  cmp         rax,r13  
00007FF67E575A8D  jbe         show+4E6h (07FF67E575A96h)  
00007FF67E575A8F  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575A95  int         3  
00007FF67E575A96  add         rax,rax  
00007FF67E575A99  cmp         rax,1000h  
00007FF67E575A9F  jb          show+52Eh (07FF67E575ADEh)  
00007FF67E575AA1  test        byte ptr [rbp-40h],1Fh  
00007FF67E575AA5  je          show+4FEh (07FF67E575AAEh)  
00007FF67E575AA7  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575AAD  int         3  
00007FF67E575AAE  mov         rax,qword ptr [rcx-8]  
00007FF67E575AB2  cmp         rax,rcx  
00007FF67E575AB5  jb          show+50Eh (07FF67E575ABEh)  
00007FF67E575AB7  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575ABD  int         3  
00007FF67E575ABE  sub         rcx,rax  
00007FF67E575AC1  cmp         rcx,8  
00007FF67E575AC5  jae         show+51Eh (07FF67E575ACEh)  
00007FF67E575AC7  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
    }
00007FF67E575ACD  int         3  
00007FF67E575ACE  cmp         rcx,27h  
00007FF67E575AD2  jbe         show+52Bh (07FF67E575ADBh)  
00007FF67E575AD4  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575ADA  int         3  
00007FF67E575ADB  mov         rcx,rax  
00007FF67E575ADE  call        operator delete (07FF67E57B430h)  
00007FF67E575AE3  mov         qword ptr [rbp-28h],7  
00007FF67E575AEB  mov         qword ptr [rbp-30h],r12  
00007FF67E575AEF  mov         word ptr [rbp-40h],r12w  
00007FF67E575AF4  mov         rax,qword ptr [rbp-48h]  
00007FF67E575AF8  cmp         rax,8  
00007FF67E575AFC  jb          show+5AEh (07FF67E575B5Eh)  
00007FF67E575AFE  inc         rax  
00007FF67E575B01  mov         rcx,qword ptr [rbp-60h]  
00007FF67E575B05  cmp         rax,r13  
00007FF67E575B08  jbe         show+561h (07FF67E575B11h)  
00007FF67E575B0A  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575B10  int         3  
00007FF67E575B11  add         rax,rax  
00007FF67E575B14  cmp         rax,1000h  
00007FF67E575B1A  jb          show+5A9h (07FF67E575B59h)  
00007FF67E575B1C  test        byte ptr [rbp-60h],1Fh  
00007FF67E575B20  je          show+579h (07FF67E575B29h)  
00007FF67E575B22  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575B28  int         3  
00007FF67E575B29  mov         rax,qword ptr [rcx-8]  
00007FF67E575B2D  cmp         rax,rcx  
00007FF67E575B30  jb          show+589h (07FF67E575B39h)  
00007FF67E575B32  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575B38  int         3  
00007FF67E575B39  sub         rcx,rax  
00007FF67E575B3C  cmp         rcx,8  
00007FF67E575B40  jae         show+599h (07FF67E575B49h)  
00007FF67E575B42  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575B48  int         3  
00007FF67E575B49  cmp         rcx,27h  
00007FF67E575B4D  jbe         show+5A6h (07FF67E575B56h)  
00007FF67E575B4F  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575B55  int         3  
00007FF67E575B56  mov         rcx,rax  
00007FF67E575B59  call        operator delete (07FF67E57B430h)  
00007FF67E575B5E  mov         qword ptr [rbp-48h],7  
00007FF67E575B66  mov         qword ptr [rbp-50h],r12  
00007FF67E575B6A  mov         word ptr [rbp-60h],r12w  
00007FF67E575B6F  mov         rax,qword ptr [rbp-8]  
00007FF67E575B73  cmp         rax,8  
00007FF67E575B77  jb          show+629h (07FF67E575BD9h)  
00007FF67E575B79  inc         rax  
00007FF67E575B7C  mov         rcx,qword ptr [rbp-20h]  
00007FF67E575B80  cmp         rax,r13  
00007FF67E575B83  jbe         show+5DCh (07FF67E575B8Ch)  
00007FF67E575B85  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575B8B  int         3  
00007FF67E575B8C  add         rax,rax  
00007FF67E575B8F  cmp         rax,1000h  
00007FF67E575B95  jb          show+624h (07FF67E575BD4h)  
00007FF67E575B97  test        byte ptr [rbp-20h],1Fh  
00007FF67E575B9B  je          show+5F4h (07FF67E575BA4h)  
00007FF67E575B9D  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575BA3  int         3  
00007FF67E575BA4  mov         rax,qword ptr [rcx-8]  
00007FF67E575BA8  cmp         rax,rcx  
00007FF67E575BAB  jb          show+604h (07FF67E575BB4h)  
00007FF67E575BAD  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575BB3  int         3  
00007FF67E575BB4  sub         rcx,rax  
00007FF67E575BB7  cmp         rcx,8  
00007FF67E575BBB  jae         show+614h (07FF67E575BC4h)  
00007FF67E575BBD  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575BC3  int         3  
00007FF67E575BC4  cmp         rcx,27h  
00007FF67E575BC8  jbe         show+621h (07FF67E575BD1h)  
00007FF67E575BCA  call        qword ptr [__imp__invalid_parameter_noinfo_noreturn (07FF67E57D318h)]  
00007FF67E575BD0  int         3  
00007FF67E575BD1  mov         rcx,rax  
00007FF67E575BD4  call        operator delete (07FF67E57B430h)  
user36278
  • 1
    Does this function get called in a loop? You should add the code that calls it. – 1201ProgramAlarm Jul 15 '18 at 01:57
  • As an aside, you probably shouldn't pass the string arguments of the function by value (`aHkeyPath` and 'aSubKey'). Registry keys aren't known to be small enough to activate SSO, so you're making unnecessary copies. EDIT: now that I've studied the function more, you aren't even *using* those two arguments. – Casey Jul 15 '18 at 02:03
  • @1201ProgramAlarm Yes, it does. Also, I've included the code calling the function "show". – user36278 Jul 15 '18 at 02:04
  • @Casey I include aHkeyPath and aSubKey so that the function show can display the registry path (that code is not shown in the snippet of function 'show'). E.g. HKEY_LOCAL_MACHINE\Software\MyCo\ – user36278 Jul 15 '18 at 02:07
  • Stupid question: Are you compiling in Release with full optimization? – Casey Jul 15 '18 at 02:11
  • @Casey Yes, all the tests I've done have been using Release X64. – user36278 Jul 15 '18 at 02:12
  • If you step through debugger, does it stop at `std::wcout`? – Wander3r Jul 15 '18 at 02:16
  • @SaileshD Yes. I'll update my question with the disassembly code. – user36278 Jul 15 '18 at 02:19
  • 1
    Perhaps you should post this at https://codereview.stackexchange.com/ and get tips like *not* passing strings by value, that `if (lValueName == TEXT(""))` can be written as `if (lValueName.empty())`, and that having a temp list and doing `lSubKeyList.push_front(lTemp.back());` could likely be done with a single list and a `std::reverse` call. – Bo Persson Jul 15 '18 at 11:34
  • @BoPersson I agree, my code does require some tuning up. I had & still plan on posting to code review once I have completed it. But I appreciate the suggestions. – user36278 Jul 16 '18 at 07:00

2 Answers

0

Output to a console window is slow, because there's an awful lot of scrolling to do (both in the text buffer and on the screen). You can redirect the program output to a file and the performance should be similar to the version without the output.

To further reduce the time required to run, though, you need to reduce as much as possible the use of strings. Rather than using a series of + string operations (with 3 temporary string objects), create an empty string, reserve enough space in it for the entire constructed string, then use += to add the individual components. Make it a static variable to avoid the extra memory allocations, and only resize it when you need more space.
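A minimal sketch of the string-building suggestion above, assuming a hypothetical helper named buildLine and a caller-owned output buffer instead of a static variable (neither is from the original post). It reserves once and appends with +=, so no temporary strings are created by operator+:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the original post): builds one output line
// into a caller-owned buffer so that the buffer's storage can be reused.
inline void buildLine(const std::wstring& name,
                      const std::wstring& type,
                      const std::wstring& data,
                      std::wstring& out)
{
    static const std::wstring kSeparator = L"    ";

    // 'out' is cleared below, so it must not be one of the inputs.
    assert(&out != &name && &out != &type && &out != &data);

    // clear() leaves the capacity untouched in common implementations,
    // so later calls usually do not need to allocate again.
    out.clear();
    out.reserve(name.size() + type.size() + data.size() + 2 * kSeparator.size());

    // Append in place instead of chaining operator+ temporaries.
    out += name;
    out += kSeparator;
    out += type;
    out += kSeparator;
    out += data;
}
```

The caller would declare one std::wstring outside the value loop and pass it to every call. Whether this is measurable next to the console cost is something to confirm with a profiler.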

1201ProgramAlarm
  • The issue I have is that the Windows version of REG.exe completes the task in 13.45 minutes, whereas my version takes in excess of 20 minutes. – user36278 Jul 15 '18 at 02:16
  • The issue I have is that when I comment out std::wcout << fPath.c_str() the program completes in under 2 minutes, as opposed to over 20 minutes when it is left in. So I was just wondering how reducing the number of strings used would improve the performance? – user36278 Jul 15 '18 at 02:36
  • 2
    @user36278, the CRT writes one character at a time to the console. If `stdout` is in ANSI C mode, it encodes the character to the console codepage and calls `WriteFile`. If `stdout` is in UTF-16 text mode, it calls `WriteConsoleW`. Either way, this is the opposite of what you'd want for performance. At the very least, write a line at a time, but preferably batch several lines. Prior to Windows 8 a console write can safely be up to about 32 KiB, based on free memory in the shared LPC heap. In Windows 8+ there's no limit since it uses a device driver instead of an LPC port for console I/O. – Eryk Sun Jul 15 '18 at 05:20
0

I'm answering my own question.

I have found that replacing std::wcout << fPath.c_str(); with fwprintf(stdout,L"%s",fPath.c_str()); reduced the console output time to just under 3.45 minutes.

However, I'm not entirely sure why it performs so much better, other than that it makes fewer calls than wcout when viewing the assembly code. That explanation is a bit shaky.

Looking at the Performance Profiler results, with fwprintf the CPU (ms) category (time the CPU spent executing the code) is under 50 ms, whereas with wcout it is over 200 ms.

I believe fwprintf is output buffering the contents of fPath before displaying the content to console whereas wcout does not appear to be doing that.

I have found a few Stack Overflow resources which have helped me somewhat understand the problem:

  1. 'printf' vs. 'cout' in C++
  2. What is it with printf() sending output to buffer?
user36278
  • 1
    `fwprintf` to `stdout` is otherwise known as `wprintf`. It happens that in the default ANSI C locale, `wprintf` encodes and prints an entire line to the console codepage (full buffering is not used for the console), whereas `std::wcout` prints one character at a time. – Eryk Sun Jul 16 '18 at 09:33
  • 1
    Anyway, using the console codepage is wrong. Registry keys and string values are Unicode text, so you should be setting `stdout` to UTF-16 mode. In this case `wprintf` will also call `WriteConsoleW` one character at a time, so it is no better. You are going to have to bypass the C/C++ runtime library stdio and call `WriteConsoleW` directly to properly buffer Unicode output that's not limited to about 250 codepage characters. This is what reg.exe does in its `ShowMessage` output function. – Eryk Sun Jul 16 '18 at 09:34
  • Thanks for the advice. When I enabled Unicode mode the program became very slow, and this comment explains why. As for your other suggestion, I had implemented WriteConsole, but the program took more than 30 minutes. So I guess I will stick with printf and refrain from displaying Unicode characters. – user36278 Jul 16 '18 at 20:43
  • 1
    You did something deeply wrong if using `WriteConsoleW` instead of C/C++ stdio made the program *slower*. Ultimately your program will have to call `WriteConsoleW`, `WriteConsoleA`, or `WriteFile` when using stdio. Bypassing stdio is just giving you fine-grained control over what your program does, and in this case it's necessary to get proper, performant Unicode support that doesn't mangle strings that can't be mapped in a legacy codepage. – Eryk Sun Jul 16 '18 at 23:11
  • 1
    Again, it's what reg.exe actually does, and this is the program you're trying to recreate. Set a breakpoint on `reg!ShowMessage`, and step through it. – Eryk Sun Jul 16 '18 at 23:18
  • @eryksun You were correct. I accidentally tested WriteConsoleW in debug mode rather than release. Side Note: If you submit the above content as an answer I can accept it. – user36278 Jul 16 '18 at 23:22
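
The batching approach suggested in the comments above could be sketched roughly as follows. This is not how reg.exe is implemented; BatchedWriter and consoleSink are hypothetical names introduced only for illustration. The writer collects whole lines and hands them to a sink in large chunks. On Windows the sink calls WriteConsoleW, which only works when standard output is a real console (redirected output would need WriteFile instead); on other platforms it falls back to stdio so the sketch still compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cwchar>
#include <functional>
#include <string>
#include <utility>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: buffers lines and passes them to a sink in batches.
class BatchedWriter
{
public:
    using Sink = std::function<void(const wchar_t*, std::size_t)>;

    BatchedWriter(Sink sink, std::size_t flushThreshold)
        : mSink(std::move(sink)), mThreshold(flushThreshold)
    {
        assert(mSink && "a sink is required");
        mBuffer.reserve(mThreshold);
    }

    // Anything still buffered is written when the writer goes out of scope.
    ~BatchedWriter() { flush(); }

    BatchedWriter(const BatchedWriter&) = delete;
    BatchedWriter& operator=(const BatchedWriter&) = delete;

    // Appends one line; the sink is only called once enough text has piled up.
    void writeLine(const std::wstring& line)
    {
        mBuffer += line;
        mBuffer += L'\n';
        if (mBuffer.size() >= mThreshold) flush();
    }

    void flush()
    {
        if (mBuffer.empty()) return;
        mSink(mBuffer.data(), mBuffer.size());
        mBuffer.clear();
    }

private:
    Sink mSink;
    std::size_t mThreshold;
    std::wstring mBuffer;
};

// Hypothetical sink: one console write per batch instead of one per character.
inline void consoleSink(const wchar_t* text, std::size_t length)
{
#ifdef _WIN32
    HANDLE console = GetStdHandle(STD_OUTPUT_HANDLE);
    while (length > 0)
    {
        DWORD written = 0;
        // Fails if stdout is not a console (for example, redirected to a file).
        if (!WriteConsoleW(console, text, static_cast<DWORD>(length), &written, nullptr)
            || written == 0)
        {
            break;
        }
        text += written;
        length -= written;
    }
#else
    // Non-Windows fallback so the sketch stays portable.
    for (std::size_t i = 0; i < length; ++i) std::fputwc(text[i], stdout);
#endif
}
```

In the value loop, show would call writeLine on a single BatchedWriter constructed with consoleSink and a threshold of a few thousand characters, which stays under the roughly 32 KiB per-write limit mentioned above for versions before Windows 8.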