
I tried using CRichEditCtrl::GetLine() to retrieve the text of a given line of a rich-edit control in an MFC application built with VS2015 in Unicode mode, and running on Windows 10.

I wrote this helper function:

CString GetLine(CRichEditCtrl& richEdit, const int lineNum)
{
    int lineLength = richEdit.LineLength(richEdit.LineIndex(lineNum));
    if (lineLength == 0)
    {
        // Empty line
        return CString();
    }

    const int kMinBufferLength = sizeof(int) / sizeof(wchar_t);
    const int bufferLength = max(kMinBufferLength, lineLength);

    CString line;
    wchar_t* buffer = line.GetBuffer(bufferLength);   
    lineLength = richEdit.GetLine(lineNum, buffer, bufferLength);      
    line.ReleaseBuffer(lineLength);

    return line;
}

This code works fine, except for lines containing only one character. In this case, CRichEditCtrl::GetLine() returns 2 (instead of the expected 1), and the output buffer contains the correct character, followed by a \r.

Why is that? Why is the \r added only for single-character lines and not for lines containing more characters?

I was able to work around that by adding a special-case if like this:

// Code inserted after the richEdit.GetLine() call, before the line.ReleaseBuffer() call:    

// *** Special Case ***
// It seems that when there's only one character (e.g. 'C') in the line,
// CRichEditCtrl::GetLine() returns 2, and appends a '\r' after 
// the read character in the output buffer.
if ((lineLength == 2) && (buffer[1] == L'\r'))
{
    // Chop off the spurious '\r'
    lineLength = 1;
}

However, the reason for this special-case behavior is not clear to me.
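
The special case above only handles a returned length of exactly 2. A slightly more general variant, sketched here in portable C++ with std::wstring instead of CString, drops any trailing '\r' from the retrieved text. StripTrailingCR is a hypothetical helper name, not an MFC function, and it assumes a trailing '\r' is never wanted in the result.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of MFC or the Win32 API):
// removes every '\r' at the end of the text and leaves the rest untouched.
std::wstring StripTrailingCR(std::wstring text)
{
    while (!text.empty() && text.back() == L'\r')
    {
        text.pop_back();
    }
    return text;
}
```

A '\r' in the middle of the text is kept, so only the terminator at the end of a line is affected.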


P.S.: The CRichEditCtrl::GetLine() MFC code that is invoked is:

int CRichEditCtrl::GetLine(_In_ int nIndex, _Out_writes_to_(nMaxLength, return) LPTSTR lpszBuffer, _In_ int nMaxLength) const
{
    ASSERT(::IsWindow(m_hWnd));
    ENSURE(sizeof(nMaxLength)<=nMaxLength*sizeof(TCHAR)&&nMaxLength>0);
    *(LPINT)lpszBuffer = nMaxLength;
    return (int)::SendMessage(m_hWnd, EM_GETLINE, nIndex, (LPARAM)lpszBuffer);
}

So this seems to be just a thin wrapper around the EM_GETLINE message.
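
To see why the helper needs a minimum buffer length of 2, the ENSURE condition from the wrapper can be evaluated on its own. This is a portable sketch: Char16 and Int32 are stand-ins chosen here for a 2-byte TCHAR and a 4-byte int (the sizes on a Unicode Windows build), and PassesEnsure is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed stand-ins for the Windows types in a Unicode build:
using Char16 = char16_t;      // plays the role of a 2-byte TCHAR
using Int32  = std::int32_t;  // plays the role of a 4-byte int

// Mirrors the ENSURE condition in CRichEditCtrl::GetLine():
//   sizeof(nMaxLength) <= nMaxLength * sizeof(TCHAR) && nMaxLength > 0
// The positivity check is done first here so a negative value is never
// converted to an unsigned size.
constexpr bool PassesEnsure(Int32 nMaxLength)
{
    return nMaxLength > 0
        && sizeof(nMaxLength)
               <= static_cast<std::size_t>(nMaxLength) * sizeof(Char16);
}

// With 2-byte characters, a request for 1 character is rejected
// and a request for 2 characters is the smallest one accepted.
static_assert(!PassesEnsure(1), "4 bytes do not fit into 1 * 2 bytes");
static_assert(PassesEnsure(2), "4 bytes fit into 2 * 2 bytes");
```

Under these assumptions the 3-parameter overload cannot be asked for a single character, which is why the helper rounds the length up to 2.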

The MSDN doc for EM_GETLINE states that "the return value is the number of TCHARs copied" (in my case, wchar_ts). For one-character lines the return value is two, instead of the expected one. So it sounds like the rich-edit control is actually returning the single character followed by a spurious \r in this special case.

For lines containing more than one character, the returned value is the actual number of characters, as expected (I tried with simple English/ASCII characters, to avoid complications of Unicode surrogate pairs and other stuff).

Mr.C64
  • This is not unusual, RTF is old and wonky and one of only a few places I know where `\n` is the line terminator. These `\r` characters may actually appear in the RTF, written by whatever program generated the RTF to keep the line lengths reasonable. As recommended by Microsoft. Beware of `\r\r\n`, it has been done. The filtering that the .NET RichTextBox does may be helpful to assist or convince you that you are doing it right. https://referencesource.microsoft.com/#System.Windows.Forms/winforms/Managed/System/WinForms/TextBoxBase.cs,37cabfde1449b18f – Hans Passant Sep 22 '17 at 20:56
  • @HansPassant Thanks for the link. To be clear, I was able to reproduce it consistently, creating a simple MFC dialog app with a rich edit control and typing one-character lines in it and retrieving them with the code shown above. – Mr.C64 Sep 22 '17 at 21:11
  • Which version of the richedit control are you using? – zett42 Sep 22 '17 at 21:14
  • @zett42 I just drag-and-dropped the rich edit control from the VS control palette to an MFC dialog box. I haven't specified any version explicitly. – Mr.C64 Sep 22 '17 at 21:49
  • I get very strange results using your function. The text `A\r\nline\r\nand another one` gets split into these lines: 1) `a\r` 2) `li` 3) `and `. It doesn't make a difference if I replace `\r\n` by just `\n`. – zett42 Sep 22 '17 at 22:56
  • Got it to work now by changing this line: `int lineLength = richEdit.LineLength(richEdit.LineIndex(lineNum));` as `LineLength()` expects a character index. Now I get the result you describe, that is an added `\r` for single-character lines. – zett42 Sep 22 '17 at 23:04
  • @zett42: Thanks, I fixed the code (I tried with first lines and this is the reason why it probably worked). Thanks for confirming the behavior. – Mr.C64 Sep 22 '17 at 23:13

2 Answers


I got it to work without special-casing by using the other overload of CRichEditCtrl::GetLine():

*(int*) buffer = lineLength;
lineLength = richEdit.GetLine(lineNum, buffer);

The reference for EM_GETLINE says that you have to write the size of the buffer into the buffer, but in practice this value acts as the number of characters you request.

The reference for the macro Edit_GetLine() which sends EM_GETLINE has it correct:

cchMax The maximum number of characters to be copied to the buffer.

The macro writes the cchMax parameter to the buffer before calling SendMessage(), which is exactly the same as my code above.

I also think that the condition in the 3-parameter overload of CRichEditCtrl::GetLine(), which causes an exception if you request fewer than 2 characters, is incorrect.
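
The effect of the requested count can be illustrated with a toy model in portable C++. FakeEmGetLine is a hypothetical stand-in, not the real rich edit implementation. It assumes, based only on the behavior reported in this thread, that the control keeps each line followed by '\r' and copies as many characters as the caller asks for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical model of EM_GETLINE (an assumption, not the real control):
// the requested count is read from the start of the buffer, and the line
// is treated as stored with a trailing '\r'.
int FakeEmGetLine(const std::wstring& lineText, wchar_t* buffer)
{
    int requested = 0;
    std::memcpy(&requested, buffer, sizeof(requested));

    const std::wstring stored = lineText + L'\r';
    const std::size_t copied =
        std::min(static_cast<std::size_t>(std::max(requested, 0)),
                 stored.size());
    std::copy_n(stored.data(), copied, buffer);
    return static_cast<int>(copied);
}

// Reads a line while asking the model for requestCount characters.
std::wstring ReadLine(const std::wstring& lineText, int requestCount)
{
    // The buffer must hold the int count even when fewer characters are
    // requested, so its size and the requested count are kept separate.
    const std::size_t minChars =
        (sizeof(int) + sizeof(wchar_t) - 1) / sizeof(wchar_t);
    const std::size_t wanted =
        static_cast<std::size_t>(std::max(requestCount, 0));
    std::vector<wchar_t> buffer(std::max(minChars, wanted));

    std::memcpy(buffer.data(), &requestCount, sizeof(requestCount));
    const int copied = FakeEmGetLine(lineText, buffer.data());
    return std::wstring(buffer.data(), static_cast<std::size_t>(copied));
}
```

In this model, ReadLine(L"C", 2) yields the character followed by '\r', while ReadLine(L"C", 1) yields only the character. That matches what the question reports for the 3-parameter overload and what the code above achieves by writing the line length into the buffer.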

zett42
  • So, I specified a minimum size of 2 `wchar_t`s (even for single-character lines), to satisfy the `ENSURE` condition in the MFC wrapper code, and it sounds like the rich edit control padded the second `wchar_t` slot with `\r`? To me, even if I specified a larger buffer, the rich edit `EM_GETLINE` should _return_ the _actual_ count of `wchar_t`s in the line, not the _maximum_ buffer size I provided as input parameter (_"`cchMax` The maximum number of characters to be copied to the buffer."_). – Mr.C64 Sep 23 '17 at 08:48
  • The [EM_GETLINE](https://msdn.microsoft.com/library/windows/desktop/bb761584) documentation does say: *"the size, in TCHARs, of the buffer"*. There is nothing wrong here. – IInspectable Sep 23 '17 at 09:16
  • @IInspectable Apparently the documentation of `EM_GETLINE` is incorrect. You have to pass the number of characters, which can be less than the minimum size of the buffer (4 bytes or 2 wchar_t characters). – zett42 Sep 23 '17 at 09:46
  • I don't understand what specifically is wrong with the `EM_GETLINE` documentation, and I don't see how your last comment clarifies that either. – IInspectable Sep 23 '17 at 10:03
  • @IInspectable According to the documentation, if the line length is `1`, you would have to pass `2` to `EM_GETLINE` because that is the minimum size of the buffer (assuming that "word" means a 32-bit integer in this context and compiling for Unicode). But actually you have to pass the line length of `1` to prevent it from returning the spurious `\r`. – zett42 Sep 23 '17 at 10:42
  • The documentation left out the obvious part, i.e. *"the size, in TCHARs, of the buffer **to use**"*. I don't see how adding the obvious would help any. There is no requirement ever for a buffer to be **exactly** the size you tell a function to use. It can be larger. I still don't see, where you believe the documentation to be wrong. – IInspectable Sep 23 '17 at 15:18
  • @IInspectable The "to use" addition is irrelevant. If you pass a buffer size of 2 and the line length is 1, EM_GETLINE returns 2 and puts a \r after the returned character, whereas a reasonable return value would be 1 (only one character in the line). – Mr.C64 Sep 23 '17 at 17:11
  • In addition, I find the documentation's use of "word" to indicate the buffer length confusing: is this a 16-bit unsigned WORD, or a 32-bit unsigned DWORD, or something else? According to the MFC wrapper code this actually sounds like an int, but I think the MSDN documentation should be clearer on that. – Mr.C64 Sep 23 '17 at 17:25
  • @Mr.C64: The actual buffer size is irrelevant. If you tell the message handler that you want 2 characters for a line with only 1 character, it should come as no surprise that the implementation will copy 2 characters if it can. If you only want 1 character, tell the API that you only want 1 character. I don't get all the noise you are making about an API following the route of least surprise. – IInspectable Sep 24 '17 at 06:44
  • @IInspectable It could be argued that `\r` is not part of the line, it is a *delimiter* which shouldn't be returned if one requests a *line*. Anyway, the documentation is unclear in that regard. For an example of a precise documentation, have a look at [`std::basic_istream::getline`](http://en.cppreference.com/w/cpp/io/basic_istream/getline) which makes it very clear that the delimiter is not stored, regardless how many characters you request. – zett42 Sep 24 '17 at 09:27
  • Why did we change the subject? – IInspectable Sep 24 '17 at 09:44
  • @IInspectable I don't get all the noise you are making in your comments for an API that is at best poorly documented and an MFC wrapper code that asserts and probably shouldn't. – Mr.C64 Sep 25 '17 at 21:10

The return value is zero (0) if the line is not valid.

If the line is empty it makes sense to return 1 and '\r' in the buffer. That would mean that '\r' is always returned when the line number is valid.

The function reference says that the buffer should be at least 4 bytes long, because a WORD is written to the buffer before being passed to SendMessage.

sizeof(nMaxLength) in the ENSURE function is the size of an int or WORD.

CRichEditCtrl::GetLine

CRichEditCtrl::GetLineCount has some code.

Baxter
  • I'm discussing the case of single-character lines, not empty lines. Please read my question carefully. Moreover, FWIW, `sizeof(WORD)` is 2, not 4. Anyway, it's not clear from the MSDN doc if the size should be stored in a 2-byte "word" or 4-byte "word"; considering the MFC wrapper code, it sounds like 4 bytes (`int`). – Mr.C64 Sep 23 '17 at 08:27
  • @Mr.C64 The `Edit_GetLine()` macro also uses an `int`, so at least in this regard the MFC wrapper seems to be correct. – zett42 Sep 23 '17 at 09:48