Up until now my INI text files have been Unicode encoded, and I have been reading them like this:

::GetPrivateProfileStringW(
                strUpdateSection,
                strTalkKey,
                _T(""),
                strTalk.GetBuffer( _MAX_PATH ),
                _MAX_PATH,
                m_strPathINI );
strTalk.ReleaseBuffer();

The file is downloaded from the internet first and then read. But I found that for Arabic the text file was getting corrupted unless I changed its encoding to UTF-8. With UTF-8 it downloads correctly, but then the data is not read correctly from the INI file.

So does an INI file have to be Unicode encoded, or is it also OK to use UTF-8 with these function calls?
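One way to narrow this down is to check which byte-order mark (BOM), if any, the downloaded file starts with before handing it to the profile functions. The following is only a minimal sketch: `BomKind` and `DetectBom` are hypothetical names that are not part of my program, and the function looks only at the first few bytes, so it cannot identify the encoding of a file that has no BOM.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original program): classify a buffer by
// the byte-order mark at its start.
enum class BomKind { None, Utf8, Utf16LE, Utf16BE };

BomKind DetectBom(const std::string& bytes)
{
    // Read bytes as unsigned so that values above 0x7F compare correctly.
    auto b = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };

    // UTF-8 BOM: EF BB BF
    if (bytes.size() >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF)
        return BomKind::Utf8;

    // UTF-16 little-endian BOM: FF FE
    // (a UTF-32LE file would also match here; this sketch ignores UTF-32)
    if (bytes.size() >= 2 && b(0) == 0xFF && b(1) == 0xFE)
        return BomKind::Utf16LE;

    // UTF-16 big-endian BOM: FE FF
    if (bytes.size() >= 2 && b(0) == 0xFE && b(1) == 0xFF)
        return BomKind::Utf16BE;

    return BomKind::None;
}
```

Passing the first bytes of the temporary file to `DetectBom` would show whether the saved file is UTF-16LE, UTF-8 with a BOM, or has no BOM at all.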

I think it is about time I convert this part of my program to UTF-8 encoded XML files instead! Way to go.

But I wanted to ask the question first.

I should clarify that the text file was initially saved as UTF-8 using Notepad.

Update

This is what it looks like when I read the file:

[Screenshot: Arabic results]

This is how I download the file:

BOOL CUpdatePublicTalksDlg::DownloadINI()
{
    CInternetSession    iSession;
    CHttpFile           *pWebFile = NULL;
    CWaitCursor         wait;
    CFile               fileLocal;
    TCHAR               szError[_MAX_PATH];
    int                 iRead = 0, iBytesRead = 0;
    char                szBuffer[4096];
    BOOL                bOK = FALSE;
    DWORD               dwStatusCode;
    CString             strError;

    // ask user to go online
    if( InternetGoOnline( (LPTSTR)(LPCTSTR)m_strURLPathINI, GetSafeHwnd(), 0 ) )
    {
        TRY
        {
            // our session should already be open
            // try to open up internet session to my URL
            // AJT V10.4.0 Use flag INTERNET_FLAG_RELOAD
            pWebFile = (CHttpFile*)iSession.OpenURL( m_strURLPathINI, 1,
                INTERNET_FLAG_TRANSFER_BINARY |
                INTERNET_FLAG_DONT_CACHE | INTERNET_FLAG_RELOAD);

            if(pWebFile != NULL)
            {
                if( pWebFile->QueryInfoStatusCode( dwStatusCode ) )
                {
                    // 20x codes mean success
                    if( (dwStatusCode / 100) == 2 )
                    {
                        // Our downloaded file is only temporary
                        m_strPathINI = theApp.GetTemporaryFilename();

                        if( fileLocal.Open( m_strPathINI,
                            CFile::modeCreate|CFile::modeWrite|CFile::typeBinary ) )
                        {
                            iRead = pWebFile->Read( szBuffer, 4096 );
                            while( iRead > 0 )
                            {   
                                iBytesRead += iRead;
                                fileLocal.Write( szBuffer, iRead );
                                iRead = pWebFile->Read( szBuffer, 4096 );
                            }
                            fileLocal.Close();

                            bOK = TRUE;
                        }
                    }
                    else
                    {
                        // There was a problem!
                        strError.Format( IDS_TPL_INVALID_URL, dwStatusCode );
                        AfxMessageBox( strError,
                                       MB_OK|MB_ICONERROR );
                    }
                }
            }
            else
            {
                AfxMessageBox( ID_STR_UPDATE_CHECK_ERR, MB_OK|MB_ICONERROR );
            }
        }
        CATCH( CException, e )
        {
            e->GetErrorMessage( szError, _MAX_PATH );
            AfxMessageBox( szError, MB_OK|MB_ICONERROR );
        }
        END_CATCH

        // Tidy up
        if( pWebFile != NULL )
        {
            pWebFile->Close();
            delete pWebFile;
        }

        iSession.Close();
    }

    return bOK;
}

This is how I read the file:

int CUpdatePublicTalksDlg::ReadTalkUpdateINI()
{
    int         iLastUpdate = 0, iUpdate;
    int         iNumTalks, iTalk;
    NEW_TALK_S  sTalk;
    CString     strUpdateSection, strTalkKey, strTalk;

    // How many possible updates are there?
    m_iNumUpdates = ::GetPrivateProfileIntW(
                        _T("TalkUpdates"),
                        _T("NumberUpdates"),
                        0,
                        m_strPathINI );

    // What was the last talk update count?
    iLastUpdate = theApp.GetLastTalkUpdateCount();

    // Loop available updates
    for( iUpdate = iLastUpdate + 1; iUpdate <= m_iNumUpdates; iUpdate++ )
    {
        // Build section key
        strUpdateSection.Format( _T("Update%d"), iUpdate );

        // How many talks are there?
        iNumTalks = ::GetPrivateProfileIntW(
                            strUpdateSection,
                            _T("NumberTalks"),
                            0,
                            m_strPathINI );
        // Loop talks
        for( iTalk = 1; iTalk <= iNumTalks; iTalk++ )
        {
            // Build key
            strTalkKey.Format( _T("Talk%d"), iTalk );

            // Get talk information
            ::GetPrivateProfileStringW(
                            strUpdateSection,
                            strTalkKey,
                            _T(""),
                            strTalk.GetBuffer( _MAX_PATH ),
                            _MAX_PATH,
                            m_strPathINI );
            strTalk.ReleaseBuffer();

            // Decode talk information
            DecodeNewTalk( strTalk, sTalk );

            // Does it already exist in the database?
            // AJT v11.2.0 Bug fix - we want *all* new talks to show
            //if( !IsExistingTalk( sTalk.iTalkNumber ) )
            //{
            //if(!LocateExistingTheme(sTalk, false))
            AddNewTalkToListBox( sTalk );
            //}
        }
    }

    // Return the actual amount of updates possible
    return m_iNumUpdates - iLastUpdate;
}

This is the file being downloaded:

http://publictalksoftware.co.uk/TalkUpdate_Arabic2.ini

Update

It seems that the file is corrupt at the point of being downloaded:

[Screenshot: Corrupt]

Please see updated Watch:

[Screenshot: Watch]

Making progress: I can now confirm that the data is OK at this point:

[Screenshot: Watch results with s8 format specifier]

Andrew Truckle
  • Wow, that function hasn't been recommended since Microsoft invented the registry. I'm not sure whether `GetPrivateProfileStringA` will handle UTF-8 or not, but I'm quite sure you won't be able to use Unicode characters in the filename. – Mark Ransom Jan 26 '17 at 20:00
  • @MarkRansom The `A` versions are for `ANSI` encoded files. I may look into using XML files now. Thanks. – Andrew Truckle Jan 26 '17 at 22:00
  • @AndrewTruckle: how exactly is Arabic text getting corrupted? What is the actual encoding of the INI file when Arabic text is used? Please be more specific. – Remy Lebeau Jan 26 '17 at 22:10
  • @MarkRansom: The `W` function will handle a UTF-8 encoded INI file, but only if the file has a UTF-8 BOM on it, otherwise the function requires the file to be encoded as UTF-16. The `A` function will read a UTF-8 encoded file without a BOM (but will not handle a Unicode filename that does not use the OS's current Ansi charset), but it won't know the file is using UTF-8 so it will just return the UTF-8 data as-is, you will have to convert it to UTF-16 manually. – Remy Lebeau Jan 26 '17 at 22:11
  • The file was Unicode. Opened in notepad and saved. Uploaded to web by ftp. When visiting link it showed Arabic and extra gibberish. Saved same Unicode as UTF8 with notepad and uploaded. Refresh browser cache. That file views fine. Then use my code which downloads the file and then I have issues. If you think it beneficial then tomorrow I can show the download code in case something there is doing something to the file. But as far as I was aware notepad will always make a good encoded file. – Andrew Truckle Jan 26 '17 at 22:30
  • *"The file was Unicode."* - Unicode is a standard. That's not interesting. What *is* interesting is the encoding used. I wouldn't know of a Unicode character encoding, that cannot represent Arabic text. – IInspectable Jan 27 '17 at 17:26
  • @IInspectable I have added some more information. Don't know if it will help. One last attempt before moving to XML files. – Andrew Truckle Jan 30 '17 at 16:35
  • Did you single-step through your code to find out the earliest point, where the string doesn't show the content you expect? Make sure you properly interpret the string contents in your debugger, using appropriate [format specifiers](https://msdn.microsoft.com/en-us/library/75w45ekt.aspx) in the watch window. – IInspectable Jan 30 '17 at 16:50
  • @IInspectable Updated. I wonder, I am using `char` as the binary unit. Is that wrong? – Andrew Truckle Jan 30 '17 at 17:46
  • Nothing indicates corruption at the point where the file is being downloaded. You are loading it into `szBuffer` which is of type `char[4096]`. The debugger's text visualizer will assume, that this is ANSI encoding (not UTF-8 or UTF-16). I suggested using the *Watch* window with appropriate format specifiers. That's the only way you can be sure that data is interpreted the way you want to interpret it. Your screenshot shows the result of leaving the interpretation up to the debugger visualizer (which gets it wrong). – IInspectable Jan 30 '17 at 17:51
  • Entering `szBuffer,s8` as the *Watch* expression should interpret the contents as a UTF-8 encoded string. – IInspectable Jan 30 '17 at 18:23
  • I have done that and the problem is I can't see the whole string. I added a screenshot. It is interesting to learn this, but I am thinking XML might be the way forward as I know I can do it. – Andrew Truckle Jan 30 '17 at 18:28
  • If `szBuffer,s8` truncates the string before the interesting part, try to add an offset (like `szBuffer+128,s8`). Make sure that you aren't splitting UTF-8 surrogates by doing so. – IInspectable Jan 30 '17 at 20:19
  • @IInspectable Thanks. Question updated. I confirm that it is reading the UTF-8 file down OK. I have intercepted and opened the downloaded file using Notepad and WordPad. Looks good. In Notepad if I do save-as it indicates it is UTF8 encoded. All good. It is the `GetPrivateProfileStringW` call that seems to mess it up as I originally raised. – Andrew Truckle Jan 30 '17 at 20:32
  • See http://stackoverflow.com/questions/859304/convert-cstring-to-const-char/859841#859841 – sergiol Feb 13 '17 at 00:33
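
The comments above suggest that UTF-8 data has to be converted to UTF-16 manually before the `W` function can be relied on. The following is a minimal, portable sketch of that conversion, not a definitive implementation: `Utf8ToUtf16` and `ToUtf16LeFileBytes` are hypothetical helper names, and the decoder does not reject overlong encodings or encoded surrogates. On Windows, `MultiByteToWideChar` with `CP_UTF8` would be the more usual route for the decoding step.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// A leading UTF-8 BOM (EF BB BF) is skipped. Throws on malformed input.
std::u16string Utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;

    if (in.size() >= 3 && in.compare(0, 3, "\xEF\xBB\xBF") == 0)
        i = 3;

    while (i < in.size())
    {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;   // number of continuation bytes expected

        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");

        if (cp >= 0x10000)
        {
            // Code points outside the BMP become a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        else
        {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Hypothetical helper: serialize UTF-16 code units as little-endian bytes,
// preceded by the UTF-16LE BOM (FF FE), ready to be written to a file.
std::string ToUtf16LeFileBytes(const std::u16string& text)
{
    std::string bytes("\xFF\xFE", 2);
    for (char16_t unit : text)
    {
        bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // high byte
    }
    return bytes;
}
```

Under these assumptions, the downloaded UTF-8 bytes could be passed through both helpers and the result written to the temporary file in binary mode before its path is given to `GetPrivateProfileStringW`.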

0 Answers