
I want to achieve a very basic task in Delphi: save a string to disk and load it back. It seems trivial, but I have had problems doing this TWICE since I upgraded to IOUtils (and once more before that, which is why I took the 'brilliant' decision to upgrade to IOUtils).

I use something like this:

procedure WriteToFile(CONST FileName: string; CONST uString: string; CONST WriteOp: WriteOperation);    
begin
   if WriteOp= (woOverwrite)
   then IOUtils.TFile.WriteAllText (FileName, uString)  //overwrite
   else IOUtils.TFile.AppendAllText(FileName, uString); //append
end;    

Simple, right? What could go wrong? Well, I recently ran into yet another bug in IOUtils. So, TFile is buggy. The bug is detailed here.

Can anyone share an alternative (or simply your thoughts/ideas) that is not based on IOUtils and is known to work? Well... the code above also worked for me for a while... So I know it is difficult to guarantee that a piece of code (no matter how small) will really work!

Also, I would REALLY like my WriteToFile procedure to save the string to an ANSI file when possible (that is, when uString contains only ANSI chars) and as Unicode otherwise.
Then the ReadAFile function should automagically detect the encoding and correctly read the string back.
The idea is that there are still text editors out there that will wrongly open/interpret a Unicode/UTF file. So, whenever possible, give the user a good old ANSI text file.

So:
- Overwrite/Append
- Save as ANSI when possible
- Memory efficient (don't eat 4GB of ram when the file to load is 2GB)
- Should work with any text file (up to 2GB, obviously)
- No IOUtils (too buggy to be of use)

Gabriel
  • some more strange things about IOUtils: http://stackoverflow.com/questions/35429699/system-ioutils-tdirectory-getparent-odd-behavior http://stackoverflow.com/questions/31427260/how-to-handle-very-long-file-names-with-tpath – Gabriel Feb 29 '16 at 21:37
  • @MartynA- Exactly! :) :) I previously used TStringList. But then I got the 'great' idea to switch to IOUtils, which was more 'dedicated'. Well, there were some issues with TStringList also. IOUtils seems so buggy since they massively upgraded it (I think in Delphi XE). – Gabriel Feb 29 '16 at 21:41
  • @MartynA- I think I will go with TStringStream. – Gabriel Feb 29 '16 at 21:47
  • @MartynA- question clarified. – Gabriel Feb 29 '16 at 21:52
  • TStringStream is not a good idea, because it does not recognize the source file's encoding, so you can load a wrong char sequence (default encoding = Unicode). And it does not change the encoding on write, so you can get the same exception as in the other question if you explicitly set the encoding to ANSI for correct reading – kami Feb 29 '16 at 22:04
  • @kami: Horses for courses. – MartynA Feb 29 '16 at 22:09
  • Seriously, how long do you think Delphi would have been around if it were incapable of reliably and readably writing a string to disk. Though on recent showing, never underestimate EMBA's capacity for regression. – MartynA Feb 29 '16 at 22:14
  • I wasn't denying anything. See "Though ..." – MartynA Feb 29 '16 at 22:20
  • Sorry. I misinterpreted that because you put two contradictory affirmations together :) You said "how long do you think Delphi would have been around if it were incapable of reliably and readably writing a string to disk". If one is using TFile (and probably I am not the only one using it), then Delphi is incapable of RELIABLY reading/writing strings to disk. – Gabriel Feb 29 '16 at 22:23
  • @Kenny what is the maximum file size? The solution may differ depending on that – kami Feb 29 '16 at 22:24
  • @kami - I would say 'normal' text files. Usually WAY WAY under 5MB. But you never know when you have 20MB. – Gabriel Feb 29 '16 at 22:25
  • `TEncoding.UTF8.GetBytes` and `TEncoding.UTF8.GetString`. FWIW, `AppendAllText` would work if your users would not corrupt the file at hand. – David Heffernan Feb 29 '16 at 22:26
  • "if your users would not corrupt the file..." - Yes. Obviously :) Thanks David. – Gabriel Feb 29 '16 at 22:28
  • But that's important. If you need to deal with appending to arbitrary files, then the solution will be different. Anyway, "we all love ANSI". No. We all hate it. There is no single ANSI. Just loads of different code pages. That are rather useless. UTF-8 is what we love. – David Heffernan Feb 29 '16 at 22:31
  • I don't understand. Never mind. – David Heffernan Feb 29 '16 at 22:34
  • @DavidHeffernan-I really need to be able to also append! This is why my WriteToFile uses both WriteAllText and AppendAllText! – Gabriel Feb 29 '16 at 22:37
  • So, seek to the end of the file, and write UTF-8 bytes. If the user re-encodes the file, that's on them. I think I've said that a few times now. – David Heffernan Feb 29 '16 at 22:37
  • @Kenny regarding the last edit of the question: please explain what you mean by "ReadAFile function should automagically detect the encoding and correctly read the string back.". If the file size is more than 1 GB (in a real application) you can't load the whole content into the output string - you will always get an EOutOfMemory exception. I asked about the maximum file size and you said 20 MB max. The last edit is very different from the original question. – kami Mar 01 '16 at 06:04
  • @Kenny and "Save as ANSI when possible" is not a good idea. For example, if I create an ANSI file in code page 1251 (Russian), where bytes > 127 contain Russian letters, and you try to open it on a computer with default code page 1141 (IBM EBCDIC Germany), you can't see the Russian characters, you'll get German abracadabra :) – kami Mar 01 '16 at 06:17
  • Because it's not possible to detect a file's encoding, what you are asking for is impossible. Indeed, it was trying to perform this impossible task that led the RTL devs to their defective AppendAllText implementation. – David Heffernan Mar 01 '16 at 07:17

4 Answers


Then the ReadAFile function should automagically detect the encoding and correctly read the string back.

This is not possible. There exist files that are well-formed under more than one text encoding. For instance, see The Notepad file encoding problem, redux.

This means that your goals are unattainable and that you need to change them.
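To make the ambiguity concrete, here is a minimal console sketch (not taken from the linked article). The byte values and code page 1252 are illustrative assumptions; it relies only on `TEncoding.UTF8` and `TEncoding.GetEncoding` from `System.SysUtils`. The same two bytes decode without error under both encodings, as one character in UTF-8 and as two characters in Windows-1252, so the content alone cannot tell you which one the writer meant.

```delphi
program AmbiguousBytes;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Bytes: TBytes;
  Cp1252: TEncoding;
  AsUtf8, AsAnsi: string;
begin
  // Two bytes chosen for illustration: a valid UTF-8 sequence for U+00E9,
  // and also two valid single-byte characters in Windows-1252.
  Bytes := TBytes.Create($C3, $A9);

  AsUtf8 := TEncoding.UTF8.GetString(Bytes);

  // Encodings obtained through GetEncoding are owned by the caller.
  Cp1252 := TEncoding.GetEncoding(1252);
  try
    AsAnsi := Cp1252.GetString(Bytes);
  finally
    Cp1252.Free;
  end;

  Writeln('Characters when decoded as UTF-8: ', Length(AsUtf8));
  Writeln('Characters when decoded as Windows-1252: ', Length(AsAnsi));
end.
```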

My advice is to do the following:

  • Pick a single encoding, UTF-8, and stick to it.
  • If the file does not exist, create it and write UTF-8 bytes to it.
  • If the file exists, open it, seek to the end, and append UTF-8 bytes.

A text editor that does not understand UTF-8 is not worth supporting. If you feel inclined, include a UTF-8 BOM when you create the file. Use TEncoding.UTF8.GetBytes and TEncoding.UTF8.GetString to encode and decode.
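The advice above could be sketched roughly as follows. This is only an illustration under stated assumptions: `AppendUtf8`, `ReadUtf8` and the file name `utf8-demo.txt` are hypothetical names invented for the example, and the reader assumes the file really is UTF-8 (it does no detection, it only skips a leading BOM if one is present). Note that text containing only 7-bit ASCII characters produces exactly the same bytes in UTF-8 without a BOM as in the Windows ANSI code pages, so such files still open correctly in ANSI-only editors.

```delphi
program Utf8RoundTrip;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: append Text as UTF-8, creating the file if needed.
procedure AppendUtf8(const FileName, Text: string);
var
  Stream: TFileStream;
  Bytes: TBytes;
begin
  Bytes := TEncoding.UTF8.GetBytes(Text);
  if FileExists(FileName) then
    Stream := TFileStream.Create(FileName, fmOpenReadWrite or fmShareDenyWrite)
  else
    Stream := TFileStream.Create(FileName, fmCreate or fmShareDenyWrite);
  try
    Stream.Position := Stream.Size;  // seek to the end before writing
    if Length(Bytes) > 0 then
      Stream.WriteBuffer(Bytes[0], Length(Bytes));
  finally
    Stream.Free;
  end;
end;

// Hypothetical helper: read the whole file and decode it as UTF-8.
function ReadUtf8(const FileName: string): string;
var
  Stream: TFileStream;
  Bytes, Bom: TBytes;
  Offset: Integer;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Bytes, Stream.Size);
    if Length(Bytes) > 0 then
      Stream.ReadBuffer(Bytes[0], Length(Bytes));
  finally
    Stream.Free;
  end;

  // Skip the UTF-8 BOM if the file starts with one.
  Bom := TEncoding.UTF8.GetPreamble;
  Offset := 0;
  if (Length(Bom) > 0) and (Length(Bytes) >= Length(Bom)) and
     CompareMem(@Bytes[0], @Bom[0], Length(Bom)) then
    Offset := Length(Bom);

  if Length(Bytes) - Offset = 0 then
    Exit('');
  Result := TEncoding.UTF8.GetString(Bytes, Offset, Length(Bytes) - Offset);
end;

const
  DemoFile = 'utf8-demo.txt';  // assumed scratch file in the current directory
var
  Part1, Part2: string;
begin
  Part1 := 'plain ASCII line' + sLineBreak;
  Part2 := 'caf' + #$00E9 + sLineBreak;  // contains a non-ASCII character

  if FileExists(DemoFile) then
    DeleteFile(DemoFile);
  AppendUtf8(DemoFile, Part1);  // creates the file
  AppendUtf8(DemoFile, Part2);  // appends to it

  if ReadUtf8(DemoFile) = Part1 + Part2 then
    Writeln('Round trip OK')
  else
    Writeln('Round trip FAILED');
end.
```

Appending this way touches only the tail of the file, so memory use depends on the size of the appended text rather than on the size of the existing file. `ReadUtf8` still loads the whole file into memory, which is fine for the few-megabyte files mentioned in the comments but not for multi-gigabyte ones.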

David Heffernan

Just use TStringList, as long as the file is smaller than roughly 50-100 MB (the exact limit depends on CPU speed):

procedure ReadTextFromFile(const AFileName: string; SL: TStringList);
begin
  SL.Clear;
  SL.DefaultEncoding:=TEncoding.ANSI; // we know that old files have this encoding
  SL.LoadFromFile(AFileName, nil); // let TStringList detect the real encoding from the BOM;
  // if there is no BOM, it falls back to DefaultEncoding.
end;

procedure WriteTextToFile(const AFileName: string; const TextToWrite: string);
var
  SL: TStringList;
begin
  SL:=TStringList.Create;
  try
    if FileExists(AFileName) then // LoadFromFile raises an exception if the file is missing
      ReadTextFromFile(AFileName, SL); // read the whole file with encoding detection
    SL.Add(TextToWrite);
    SL.SaveToFile(AFileName, TEncoding.UTF8); // write the file with the new encoding.
    // DO NOT SET SL.WriteBOM to False!!! The BOM is what lets the next read detect UTF-8.
  finally
    SL.Free;
  end;
end;
kami
  • Thanks kami. So, where is the drawback? What causes the code to be slow above 100 MB? – Gabriel Feb 29 '16 at 22:38
  • Also, if I pass a string to ReadTextFromFile instead of a TStringList, will it increase memory consumption? – Gabriel Feb 29 '16 at 22:40
  • Pretty wasteful to read the entire file just to be able to append to it. – David Heffernan Feb 29 '16 at 22:43
  • I keep the upvote anyway. Nice solution for SMALL text files. – Gabriel Feb 29 '16 at 22:45
  • @Kenny with a string as the parameter for the Read function, you lose encoding auto-detection. – kami Feb 29 '16 at 22:47
  • @DavidHeffernan, no. If you add UTF-8 international characters to the end of an ANSI file, they will be read back as ANSI characters, which would be wrong – kami Feb 29 '16 at 22:49
  • David: "Pretty wasteful" - OH YES!!!!!!!! I remember now!!! This is why I 'upgraded' from TStringList to TFile!!! It was terribly slow for large files. I had to read some large files containing binary data encoded as ASCII text; something like a rudimentary MIME (a format currently used in biology). TStringList totally choked on that task. I had to go all binary + AnsiString to read those files efficiently. – Gabriel Feb 29 '16 at 22:51
  • @kami I'm not suggesting that. I'm suggesting appending to UTF-8 file. – David Heffernan Feb 29 '16 at 22:52
  • @DavidHeffernan but the source file is ANSI. So we can't just append to the file, we must re-encode it. – kami Feb 29 '16 at 22:59
  • @kami- I don't know much about the format of a UTF-8 file. So, forgive my question if it is wrong: can't we just add the BOM in front of the existing ANSI text? The text in the file is 8-bit anyway. – Gabriel Feb 29 '16 at 23:20
  • @Kenny no, you can't, if the file contains any symbol other than English letters, digits or punctuation. I know my English is terrible, but I'll try to explain. Any national symbol in an ANSI file is encoded as **one** byte, depending on the default code page of the local computer. In UTF-8, national symbols are encoded as multi-byte sequences (two, three... bytes) that start with a prefix byte. So an ANSI symbol with code > 127 can be taken for the prefix of a UTF-8 sequence, and you can get an EEncoding error when reading the new file. – kami Mar 01 '16 at 05:53
  • @kami what is being asked for is impossible though. You can't detect encoding from content. So the sane way is to pick an encoding and stick to it. – David Heffernan Mar 01 '16 at 06:42
  • @DavidHeffernan yes, we can't if the file does not contain a preamble. But this does not mean that we can just append UTF-8 content to the end of an ANSI file and then read the file as UTF-8. To add national characters that are not supported by the source encoding, we must re-encode the whole file content. – kami Mar 01 '16 at 07:16
  • @kami No, of course not. And I'd never suggest doing so. But you can't tell how a file was encoded. So again, the only sane approach is to pick an encoding and stick to it. – David Heffernan Mar 01 '16 at 07:24

The IniFiles unit should support Unicode. At least according to this answer: How do I read a UTF8 encoded INI file?

INI files are quite commonly used to store strings, integers, booleans and even string lists.

    procedure TConfig.ReadValues();
    var
        appINI: TIniFile;
    begin
        appINI := TIniFile.Create(ChangeFileExt(Application.ExeName,'.ini'));

        try
            FMainScreen_Top := appINI.ReadInteger('Options', 'MainScreen_Top', -1);
            FMainScreen_Left := appINI.ReadInteger('Options', 'MainScreen_Left', -1);
            FUserName := appINI.ReadString('Login', 'UserName', '');
            FDevMode := appINI.ReadBool('Globals', 'DevMode', False);
        finally
            appINI.Free;
        end;
    end;

    procedure TConfig.WriteValues(OnlyWriteAnalyzer: Boolean);
    var
        appINI: TIniFile;
    begin
        appINI := TIniFile.Create(ChangeFileExt(Application.ExeName,'.ini'));

        try
            appINI.WriteInteger('Options', 'MainScreen_Top', FMainScreen_Top);
            appINI.WriteInteger('Options', 'MainScreen_Left', FMainScreen_Left);
            appINI.WriteString('Login', 'UserName', FUserName);
            appINI.WriteBool('Globals', 'DevMode', FDevMode);
        finally
            appINI.Free;
        end;
    end;

Also see the Embarcadero documentation on TIniFile: http://docwiki.embarcadero.com/Libraries/Seattle/en/System.IniFiles.TIniFile

T.S
  • @Thomas Why do you think INI files are relevant? – David Heffernan Mar 01 '16 at 07:08
  • @thomas- This is far from the optimal solution :) – Gabriel Mar 01 '16 at 09:02
  • To be fair, he edited the question several times. It's "a" solution, even if not optimal yet. I'd suggest TTextFile, if I could be sure it supported UTF, and he hadn't mentioned TFile being buggy. (which is what TTextFile derives from iirc) – T.S Mar 01 '16 at 09:29
  • @Thomas There is no `TTextFile`. It might be a solution, but it's not a solution to the problem presented here. None of the edits mention INI files. Why are you bringing INI files into this matter? – David Heffernan Mar 01 '16 at 09:32
  • What is iirc? How does it relate to the question? – Gabriel Mar 12 '16 at 11:26
  • @Thomas- Also, why do you write integers and booleans? Are they related to strings in any way? – Gabriel Mar 19 '16 at 14:40

Code based on David's suggestions:

{--------------------------------------------------------------------------------------------------
 READ/WRITE UNICODE
--------------------------------------------------------------------------------------------------}

procedure WriteToFile(CONST FileName: string; CONST aString: String; CONST WriteOp: WriteOperation= woOverwrite; WritePreamble: Boolean= FALSE); { Write Unicode strings to a UTF8 file. It can also write a preamble }
VAR
   Stream: TFileStream;
   Preamble: TBytes;
   sUTF8: RawByteString;
   aMode: Integer;
   Appending: Boolean;
begin
 ForceDirectories(ExtractFilePath(FileName));

 Appending := (WriteOp= woAppend) AND FileExists(FileName);
 if Appending
 then aMode := fmOpenReadWrite
 else aMode := fmCreate;

 Stream := TFileStream.Create(FileName, aMode OR fmShareDenyWrite); { Allow read during our writes. The share flag must be OR-ed into Mode; the separate Rights parameter is ignored on Windows }
 TRY
  sUTF8 := Utf8Encode(aString);                                     { UTF16 to UTF8 encoding conversion }

  if (NOT Appending) AND WritePreamble then
   begin
    Preamble := TEncoding.UTF8.GetPreamble;
    Stream.WriteBuffer( PAnsiChar(Preamble)^, Length(Preamble));
   end;

  if Appending
  then Stream.Position:= Stream.Size;                               { Go to the end }

  Stream.WriteBuffer( PAnsiChar(sUTF8)^, Length(sUTF8) );
 FINALLY
   FreeAndNil(Stream);
 END;
end;


procedure WriteToFile (CONST FileName: string; CONST aString: AnsiString; CONST WriteOp: WriteOperation);
begin
 WriteToFile(FileName, String(aString), WriteOp, FALSE);
end;


function ReadFile(CONST FileName: string): String;  {Tries to autodetermine the file type (ANSI, UTF8, UTF16, etc). Works with UNC paths }
begin
 Result:= System.IOUtils.TFile.ReadAllText(FileName);
end;
Gabriel
  • The amount of code is a bit ridiculous (if you ask my opinion), considering that all it does is write a string to disk. – Gabriel Sep 16 '19 at 08:36