-1

I have a very large amount of data, around 5 GB, in the form of bytes.

I need to store this data in a file, ServerData.xml. The data should first be converted to a string and then saved to the file so that we can perform operations on the file.

I used the code below to convert the stream of bytes to a string and then save it to a file.

private const string fileName = "ServerData.xml";

public void ProcessBuffer(byte[] receiveBuffer, int bytes)
{
    if (!File.Exists(fileName))
    {
        using (File.Create(fileName)) { };
    }

    TextWriter tw = new StreamWriter(fileName, true);
    tw.Write(Encoding.UTF8.GetString(receiveBuffer).TrimEnd((Char)0));
    tw.Close();
}

Is this the right way?

If not, please suggest a better way that avoids memory issues in the future.

Yeldar Kurmangaliyev
Gaurav123
  • 3
    For me, creating a 5GB string sounds wrong. – Yeldar Kurmangaliyev Oct 20 '15 at 05:24
  • I understand, but we have no choice. We are getting this data from a third party, so we have to process it :( – Gaurav123 Oct 20 '15 at 05:26
  • 2
    As per your [prior question](http://stackoverflow.com/questions/33140943/how-to-avoid-memory-exception-when-reading-creating-and-sending-a-very-large-xm), you shouldn't be converting the bytes to a `string` just to save it. As Alexei Levenkov said: _"[What the point of converting byte array to string (2x memory size) when you can just read it directly as stream?](http://stackoverflow.com/questions/33140943/how-to-avoid-memory-exception-when-reading-creating-and-sending-a-very-large-xm)"_. You seem to be ignoring the good advice from others. –  Oct 20 '15 at 05:26
  • Why convert it to a string? Just write the bytes to the file. It is fast and will consume less space. – fhnaseer Oct 20 '15 at 05:27
  • Maybe use a `FileStream`? – Sweeper Oct 20 '15 at 05:29
  • 1
    If I save bytes into a file , then how can I read it ? It contains xml data. Please suggest – Gaurav123 Oct 20 '15 at 05:30
  • If you wanted to just write binaries, as your title suggests, you would use `FileStream` rather than `StreamWriter`, which is for text. But it looks like you're writing XML, which is text (binaries are encoded to text with `GetString()`), so `TextWriter` and `StreamWriter` are fine. You may want to use a loop to write chunks from the `receiveBuffer` array rather than writing everything at once. Also, if the file already exists, `File.Create()` needs to be handled. Try/Catch should always be used when using I/O.. – Victor Stoddard Oct 20 '15 at 05:33
  • File.Create is under ``if`` condition, so it will work fine. – Gaurav123 Oct 20 '15 at 05:35
  • @Micky : If I save bytes into a file , then how can I read it ? It contains xml data. I have to use this xml again and again. Please suggest – Gaurav123 Oct 20 '15 at 05:37
  • @Gaurav123 You should consider `XmlWriter` and `XmlReader`. They will not only store binaries along with text, but will encode and decode binaries as needed. If not, then you need to encode binaries yourself since XML is text only. Encoding will increase the file size, so consider storing/sending binary files separately. – Victor Stoddard Oct 20 '15 at 05:39
  • Wasn't the maximum object size before .NET 4.5 2 GB? – danish Oct 20 '15 at 05:41
  • @Gaurav123 It is not necessary to re-encode the bytes stream just to save it - they're already encoded. You won't have a problem reading it back as a text/xml file later. –  Oct 20 '15 at 05:41
  • @DavidHeffernan : We are getting the data from a socket, i.e. as a stream of bytes, so the ``ProcessBuffer`` method is called many times. I am not writing the whole data into the file at once; it continues for as long as the socket keeps sending bytes. I am not loading the XML into memory; instead I will use XmlReader for that purpose – Gaurav123 Oct 20 '15 at 06:11

4 Answers

1

The code in your question can only work if ProcessBuffer is always called with a UTF-8 encoded text that is broken on code point boundaries. That seems pretty unlikely to me, so I would expect that you encounter errors when decoding to text.
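To make the boundary problem concrete, here is a minimal sketch that decodes two chunks independently, the way the question's code does. The class and method names (`ChunkDecodingDemo`, `DecodePerChunk`) are invented for illustration, and the sample document is an assumption, not the real data.

```csharp
using System;
using System.Text;

public static class ChunkDecodingDemo
{
    // Decodes each chunk on its own, as the code in the question does.
    public static string DecodePerChunk(byte[] data, int splitAt)
    {
        string first = Encoding.UTF8.GetString(data, 0, splitAt);
        string second = Encoding.UTF8.GetString(data, splitAt, data.Length - splitAt);
        return first + second;
    }

    public static void Main()
    {
        // "é" (U+00E9) takes two bytes in UTF-8, so this sample is 9 bytes long.
        string original = "<a>\u00E9</a>";
        byte[] data = Encoding.UTF8.GetBytes(original);

        // Splitting at index 4 cuts the two-byte sequence in half, so each
        // half decodes to the replacement character U+FFFD.
        string broken = DecodePerChunk(data, 4);
        Console.WriteLine(broken.IndexOf('\uFFFD') >= 0); // True

        // Splitting at index 3 happens to fall on a character boundary.
        string clean = DecodePerChunk(data, 3);
        Console.WriteLine(clean == original); // True
    }
}
```

Whether a socket read lands on a character boundary is outside the receiver's control, which is why decoding per chunk is fragile. If per-chunk decoding were really needed, a stateful `System.Text.Decoder` would carry partial sequences across calls, but writing the raw bytes avoids the issue altogether.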

However, decoding to text and then writing is rather pointless and indeed counter-productive. The bytes are already UTF-8 encoded. Write them directly to the file as they arrive from the socket, without processing them in any way. When you come to read the XML using XmlReader, the parser will read the encoding from the document's XML declaration and decode the rest of the document accordingly. I am assuming that the declaration specifies UTF-8, which seems highly likely, but you should check.
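As a rough illustration of reading the raw bytes back, the sketch below streams through a file with `XmlReader.Create(Stream)` and counts elements. The class name `RawXmlReadDemo`, the temp file name, and the sample document are all made up for the example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class RawXmlReadDemo
{
    // Streams through the file node by node; the document is never
    // loaded into memory as a whole string or DOM.
    public static int CountElements(string path)
    {
        int count = 0;
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = XmlReader.Create(stream))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    count++;
                }
            }
        }
        return count;
    }

    public static void Main()
    {
        // Hypothetical sample: raw UTF-8 bytes written straight to disk.
        string path = Path.Combine(Path.GetTempPath(), "RawXmlReadDemo.xml");
        byte[] raw = Encoding.UTF8.GetBytes(
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
            "<items><item>\u00E9</item><item>b</item></items>");
        File.WriteAllBytes(path, raw);

        Console.WriteLine(CountElements(path)); // 3
    }
}
```

Because the reader is handed a byte stream rather than a string, it does the decoding itself, so no intermediate string of the full document is ever created.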

You should get rid of the text writer, which is of no use to you for writing bytes. Write the bytes directly to a file stream. And try to avoid opening and closing the file repeatedly, which is very inefficient. Open and close the file exactly once.
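One way this advice could look in code is sketched below. The `RawFileSink` class and the demo around it are hypothetical; only the `ProcessBuffer(byte[], int)` shape comes from the question. The stream is opened once in the constructor and closed once in `Dispose`, and each call writes only the first `bytes` bytes of the buffer.

```csharp
using System;
using System.IO;

// Hypothetical wrapper that owns the file stream for the whole transfer.
public sealed class RawFileSink : IDisposable
{
    private readonly FileStream stream;

    public RawFileSink(string path)
    {
        // FileMode.Append creates the file if it is missing and
        // positions the stream at the end of the file.
        stream = new FileStream(path, FileMode.Append, FileAccess.Write);
    }

    // Writes only the valid part of the buffer, not the whole array.
    public void ProcessBuffer(byte[] receiveBuffer, int bytes)
    {
        stream.Write(receiveBuffer, 0, bytes);
    }

    public void Dispose()
    {
        stream.Dispose();
    }
}

public static class RawFileSinkDemo
{
    public static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "RawFileSinkDemo.xml");
        File.Delete(path); // start from an empty file for the demo

        byte[] buffer = new byte[8]; // stands in for the socket receive buffer
        using (var sink = new RawFileSink(path))
        {
            sink.ProcessBuffer(buffer, 3); // pretend 3 bytes were received
            sink.ProcessBuffer(buffer, 2); // then 2 more
        }

        Console.WriteLine(new FileInfo(path).Length); // 5
    }
}
```

Honoring the byte count matters because the receive buffer is usually larger than the data actually received; writing `receiveBuffer.Length` bytes would append stale or zero bytes to the file.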

David Heffernan
  • You mean to say that I can directly write bytes into file using ``tw.Write(receiveBuffer);`` ? Or something else ? and When I will use ``XmlReader`` it will automatically read it correctly? – Gaurav123 Oct 20 '15 at 06:35
  • Yes, that's exactly right, because the bytes are already text encoded as UTF-8, which I am assuming matches the document's XML declaration. Note that everybody else here has said the exact same thing as me. – David Heffernan Oct 20 '15 at 06:37
  • using ``tw.Write(receiveBuffer);`` I get this ``System.Byte[]System.Byte[]System.Byte[]System.Byte[]System.Byte[]System.Byte[]System.Byte[]System.Byte[]System.Byte[]System.Byte[]System.Byte[]`` – Gaurav123 Oct 20 '15 at 06:47
  • 1
    Of course you do. You are using a text writer. Just open a file stream, seek to the end, and write the bytes. Don't ignore the byte count parameter either. Note that it is inefficient to keep opening and closing the file. Do that exactly once. – David Heffernan Oct 20 '15 at 06:50
  • ``using (var stream = new FileStream(fileName, FileMode.Append)) { stream.Write(receiveBuffer, 0, receiveBuffer.Length); }`` – Gaurav123 Oct 20 '15 at 06:56
  • I am getting two problems: 1. a stray symbol at the start of the file. 2. the file is a little distorted, meaning the start and end tags are there but some data still comes after the end tag – Gaurav123 Oct 20 '15 at 07:00
  • What is the `bytes` parameter of `ProcessBuffer`? Surely you can't ignore that. – David Heffernan Oct 20 '15 at 07:00
  • Now, I really don't want to try to debug your program for you. I can't see much of it, and I've given you a high-level answer to the question you asked. I'm sure you can work out the details. – David Heffernan Oct 20 '15 at 07:03
  • Yup, Thanks a lot @David Heffernan :) You have given a great support – Gaurav123 Oct 20 '15 at 07:04
0

You can simply write these bytes to a file using FileStream:

public void ProcessBuffer(byte[] receivedBuffer, int bytes)
{
    using (var fileStream = new FileStream(fileName, FileMode.Append)) // appends; creates the file if it is missing
    {
        fileStream.Write(receivedBuffer, 0, bytes);
    }
}

Update: You won't be able to work with such a big XML document if you don't have enough resources. I would suggest reformatting this file. For example, I would parse this XML and insert data into a SQL database. Then, you can easily operate with such amounts of data.

Yeldar Kurmangaliyev
  • He wants to convert the bytes to a string and then write the string to a file. – fhnaseer Oct 20 '15 at 05:31
  • 1
    @FaisalHafeez So, what is the point of allocating 10 GB of memory? There is no need to convert to a string in order to write data to a file. – Yeldar Kurmangaliyev Oct 20 '15 at 05:32
  • @FaisalHafeez What OP wants is not _efficient_. –  Oct 20 '15 at 05:33
  • @YeldarKurmangaliyev : If I save bytes into a file , then how can I read it ? It contains xml data. I have to use this xml again and again. Please suggest – Gaurav123 Oct 20 '15 at 05:36
  • @YeldarKurmangaliyev It makes no sense, but that is what the question asks. I suggested the same thing as well (write the bytes to the file). – fhnaseer Oct 20 '15 at 05:38
  • 1
    @Gaurav123 I am not sure that you will be able to work with such a big XML document as `XmlDocument`. Of course, if you don't have enough resources for it :) You need to reformat this XML. For example, I would parse this XML and insert data into a SQL database. Then, you can easily operate with such amounts of data. – Yeldar Kurmangaliyev Oct 20 '15 at 05:38
0

Why do you need to convert it to a string?

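using System.IO;
using System.Text; // needed for Encoding

public static void WriteBytes(byte[] bytes, string filename)
{
    // FileMode.Append adds to the end of the file and creates it if missing;
    // OpenOrCreate would start writing at the beginning of an existing file.
    using (FileStream fs = new FileStream(filename, FileMode.Append))
    using (BinaryWriter writer = new BinaryWriter(fs, Encoding.UTF8))
    {
        // Write(byte[]) emits the raw bytes with no length prefix.
        writer.Write(bytes);
    }
}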
jk777
  • if I save bytes into a file , then how can I read it ? It contains xml data. I have to use this xml again and again – Gaurav123 Oct 20 '15 at 05:38
  • 1
    You would read it like you normally would. The bytes would not be written as strings. – jk777 Oct 20 '15 at 05:43
0

I would prefer to write all the bytes to the file. When reading, convert them to a string and then to XML using XDocument, XElement, etc. Writing bytes to the file saves space and is efficient.

Instead of using FileStream, I would prefer the File.WriteAllBytes method.

// Requires: using System; using System.IO; using System.Text;
private const string fileName = "ServerData.xml";

public void ProcessBuffer(byte[] receiveBuffer, int bytes)
{
    // Copy only the valid part of the buffer.
    var chunk = new byte[bytes];
    Array.Copy(receiveBuffer, chunk, bytes);

    // WriteAllBytes replaces the file on every call, so this only suits
    // a single buffer that holds the whole payload.
    File.WriteAllBytes(fileName, chunk);
}

// And when reading
public string ReadData()
{
    var data = File.ReadAllBytes(fileName);
    // Decode to a string, then parse it to make XML.
    return Encoding.UTF8.GetString(data);
}
fhnaseer
  • Why are you reading 5GB into a `byte[]` just so you can wrap it in a `MemoryStream`? That will at least double the memory usage by the time you read it into a `string`. OP ultimately wants an XML so arguably better off with replacing with `XmlReader` or `XDocument`. `string` parsing not necessary. –  Oct 20 '15 at 06:13
  • @Faisal : ``WriteAllBytes`` will create a new file each time, which is not right. I need to append the data to the same file – Gaurav123 Oct 20 '15 at 06:41