
I have a file, not that big (around 140 MB), containing CAN data recorded over a period of about 29:17:00 [mm:ss:ms]. What I need is to split that file, or better, to copy the data for a specific span of time into a new file.

Let's say from time 10:00:00 to 20:30:00.

Any ideas on how to approach this?

What I have done so far to read the header:

private void test(string fileName)
{
    using (FileStream fs = File.OpenRead(fileName))
    {
        long fileSize = fs.Length;
        bool extendedFileFormat = DriveRecFiles.IsFileDRX(fileName);

        Int64 tmpByte;
        Int64 tmpInt64 = 0;

        #region TimeStampFrequency
        for (int i = 0; i < 8; i++)
        {
            tmpByte = fs.ReadByte();
            tmpInt64 += tmpByte << (i * 8);   // assemble little-endian 64-bit value
        }
        SourceTimingClockFrequency = tmpInt64;
        #endregion

        #region StartTimeStamp
        tmpInt64 = 0;
        for (int i = 0; i < 8; i++)
        {
            tmpByte = fs.ReadByte();
            tmpInt64 += tmpByte << (i * 8);
        }
        sourceTimingBeginStampValue = tmpInt64;
        #endregion

        #region Last TimeStamp
        fs.Position = fs.Length - 8;          // the last 8 bytes hold the final timestamp
        tmpInt64 = 0;
        for (int i = 0; i < 8; i++)
        {
            tmpByte = fs.ReadByte();
            tmpInt64 += tmpByte << (i * 8);
        }
        TimeStampEnd = tmpInt64;

        // This is the conversion from timestamp to time in ms
        int fileLengthTime = (int)((1000 * (TimeStampEnd - sourceTimingBeginStampValue)) / SourceTimingClockFrequency);
        #endregion
    }
}
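As an aside, each of those byte-by-byte loops reassembles a little-endian 64-bit value, which is what `BinaryReader.ReadInt64` does in a single call. A minimal sketch of the same header read (a standalone helper, not the method above; it just returns the three raw values):

```csharp
using System;
using System.IO;

static class HeaderReader
{
    // Reads the two 8-byte little-endian header fields and the trailing
    // 8-byte timestamp, mirroring the three loops in the method above.
    public static (long clockFrequency, long beginStamp, long endStamp) ReadHeader(string fileName)
    {
        using (var fs = File.OpenRead(fileName))
        using (var reader = new BinaryReader(fs))
        {
            long clockFrequency = reader.ReadInt64();  // bytes 0..7
            long beginStamp = reader.ReadInt64();      // bytes 8..15
            fs.Position = fs.Length - 8;               // file ends with the last timestamp
            long endStamp = reader.ReadInt64();
            return (clockFrequency, beginStamp, endStamp);
        }
    }
}
```

`BinaryReader` reads little-endian by definition, which matches the format you describe in the comments.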

Now I'm stuck and I don't know how to proceed. Should I loop through the file and compare each timestamp, something like this?

Let's say I have set a begin time of 1,000,000 ms and an end time of 1,700,000 ms:

int begintime = 1000000;
int endtime = 1700000;
long startPosition = 0;
long endPosition = 0;
long currentTimeStampEnd = 0;
int currentTime = 0;
long tmpByte;
long tmpInt64;
for (long i = 8; i <= fs.Length - 8; i++)
{
    fs.Position = i;
    tmpInt64 = 0;
    for (int j = 0; j < 8; j++)
    {
        tmpByte = fs.ReadByte();
        tmpInt64 += tmpByte << (j * 8);
    }
    currentTimeStampEnd = tmpInt64;
    currentTime = (int)((1000 * (currentTimeStampEnd - sourceTimingBeginStampValue)) / SourceTimingClockFrequency);
    if (startPosition == 0 && currentTime == begintime) startPosition = i;
    if (endPosition == 0 && currentTime == endtime) endPosition = i;
    if ((startPosition != 0) && (endPosition != 0)) break;
    i += 47;   // jump to the next record (48-byte stride)
}

And then copy the result to a file.

I don't know if this is the best approach. Second, I want to make a slider for the start time and a slider for the end time, with a step of 1 ms. I think the method above is not efficient for that: comparing the new slider value against the current timestamp on every change, and opening and closing the FileStream each time?
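If the records really are fixed-size (the `i += 47` stride suggests 48-byte records) and the timestamps only ever increase, the sliders don't need a linear scan at all: a binary search over the record timestamps finds any position in roughly 22 seeks for a 140 MB file. A sketch, where the 16-byte header, 48-byte record size, and the timestamp sitting in the last 8 bytes of each record are assumptions taken from the code above:

```csharp
using System;
using System.IO;

static class CanSeek
{
    const int HeaderSize = 16;   // assumption: two 8-byte header fields
    const int RecordSize = 48;   // assumption: matches the i += 47 stride above
    const int StampOffset = 40;  // assumption: timestamp is the last 8 bytes of a record

    // Returns the index of the first record whose timestamp is >= targetStamp,
    // assuming timestamps increase monotonically through the file.
    public static long FindFirstRecordAtOrAfter(FileStream fs, long targetStamp)
    {
        long recordCount = (fs.Length - HeaderSize) / RecordSize;
        var reader = new BinaryReader(fs);   // caller owns fs, so we don't dispose
        long lo = 0, hi = recordCount;       // search the half-open range [lo, hi)
        while (lo < hi)
        {
            long mid = lo + (hi - lo) / 2;
            fs.Position = HeaderSize + mid * RecordSize + StampOffset;
            long stamp = reader.ReadInt64();
            if (stamp < targetStamp) lo = mid + 1; else hi = mid;
        }
        return lo;   // equals recordCount if targetStamp is past the end
    }
}
```

A slider value in ms converts to a target stamp with `beginStamp + ms * clockFrequency / 1000`. Once the start and end record indices are known, the copy is just the header plus the byte range between them, written in one pass; and the `FileStream` can stay open for the lifetime of the dialog, so nothing is reopened per slider tick.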

Nizar Belhiba
  • Can you tell us a bit more about your file? Is it binary (i.e., those 4-byte quantities represent an integer) or string-ish (those 4-byte things look like "1234")? If it's binary, is it little-endian or big-endian? Is the header on a separate line? I'm guessing that there is no separator between fields - is that a good guess? Is each data chunk on a separate line? Is there a separator between the header and the chunks, and between the chunks (if so, what)? I have a vague idea of what a CAN network is, but that's about it (and I'm probably way ahead of most folks). – Flydog57 Dec 06 '18 at 17:11
  • Are the data records fixed in size? You could just read the file in hunks the size of the record and determine if you need to process it. You don't need to read it one byte at a time like that. – Jeff Mercado Dec 06 '18 at 18:55
  • the file is binary and yes in little-endian – Nizar Belhiba Dec 06 '18 at 19:20
  • the file is binary and yes in little-endian. no there is no separator between header and the chunk – Nizar Belhiba Dec 06 '18 at 19:27
  • I have uploaded the file to my Google Drive; the link is in the topic. – Nizar Belhiba Dec 06 '18 at 19:45
  • @NizarBelhiba: I've updated my answer, see below. – Flydog57 Dec 09 '18 at 06:37

1 Answer


Here's part of the answer. I can read in your data, chunk-by-chunk. Once you get it in, then you can decide to write it back out into a set of smaller files (using BinaryWriters on FileStreams). I'll leave that to you. But, this reads everything in.

Update: There's more of the answer below (I added the WriteStruct method, and something closer to what you asked for)

I start by defining two structures with very clear layout. Since the header consists of just two consecutive 64 bit uints, I can just use LayoutKind.Sequential:

[StructLayout(LayoutKind.Sequential)]
public struct CanHeader {
    public UInt64 TimeStampFrequency;
    public UInt64 TimeStamp;
}

But, the Chunk structure mixes and matches 32 and 64 bit uints. If I lay it out sequentially, the framework inserts 4 bytes of padding to align the UInt64s. So, I need to use LayoutKind.Explicit:

[StructLayout(LayoutKind.Explicit)]
public struct CanChunk {
    [FieldOffset(0)] public UInt32 ReturnReadValue;
    [FieldOffset(4)] public UInt32 CanTime;
    [FieldOffset(8)] public UInt32 Can;
    [FieldOffset(12)] public UInt32 Ident;
    [FieldOffset(16)] public UInt32 DataLength;
    [FieldOffset(20)] public UInt64 Data;
    [FieldOffset(28)] public UInt32 Res;
    [FieldOffset(32)] public UInt64 TimeStamp;
}
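That padding behavior is easy to demonstrate with `Marshal.SizeOf`, using hypothetical stand-in structs with the same field pattern as CanChunk (five 32-bit fields, a 64-bit, a 32-bit, a 64-bit):

```csharp
using System;
using System.Runtime.InteropServices;

// Sequential layout: the runtime pads each ulong to an 8-byte boundary.
[StructLayout(LayoutKind.Sequential)]
struct SequentialChunk
{
    public uint A, B, C, D, E;
    public ulong F;
    public uint G;
    public ulong H;
}

// Explicit layout: the offsets match the packed on-disk format.
[StructLayout(LayoutKind.Explicit)]
struct ExplicitChunk
{
    [FieldOffset(0)]  public uint A;
    [FieldOffset(4)]  public uint B;
    [FieldOffset(8)]  public uint C;
    [FieldOffset(12)] public uint D;
    [FieldOffset(16)] public uint E;
    [FieldOffset(20)] public ulong F;
    [FieldOffset(28)] public uint G;
    [FieldOffset(32)] public ulong H;
}

static class PaddingDemo
{
    public static void Main()
    {
        Console.WriteLine(Marshal.SizeOf(typeof(SequentialChunk)));  // 48 - padded
        Console.WriteLine(Marshal.SizeOf(typeof(ExplicitChunk)));    // 40 - packed
    }
}
```

As Jeff Mercado notes in the comments, `Pack = 4` on a sequential layout also brings the marshaled size down to the packed 40 bytes.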

Then I took a look at @FelixK's answer to C# array within a struct, and modified his ReadStruct extension method to suit my needs:

private static (T, bool) ReadStruct<T>(this BinaryReader reader) where T : struct {
    var len = Marshal.SizeOf(typeof(T));
    Byte[] buffer = reader.ReadBytes(len);

    if (buffer.Length < len) {
        return (default(T), false);
    }
    //otherwise
    GCHandle handle = default(GCHandle);
    try {
        handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        return ((T)Marshal.PtrToStructure(handle.AddrOfPinnedObject(), typeof(T)), true);
    } finally {
        if (handle.IsAllocated)
            handle.Free();
    }
}

It returns a tuple, where the first member is a structure instance that's just been read from the file, and the second member is a flag to indicate whether more reads are needed (true says "keep reading"). It also uses BinaryReader.ReadBytes, rather than BinaryReader.Read.

With all that in place, now I read the data. My first try had me writing things out to the console - but it takes forever to write out 140 MB. But, if you do that, you will see the data moving the way you'd expect (the time stamp keeps going up).

public static void ReadBinary() {
    using (var stream = new FileStream("Klassifikation_only_Sensor1_01.dr2", FileMode.Open, FileAccess.Read)) {
        using (var reader = new BinaryReader(stream)) {
            var headerTuple = reader.ReadStruct<CanHeader>();
            Console.WriteLine($"[Header] TimeStampFrequency: {headerTuple.Item1.TimeStampFrequency:x016}  TimeStamp: {headerTuple.Item1.TimeStamp:x016}");
            bool stillWorking;
            UInt64 totalSize = 0L;
            var chunkSize = (UInt64)Marshal.SizeOf(typeof(CanChunk));
            do {
                var chunkTuple = reader.ReadStruct<CanChunk>();
                stillWorking = chunkTuple.Item2;
                if (stillWorking) {
                    var chunk = chunkTuple.Item1;
                    //Console.WriteLine($"{chunk.ReturnReadValue:x08} {chunk.CanTime:x08} {chunk.Can:x08} {chunk.Ident:x08} {chunk.DataLength:x08} {chunk.Data:x016} {chunk.Res:x04} {chunk.TimeStamp:x016}");
                    totalSize += chunkSize;
                }
            } while (stillWorking);
            Console.WriteLine($"Total Size: 0x{totalSize:x016}");
        }
    }
}

If I uncomment the Console.WriteLine statement, the output starts out looking like this:

[Header] TimeStampFrequency: 00000000003408e2  TimeStamp: 000002a1a1bf04bb
00000001 a1bf04bb 00000020 000002ff 00000008 0007316be2c20350 0000 000002a1a1bf04bb
00000001 a1bf04be 00000020 00000400 00000008 020a011abf80138e 0000 000002a1a1bf04be
00000001 a1bf04c0 00000020 00000400 00000008 8000115f84f09f12 0000 000002a1a1bf04c0
00000001 a1bf04c2 00000020 00000401 00000008 0c1c1205690d81f8 0000 000002a1a1bf04c2
00000001 a1bf04c3 00000020 00000401 00000007 001fa2420000624d 0000 000002a1a1bf04c3
00000001 a1bf04c5 00000020 00000402 00000008 0c2a5a95b99d0286 0000 000002a1a1bf04c5
00000001 a1bf04c7 00000020 00000402 00000007 001faa6000003c49 0000 000002a1a1bf04c7
00000001 a1bf04c8 00000020 00000403 00000008 0c1c0c06840e02d2 0000 000002a1a1bf04c8
00000001 a1bf04ca 00000020 00000403 00000007 001fad4200006c5d 0000 000002a1a1bf04ca
00000001 a1bf04cc 00000020 00000404 00000008 0c1c0882800b82d8 0000 000002a1a1bf04cc
00000001 a1bf04cd 00000020 00000404 00000007 001fad8200009cd1 0000 000002a1a1bf04cd
00000001 a1bf04cf 00000020 00000405 00000008 0c1c0f04850cc2de 0000 000002a1a1bf04cf
00000001 a1bf04d0 00000020 00000405 00000007 001fada20000766f 0000 000002a1a1bf04d0
00000001 a1bf04d2 00000020 00000406 00000008 0c1bd80c4e13831a 0000 000002a1a1bf04d2
00000001 a1bf04d3 00000020 00000406 00000007 001faf800000505b 0000 000002a1a1bf04d3
00000001 a1bf04d5 00000020 00000407 00000008 0c23d51049974330 0000 000002a1a1bf04d5
00000001 a1bf04d6 00000020 00000407 00000007 001fb02000004873 0000 000002a1a1bf04d6
00000001 a1bf04d8 00000020 00000408 00000008 0c1c0a8490cc44ba 0000 000002a1a1bf04d8
00000001 a1bf04da 00000020 00000408 00000007 001fb762000088bf 0000 000002a1a1bf04da
00000001 a1bf04db 00000020 00000409 00000008 0c1c0603a0cbc4c0 0000 000002a1a1bf04db
00000001 a1bf04df 00000020 00000409 00000007 001fb76000008ee5 0000 000002a1a1bf04df
00000001 a1bf04e0 00000020 0000040a 00000008 0c23f70c5b9544cc 0000 000002a1a1bf04e0
00000001 a1bf04e2 00000020 0000040a 00000007 001fb7820000565f 0000 000002a1a1bf04e2
00000001 a1bf04e3 00000020 0000040b 00000008 0c1bf3049b4cc502 0000 000002a1a1bf04e3
00000001 a1bf04e5 00000020 0000040b 00000007 001fb82200007eab 0000 000002a1a1bf04e5

And finishes up with this:

Total Size: 0x00000000085ae0a8

Where that number in decimal is 140,173,480. That's about what I expected.
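That total is also consistent with the 40-byte chunk size, a quick sanity check that no partial record is left at the end of the file:

```csharp
using System;

// Sanity check on the reported size: it divides evenly into 40-byte chunks.
long totalSize = 0x85ae0a8;                // 140,173,480 bytes of chunk data
long chunkSize = 40;                       // Marshal.SizeOf(typeof(CanChunk))
Console.WriteLine(totalSize / chunkSize);  // 3504337 complete chunks
Console.WriteLine(totalSize % chunkSize);  // 0 - no partial record at the end
```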

Update:

In order to get closer to what you asked, I took the code in the ReadStruct method and used it to create a corresponding WriteStruct method:

private static void WriteStruct<T>(this BinaryWriter writer, T obj) where T : struct {
    var len = Marshal.SizeOf(typeof(T));
    var buffer = new byte[len];

    GCHandle handle = default(GCHandle);
    try {
        handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        Marshal.StructureToPtr(obj, handle.AddrOfPinnedObject(), false);
    } finally {
        if (handle.IsAllocated)
            handle.Free();
    }
    writer.Write(buffer);
}

With that, I can also modify my original code to read all the data and write selective parts out to another file. In the code below, I read in the "Chunks" until the timestamp on a chunk is divisible by 10,000. Once that happens, I create a new CanHeader structure (I'm not exactly sure what should go there - but you should be). Then I create an output FileStream (i.e., a file to write to) and a BinaryWriter. I write the header to the FileStream, and then I write the next 5000 chunks I read to that file. In your case, you can use the data in the chunk stream to decide what you want to do:

public static void ReadAndWriteBinary() {
    using (var readStream = new FileStream("Klassifikation_only_Sensor1_01.dr2", FileMode.Open, FileAccess.Read)) {
        using (var reader = new BinaryReader(readStream)) {
            var headerTuple = reader.ReadStruct<CanHeader>();
            Console.WriteLine($"[Header] TimeStampFrequency: {headerTuple.Item1.TimeStampFrequency:x016}  TimeStamp: {headerTuple.Item1.TimeStamp:x016}");
            bool stillWorking;
            UInt64 totalSize = 0L;
            UInt64 recordCount = 0L;
            var chunkSize = (UInt64)Marshal.SizeOf(typeof(CanChunk));
            var chunksWritten = 0;
            FileStream writeStream = null;
            BinaryWriter writer = null;
            var writingChunks = false;
            var allDone = false;
            try {
                do {
                    var chunkTuple = reader.ReadStruct<CanChunk>();
                    stillWorking = chunkTuple.Item2;
                    if (stillWorking) {
                        var chunk = chunkTuple.Item1;
                        if (!writingChunks && chunk.CanTime % 10_000 == 0) {
                            writingChunks = true;
                            var writeHeader = new CanHeader {
                                TimeStamp = chunk.TimeStamp,
                                TimeStampFrequency = headerTuple.Item1.TimeStampFrequency
                            };
                            writeStream = new FileStream("Output.dr2", FileMode.Create, FileAccess.Write);
                            writer = new BinaryWriter(writeStream);
                            writer.WriteStruct(writeHeader);
                        }
                        if (writingChunks && !allDone) {
                            writer.WriteStruct(chunk);
                            ++chunksWritten;
                            if (chunksWritten >= 5000) {
                                allDone = true;
                            }
                        }
                        totalSize += chunkSize;
                        ++recordCount;
                    }
                } while (stillWorking);
            } finally {
                writer?.Dispose();
                writeStream?.Dispose();
            }
            Console.WriteLine($"Total Size: 0x{totalSize:x016}  Record Count: {recordCount}  Records Written: {chunksWritten}");
        }
    }
}

When I'm finished, I can see that 5000 records were written to the file (it's 200,016 bytes long: 5000 40-byte records prefaced with a 16-byte header), and that the first record's CanTime is 0xa3a130d0 (2,745,250,000 - i.e., divisible by 10,000). Everything is as I expect.
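To select by the time window from the original question rather than by divisibility, the same loop can gate on each chunk's time converted to milliseconds. A self-contained sketch over the raw bytes, where the 16-byte header, 40-byte chunks, the timestamp in the last 8 bytes of each chunk, and monotonically increasing stamps are all assumptions taken from the layout above, and `beginMs`/`endMs` stand in for the slider values:

```csharp
using System;
using System.IO;

static class CanRangeCopy
{
    const int HeaderSize = 16;   // TimeStampFrequency + first TimeStamp, 8 bytes each
    const int ChunkSize = 40;    // packed size of CanChunk
    const int StampOffset = 32;  // TimeStamp is the last 8 bytes of a chunk

    // Copies every chunk whose time (ms since the first stamp) lies in
    // [beginMs, endMs), prefixed by a header re-stamped at the first kept chunk.
    public static void CopyTimeRange(string inFile, string outFile, long beginMs, long endMs)
    {
        using (var input = File.OpenRead(inFile))
        using (var reader = new BinaryReader(input))
        using (var output = File.Create(outFile))
        using (var writer = new BinaryWriter(output))
        {
            ulong frequency = reader.ReadUInt64();
            ulong firstStamp = reader.ReadUInt64();
            var chunk = new byte[ChunkSize];
            while (reader.Read(chunk, 0, ChunkSize) == ChunkSize)
            {
                ulong stamp = BitConverter.ToUInt64(chunk, StampOffset);
                long ms = (long)(1000UL * (stamp - firstStamp) / frequency);
                if (ms < beginMs) continue;     // before the window: skip
                if (ms >= endMs) break;         // stamps increase, so we're done
                if (output.Length == 0)
                {
                    writer.Write(frequency);    // header for the new file,
                    writer.Write(stamp);        // stamped at the first kept chunk
                }
                writer.Write(chunk, 0, ChunkSize);
            }
        }
    }
}
```

Working on raw 40-byte buffers keeps the copy loop free of marshaling entirely; the `ReadStruct`/`WriteStruct` pair is only needed when you want to inspect fields other than the timestamp.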

Flydog57
  • Thank you for pointing me in the right direction, but it still didn't solve my needs. As I said, I need to copy a certain chunk or more into a new file. I think it's a bad idea to copy every chunk (or at least every TimeStamp) into a list and then copy the data from my start timestamp to my end timestamp... – Nizar Belhiba Dec 07 '18 at 09:23
  • The idea isn't _"copy every chunk or at least the TimeStamp into a list"_. I'm effectively reading a stream of chunks, one by one. You can write them out to separate files, one by one (to an output stream). Or, for example, you could compress the file by only writing out every Nth chunk to a new file. I'll whip up a WriteStruct method this weekend and show you what I'm thinking about. The thing is, if you read every chunk, you can do what you want with the chunks (one by one) as you are doing the reading. – Flydog57 Dec 07 '18 at 15:10
  • You'll have a much easier time working with the data as a memory mapped file. Then you could use the view accessors to marshal the data for you. – Jeff Mercado Dec 07 '18 at 18:11
  • Also, you could use a sequential layout, you'll just have to set the pack size and charset. `[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi, Pack = 4)]` to align to 4 bytes. – Jeff Mercado Dec 07 '18 at 18:13
  • @NizarBelhiba: OK, done. I've updated things. I read chunks in and ignore them until I get a time stamp I want. Then I start writing them out to a new file until I hit a chunk limit. You should be able to adapt this to your needs. – Flydog57 Dec 09 '18 at 06:36
  • @NizarBelhiba: you should consider giving "FelixK" an upvote for his answer to this: https://stackoverflow.com/questions/8704161/c-sharp-array-within-a-struct. That's where I got the genesis of this answer – Flydog57 Dec 10 '18 at 15:23
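Jeff Mercado's memory-mapped suggestion would look roughly like this - a sketch assuming the same 16-byte header and 40-byte chunk layout as above; `MemoryMappedViewAccessor.Read<T>` does the struct marshaling that `ReadStruct` does by hand (the trimmed `MappedChunk` struct and the helper are hypothetical, declaring only the fields the sketch touches):

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

// Trimmed, hypothetical copy of CanChunk: only the fields this sketch needs,
// with Size = 40 so indexing steps over whole records.
[StructLayout(LayoutKind.Explicit, Size = 40)]
struct MappedChunk
{
    [FieldOffset(0)]  public uint ReturnReadValue;
    [FieldOffset(32)] public ulong TimeStamp;
}

static class MappedReader
{
    // Maps the file and lets the view accessor marshal a 40-byte chunk
    // straight into the struct - no BinaryReader, no GCHandle pinning.
    public static ulong ReadStampAt(string fileName, long chunkIndex)
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(fileName, FileMode.Open))
        using (var view = mmf.CreateViewAccessor())
        {
            view.Read(16 + chunkIndex * 40, out MappedChunk chunk); // skip the 16-byte header
            return chunk.TimeStamp;
        }
    }
}
```

In a real splitter you'd map the file once and keep the accessor alive across slider moves rather than remapping on every call.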