0

While looking at memory-mapped files in C#, there was some difficulty in identifying how to search a file quickly forward and in reverse. My goal is to rewrite the following function in the language, but nothing could be found like the find and rfind methods used below. Is there a way in C# to quickly search a memory-mapped file using a particular substring?

#! /usr/bin/env python3
import mmap
import pathlib


# noinspection PyUnboundLocalVariable
def drop_last_line(path):
    with path.open('r+b') as file:
        with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as search:
            for next_line in b'\r\n', b'\r', b'\n':
                if search.find(next_line) >= 0:
                    break
            else:
                raise ValueError('cannot find any line delimiters')
            end_1st = search.rfind(next_line)
            end_2nd = search.rfind(next_line, 0, end_1st - 1)
        file.truncate(0 if end_2nd < 0 else end_2nd + len(next_line))
Noctis Skytower
  • 21,433
  • 16
  • 79
  • 117
  • For the Python stunted, can you describe what you're actually trying to achieve? Is it just finding if a given file contains a certain string? – Jamiec Jan 23 '18 at 16:10
  • @Jamiec No, it is trying to remove the last line of a CSV file where each line is terminated with a line delimiter. The following questions have unacceptable solutions because they process the entire file: (1) [How to delete last line in a text file?](https://stackoverflow.com/q/4264117/216356), (2) [How to Delete the last line of a file.txt](https://stackoverflow.com/q/14960392/216356), (3) [Efficient way to delete a line from a text file](https://stackoverflow.com/q/532217/216356) – Noctis Skytower Jan 23 '18 at 16:16
  • 1
    The point of memory-mapped files is that once they are mapped, you can forget that the source is a file and use _whatever_ memory search function or algorithm you wish. –  Jan 23 '18 at 16:18
  • Perhaps this is useful: https://learn.microsoft.com/en-us/dotnet/standard/io/memory-mapped-files – Jamiec Jan 23 '18 at 16:23
  • @MickyD Do you know of any way to memory-map an entire file in C# and then treat it as a byte array? After looking at the source code for the `find` and `rfind` methods, it does not appear that it would be too difficult to create a C# version of them if an array was available. Am I looking for the [UnmanagedMemoryAccessor.ReadByte Method (Int64)](https://msdn.microsoft.com/en-us/library/system.io.unmanagedmemoryaccessor.readbyte(v=vs.110).aspx)? – Noctis Skytower Jan 24 '18 at 16:46
  • Sorry for the delay, see answer below –  Jan 24 '18 at 17:28

1 Answers1

1

Is there a way in C# to quickly search a memory-mapped file using a particular substring?

Do you know of any way to memory-map an entire file in C# and then treat it as a byte array?

Yes, it's quite easy to map an entire file into a view then to read it into a single byte array as the following code shows:

static void Main(string[] args)
{
    var sourceFile=  new FileInfo(@"C:\Users\Micky\Downloads\20180112.zip");
    int length = (int) sourceFile.Length;  // length of target file

    // Create the memory-mapped file.
    using (var mmf = MemoryMappedFile.CreateFromFile(sourceFile.FullName,
                                                     FileMode.Open, 
                                                     "ImgA"))
    {
        var buffer = new byte[length]; // allocate a buffer with the same size as the file

        using (var accessor = mmf.CreateViewAccessor())
        {
            var read=accessor.ReadArray(0, buffer, 0, length); // read the whole thing
        }

        // let's try searching for a known byte sequence.  Change this to suit your file
        var target = new byte[] {71, 213, 62, 204,231};

        var foundAt = IndexOf(buffer, target);

    }
}

I couldn't seem to find any byte searching method in Marshal or Array but you can use this search algorithm courtesy of Social MSDN as a start:

private static int IndexOf2(byte[] input, byte[] pattern)
{
    byte firstByte = pattern[0];
    int  index     = -1;

    if ((index = Array.IndexOf(input, firstByte)) >= 0)
    {
        for (int i = 0; i < pattern.Length; i++)
        {
            if (index + i  >= input.Length ||
                pattern[i] != input[index + i]) return -1;
        }
    }

    return index;
}

...or even this more verbose example (also courtesy Social MSDN, same link)

public static int IndexOf(byte[] arrayToSearchThrough, byte[] patternToFind)
{
    if (patternToFind.Length > arrayToSearchThrough.Length)
        return -1;
    for (int i = 0; i < arrayToSearchThrough.Length - patternToFind.Length; i++)
    {
        bool found = true;
        for (int j = 0; j < patternToFind.Length; j++)
        {
            if (arrayToSearchThrough[i + j] != patternToFind[j])
            {
                found = false;
                break;
            }
        }
        if (found)
        {
            return i;
        }
    }
    return -1;
}
  • Your search algorithm is not one of the best. Using the [Boyer-Moore algorithm](https://stackoverflow.com/questions/9889427/search-longest-pattern-in-byte-array-in-c-sharp/9890164#9890164) you will likely see 5-8x speed ups. – Scott Chamberlain Jan 24 '18 at 17:37
  • Thanks for the example code! I will probably wait another day before accepting an answer to see if any other ideas are presented. – Noctis Skytower Jan 24 '18 at 17:49
  • @ScottChamberlain that's because it's not mine as indicated –  Jan 25 '18 at 00:33
  • @MickyD I know, what I am trying to say is finding a example of Boyer-Moore and showing that would be better. – Scott Chamberlain Jan 25 '18 at 15:34