I have an array of bytes that contains data, and I would like to search it for a specific string. I don't really know how to do this in C#.

byte[] ByteArray = File.ReadAllBytes(@"d:/MyDoc.docx");

String searchString = "Graphics";

How do I find the word "Graphics" in the array ByteArray? Thanks

Ehsan Sajjad
Dev OV
  • If you can have your searchString as byte array as well, this post should help you: https://stackoverflow.com/questions/283456/byte-array-pattern-search – Jarek Danielak May 01 '20 at 12:20
  • This won’t work. DOCX files are zipped XML files. So you first need to unzip it in order to get to the text data. – ckuri May 01 '20 at 12:21
  • [There is an SDK](https://learn.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part) for working with MS Office documents, including search. – Crowcoder May 01 '20 at 12:24

2 Answers


You can convert your byte array into a string like this:

string converted = Encoding.UTF8.GetString(buffer, 0, buffer.Length);

After that, use the string.IndexOf() or string.Contains() method to search the resulting string.
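Put together, the conversion and search might look like the minimal sketch below. It assumes the bytes really are UTF-8 text; the `ByteSearch.IndexOfText` helper and the sample data are hypothetical names made up for illustration, and the sample is plain text rather than a DOCX file.

```csharp
using System;
using System.Text;

static class ByteSearch
{
    // Hypothetical helper: decodes the buffer as UTF-8 (an assumption about
    // the data) and returns the character index of the first match, or -1.
    public static int IndexOfText(byte[] buffer, string searchString)
    {
        string converted = Encoding.UTF8.GetString(buffer, 0, buffer.Length);

        // Ordinal comparison gives an exact, culture-independent match.
        return converted.IndexOf(searchString, StringComparison.Ordinal);
    }
}

class Program
{
    static void Main()
    {
        // Made-up sample data standing in for the file contents.
        byte[] buffer = Encoding.UTF8.GetBytes("Some Graphics data");

        int index = ByteSearch.IndexOfText(buffer, "Graphics");
        Console.WriteLine(index >= 0 ? $"Found at character index {index}" : "Not found");
        // prints "Found at character index 5"
    }
}
```

The returned value is an index into the decoded string, which equals the byte offset only when everything before the match is single-byte (ASCII) text.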

Art_0f_War
  • This has the obvious limitation that you are assuming that you know what sort of string encoding is used in the file. Other than that, this is a correct answer to the OP’s question. However, see the comments on the question for reasons why this isn’t going to work at all in the case of DOCX. – Ammo Goettsch May 01 '20 at 14:22

Docx files are zipped. The only readable strings you'll find in the raw bytes are those related to the zip headers, such as entry names. The text of your document is compressed, so it won't appear there.

You can see this in Windows by changing the docx file extension to zip and then double-clicking the file. You'll find an archive with some XML content, which you can open with any XML reader, or even Notepad.

You could do the same thing manually in code (i.e. via the System.IO.Compression types), but you don't have to. There are other libraries that have already done much of the hard work for you: they extract the archive and already know which files and schema to look for. Some of them are freely available on NuGet.
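For the manual route, a rough sketch using System.IO.Compression could look like the following. It assumes the main body of a Word document is stored in the archive entry `word/document.xml`; the `DocxSearch.ContainsText` helper is a hypothetical name, and the file path is the one from the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class DocxSearch
{
    // Hypothetical helper: returns true if searchString occurs anywhere in
    // the raw XML of the main document part.
    public static bool ContainsText(Stream docxStream, string searchString)
    {
        using (var archive = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumed location of the body text inside the archive.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                return false; // not a Word document, or an unexpected layout

            using (var reader = new StreamReader(entry.Open()))
            {
                string xml = reader.ReadToEnd();
                return xml.IndexOf(searchString, StringComparison.Ordinal) >= 0;
            }
        }
    }
}

class Program
{
    static void Main()
    {
        using (FileStream stream = File.OpenRead(@"d:/MyDoc.docx"))
        {
            Console.WriteLine(DocxSearch.ContainsText(stream, "Graphics"));
        }
    }
}
```

This searches the raw XML, so it can also match element or attribute names, and it can miss a word that Word has split across several text runs. A library that understands the document schema avoids both problems.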

Joel Coehoorn