
I want to save the results of the following code in a text file.

private static readonly string cd = Directory.GetCurrentDirectory();
private static readonly StringBuilder sb = new StringBuilder();
    
public static void Main()
{
    string listPath = cd + "/list.txt";
    string rat = File.ReadAllText(listPath);
    List<string> list = rat.Split(new char[] {'\r','\n'}, StringSplitOptions.RemoveEmptyEntries).ToList();

    foreach (string fl in list)
    {
        string rat2 = File.ReadAllText(fl);
        List<string> file = rat2.Split(new char[] {'\r','\n'}, StringSplitOptions.RemoveEmptyEntries).ToList();

        foreach (string sn in file)
        {
            using (SHA256 mySHA256 = SHA256.Create())
            {
                byte[] hashValue = mySHA256.ComputeHash(Encoding.UTF8.GetBytes(sn));

                sb.Clear();

                foreach (byte b in hashValue)
                {
                    sb.Append(b.ToString("x2"));
                }

                string hashSHA256 = sb.ToString();
                string res = $"{hashSHA256},{sn}{Environment.NewLine}";
                string savePaths = $"{cd}/{hashSHA256.Substring(0,4)}.txt";

                File.AppendAllText(savePaths, res);
            }
        }
    }
    Console.WriteLine("___END___");
}

`list.txt` contains the following content:

C:/file1.txt
C:/file2.txt
C:/file3.txt

The resulting output files will be something like below:

  • 0000.txt
  • 0001.txt
  • 0002.txt
  • ...
  • fffd.txt
  • fffe.txt
  • ffff.txt

16^4 = 65,536 files

This puts heavy load on the hard drive, and the execution speed is not optimal.

How can I increase the speed of execution and saving?

I know that saving in a database might be a good option, but we need to save in text files.

ioxoi
  • Are you sure the performance problem is in saving the files, and not perhaps in doing the hashing, or converting the hashes to strings? – Ben Voigt Jun 14 '23 at 17:41
  • It's also odd to ask a question about `File.WriteAllLines()` when your code doesn't even use that function. – Ben Voigt Jun 14 '23 at 17:42
  • My code runs fine, but I'm looking for a way to speed it up – ioxoi Jun 14 '23 at 17:53
  • The easiest way to improve performance is to get a more powerful computer. Believe it or not, this is often much more cost-effective than paying an engineer to spend a lot of time tweaking performance. As for improving your code, that is a pretty big question-- maybe start here: [C# Threading - Reading and hashing multiple files concurrently, easiest method?](https://stackoverflow.com/questions/9895077/c-sharp-threading-reading-and-hashing-multiple-files-concurrently-easiest-met). You will want to use the producer/consumer pattern, parallel threads for hashing, and async I/O. – John Wu Jun 14 '23 at 17:53
  • Run a profiler and find out what line of code is slow. – Ben Voigt Jun 14 '23 at 17:55
  • I repeat: run a profiler and have a look at what is actually slow. I suspect that having that many files in one folder is a problem by itself. Working with that folder is definitely a problem afterwards, maybe even when editing something in it without explicitly enumerating the folder. – Ralf Jun 14 '23 at 18:06
  • File IO was greatly improved with .NET 6, so if you are on an older framework, upgrading might help a bit. – Ralf Jun 14 '23 at 18:08
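The producer/consumer pattern mentioned in the comments could be sketched roughly as below, using `System.Threading.Channels`. This is only an illustration, not part of the question's code: the channel capacity, the worker count, and the `results` bag are arbitrary choices, and the grouping/writing step is left out.

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Channels;

// Bounded channel: one producer reads lines, several consumers hash them in parallel.
var channel = Channel.CreateBounded<string>(1000);
var results = new ConcurrentBag<string>();

var producer = Task.Run(async () =>
{
    foreach (string fl in File.ReadLines("list.txt"))
        foreach (string sn in File.ReadLines(fl))
            await channel.Writer.WriteAsync(sn);
    channel.Writer.Complete();
});

var consumers = Enumerable.Range(0, Environment.ProcessorCount)
    .Select(_ => Task.Run(async () =>
    {
        using var sha = SHA256.Create();
        await foreach (string sn in channel.Reader.ReadAllAsync())
        {
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(sn));
            string hex = Convert.ToHexString(hash).ToLowerInvariant(); // .NET 5+
            results.Add($"{hex},{sn}");
        }
    }))
    .ToList();

await Task.WhenAll(consumers.Append(producer));
// results now holds "hash,line" pairs, ready to be grouped by prefix and written out.
```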

2 Answers


This should do a little better, for a few reasons:

  • Improves on RAM use by never loading the full contents of any file into RAM.
  • Improves the performance of converting the binary hash to a hex string.
  • Only creates the SHA256 object once.
// -------------------------------------------------------
// see: https://stackoverflow.com/a/624379/3043
// This uses the fastest result that doesn't need unsafe code
private static readonly uint[] _lookup32 = CreateLookup32();

private static uint[] CreateLookup32()
{
    var result = new uint[256];
    for (int i = 0; i < 256; i++)
    {
        string s = i.ToString("X2");
        result[i] = ((uint)s[0]) + ((uint)s[1] << 16);
    }
    return result;
}

private static string ByteArrayToHex(byte[] bytes)
{
    var lookup32 = _lookup32;
    var result = new char[bytes.Length * 2];
    for (int i = 0; i < bytes.Length; i++)
    {
        var val = lookup32[bytes[i]];
        result[2*i] = (char)val;
        result[2*i + 1] = (char) (val >> 16);
    }
    return new string(result);
}
// -------------------------------------------------------

private static readonly string cd = Directory.GetCurrentDirectory();

public static void Main()
{
    string listPath = Path.Combine(cd, "list.txt");
    using var mySHA256 = SHA256.Create();
 
    foreach (string fl in File.ReadLines(listPath))
    {
        foreach (string sn in File.ReadLines(fl))
        {                      
            byte[] hashValue = mySHA256.ComputeHash(Encoding.UTF8.GetBytes(sn));

            string hashSHA256 = ByteArrayToHex(hashValue); 
            string res = $"{hashSHA256},{sn}{Environment.NewLine}";
            string savePaths = $"{cd}/{hashSHA256.Substring(0,4)}.txt";

            File.AppendAllText(savePaths, res);
        }
    }
    Console.WriteLine("___END___");
}

It still has the weakness of reopening the output file for each line. You might do better managing a result dictionary: trade memory for optimized disk I/O by buffering the results and writing each output file only once, after all the hashing is finished. That way there is a single write operation per output file. Whether this helps depends on how memory-constrained your system is and on how often hashes collide in their first four characters. If such collisions are rare, or memory use is the driving performance concern, adding a dictionary buffer won't be worth it.
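That buffering idea could be sketched as follows. The `Buffer`/`Flush` helper names and the dictionary layout are illustrative, not part of the answer's code:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Buffer output lines per target file in memory, keyed by save path.
var buffers = new Dictionary<string, StringBuilder>();

void Buffer(string savePath, string line)
{
    if (!buffers.TryGetValue(savePath, out var buf))
    {
        buf = new StringBuilder();
        buffers[savePath] = buf;
    }
    buf.AppendLine(line);
}

// After all hashing is done, each output file is opened and written exactly once.
void Flush()
{
    foreach (var (path, buf) in buffers)
        File.WriteAllText(path, buf.ToString());
}
```

Calling `Buffer(savePaths, res)` inside the loop and `Flush()` once at the end replaces the per-line `File.AppendAllText` calls.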

Joel Coehoorn

Try using an execution pipeline and splitting the work across more threads. While you read the files, you can also hash the content and keep the results in memory; once all files have been hashed, write everything out to the text files. The whole process may run faster.

Something like this:

using System.Collections.Concurrent;
using System.Security.Cryptography;
using System.Text;

var cd = Directory.GetCurrentDirectory();

// One StringBuilder per output file; tasks append to it under a lock.
ConcurrentDictionary<string, StringBuilder> fileAndPath = new();
List<Task> tasks = new();

string listPath = cd + "/list.txt";
string rat = File.ReadAllText(listPath);
List<string> list = rat.Split(new char[] { '\r', '\n' },
    StringSplitOptions.RemoveEmptyEntries).ToList();

foreach (string fl in list)
{
    tasks.Add(Task.Run(() =>
    {
        string rat2 = File.ReadAllText(fl);
        List<string> file = rat2.Split(new char[] { '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries).ToList();

        // Per-task state: a StringBuilder or SHA256 instance shared
        // across tasks would not be thread-safe.
        var sb = new StringBuilder();
        using SHA256 mySHA256 = SHA256.Create();

        foreach (string sn in file)
        {
            byte[] hashValue = mySHA256.ComputeHash(Encoding.UTF8.GetBytes(sn));

            sb.Clear();
            foreach (byte b in hashValue)
            {
                sb.Append(b.ToString("x2"));
            }

            string hashSHA256 = sb.ToString();
            string res = $"{hashSHA256},{sn}{Environment.NewLine}";
            string savePaths = $"{cd}/{hashSHA256.Substring(0, 4)}.txt";

            // Append instead of TryAdd, so lines whose hashes share a
            // 4-character prefix are not silently dropped.
            var buf = fileAndPath.GetOrAdd(savePaths, _ => new StringBuilder());
            lock (buf)
            {
                buf.Append(res);
            }
        }
    }));
}

Task.WhenAll(tasks).GetAwaiter().GetResult();

foreach (var entry in fileAndPath)
    File.AppendAllText(entry.Key, entry.Value.ToString());

Console.WriteLine("___END___");
bini