
I have a list of 15,000,000 usernames in a txt file, and I wrote a method that creates a brain wallet out of each one and checks whether its address matches any of a list of 600 addresses. It's pretty much like this:

private static List<string> userList = new List<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoUser-workspace-db.txt"));
private static List<string> enterpriseUserList = new List<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoEnterpriseUser-local-db.txt"));

foreach (var i in userList)
{
    var userid = ToAddress(i);
    if (enterpriseUserList.Contains(userid))
        Console.WriteLine("{0} {1}", i, userid);
}

private static string ToAddress(string username)
{
    return BitcoinAddress.GetBitcoinAdressEncodedStringFromPublicKey(new PrivateKey(Globals.ProdDumpKeyVersion, new SHA256Managed().ComputeHash(UTF8Encoding.UTF8.GetBytes(username), 0, UTF8Encoding.UTF8.GetBytes(username).Length), false).PublicKey);
}

The ToAddress method hashes the username to a SHA-256 digest, uses that digest as a private key, and encodes the corresponding public key as an address like this:

15hDBtLpQfcbrrAFupWjgN5ieHeEBd8mbu
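
In case it helps, here is roughly the same derivation sketched with the NBitcoin library (NBitcoin and the name ToAddressNBitcoin are stand-ins here; my real BitcoinAddress/PrivateKey helpers come from our own codebase):

private static string ToAddressNBitcoin(string username)
{
    // Brain wallet: the SHA-256 hash of the passphrase is used directly as the private key.
    byte[] hash;
    using (var sha = SHA256.Create())
        hash = sha.ComputeHash(Encoding.UTF8.GetBytes(username));

    var key = new NBitcoin.Key(hash); // 32-byte private key
    return key.PubKey.GetAddress(ScriptPubKeyType.Legacy, Network.Main).ToString();
}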

This code is awful; it runs really slowly, handling about 200 lines of data per second. So I tried to improve it using multithreading:

private static void CheckAddress(string username)
{
    var userid = ToAddress(username);
    if (enterpriseUserList.Contains(userid))
    {
        Console.WriteLine("{0} {1}", username, userid);
    }
}
private static void RunParallel() // renamed so it doesn't hide the Parallel class
{
    List<string> items = new List<string>(File.ReadLines(@"C:\Users\Erik\Desktop\InfernoUser-workspace-db.txt"));
    ParallelOptions check = new ParallelOptions() { MaxDegreeOfParallelism = 100 };
    Parallel.ForEach(items, check, line =>
    {
        CheckAddress(line);
    });
}

It didn't help much. Can anybody suggest how to improve this? Compare it to vanitygen running on a CPU, which can handle 400-500k addresses per second. How can there be such a big difference?

Huang Lee
    `Contains` does a linear search, your algorithm basically runs in O(N×M), it would be a lot faster if it could use an index of some sort. – Bart Friederichs Jul 02 '18 at 15:18
  • @Bart Friederichs can you be more specific? – Huang Lee Jul 02 '18 at 15:21
  • Instead of encoding the 15 million from user list can you decode the 600 from enterprise list and compare? – Jimmy Jul 02 '18 at 15:40
  • @Jimmy that's not how SHA256 works; you can't reverse it. I selected 600 random usernames from userList and converted them into addresses to make the enterprise list – Huang Lee Jul 02 '18 at 15:43
  • ok, missed the sha256 part – Jimmy Jul 02 '18 at 15:56
  • Should I question why you are in possession of `15000000` usernames? – Rand Random Jul 02 '18 at 16:12
  • @RandRandom I got it from my team project's database; it's just pseudo data – Huang Lee Jul 02 '18 at 16:16
  • Have you considered reading the file multithreaded? eg. https://stackoverflow.com/questions/17188357/read-large-txt-file-multithreaded – Rand Random Jul 02 '18 at 16:23
  • @RandRandom I think reading the file is pretty OK; the hardest part is converting multiple seeds into multiple addresses and comparing them. The whole operation for one single address takes me 200 milliseconds, which is super slow. I'm trying to recreate vanitygen in C#; it can generate & compare hundreds of thousands of addresses per second, which is what I intend to do – Huang Lee Jul 02 '18 at 16:33
  • @HuangLee you can perform the list Contains using a HashSet, something like this: `private static HashSet<string> enterpriseUserList = new HashSet<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoEnterpriseUser-local-db.txt"));` You can check the performance comparison here: https://stackoverflow.com/questions/150750/hashset-vs-list-performance – gkardava Jul 02 '18 at 18:49
  • @gkardava what if I generate the seeds and use them directly instead of loading the file? Would that make the program faster? – Huang Lee Jul 03 '18 at 02:47

3 Answers


You can try using a Dictionary keyed by userid, to avoid searching the whole list on every iteration:

var dict = new ConcurrentDictionary<string, string>(100, userList.Count);

userList.AsParallel().ForAll(item =>
{
    dict.AddOrUpdate(ToAddress(item), item, (key, value) => value);
});

enterpriseUserList.AsParallel().ForAll(x =>
{
    if (dict.ContainsKey(x))
        Console.WriteLine(dict[x]);
});
MikkaRin
  • Thanks. If you have any knowledge about cryptocurrency, do you know any method to create brain wallets faster? My code takes about 200 milliseconds to create one – Huang Lee Jul 02 '18 at 15:30
  • There's an error on AddOrUpdate; it says it can't take 2 arguments. How can I fix this? – Huang Lee Jul 02 '18 at 15:45
  • Second `.AsParallel()` is probably more wasteful than helpful. – Rand Random Jul 02 '18 at 16:07
  • @RandRandom can you recommend another solution? I just switched to MikkaRin's way and it makes my code process about 260 lines of usernames per second. I guess it's something – Huang Lee Jul 02 '18 at 16:12
  • 1
    @HuangLee - not arguing about the first `.AsParallel()` code block witch will give some benefits, but the second wont performe well - the `Console.WriteLine` part will synchronize and block the parallel execution - additionally nothing happens there that would need parallel execution. A normal `foreach` will perform better. - Depending on the number of calls to `Console.WriteLine` you should consider switching to a stringbuilder and call `Console.WriteLine(stringBuilder.ToString());` just once. – Rand Random Jul 02 '18 at 16:18
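
A minimal sketch of that batching suggestion (the name `sb` and the loop are mine; `dict` is the ConcurrentDictionary built in the answer above):

// Replace the second parallel block: a plain foreach that collects
// matches in a StringBuilder, then writes to the console once.
var sb = new StringBuilder();
foreach (var x in enterpriseUserList)
{
    string username;
    if (dict.TryGetValue(x, out username))
        sb.AppendLine(username);
}
Console.WriteLine(sb.ToString());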

When looking for inefficiencies, one of the major red flags is repeated function calls. You call GetBytes twice. Putting the result into a separate variable and calling it once should help somewhat.

private static string ToAddress(string username)
{
    var userNameAsBytes = UTF8Encoding.UTF8.GetBytes(username);
    return BitcoinAddress.GetBitcoinAdressEncodedStringFromPublicKey(new PrivateKey(Globals.ProdDumpKeyVersion, new SHA256Managed().ComputeHash(userNameAsBytes, 0, userNameAsBytes.Length), false).PublicKey);
}
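
Going one step further (my own extension of the same idea, not something this answer claims): the `new SHA256Managed()` allocation can be hoisted out too. Hash objects are not thread-safe, so with the parallel loop each thread needs its own instance, e.g. via `[ThreadStatic]`:

[ThreadStatic]
private static SHA256 sha256; // one hash object per thread; SHA256 instances are not thread-safe

private static string ToAddress(string username)
{
    if (sha256 == null)
        sha256 = SHA256.Create();

    var userNameAsBytes = UTF8Encoding.UTF8.GetBytes(username);
    var hash = sha256.ComputeHash(userNameAsBytes, 0, userNameAsBytes.Length);
    return BitcoinAddress.GetBitcoinAdressEncodedStringFromPublicKey(new PrivateKey(Globals.ProdDumpKeyVersion, hash, false).PublicKey);
}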
tinstaafl

You can perform a few optimizations here:

  1. Change the List to a HashSet<string>; it will dramatically speed up the Contains operation, which I am sure is the slowest part of this code base. Change `private static List<string> enterpriseUserList = new List<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoEnterpriseUser-local-db.txt"));` to `private static HashSet<string> enterpriseUserList = new HashSet<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoEnterpriseUser-local-db.txt"));`.
  2. Don't use `ParallelOptions check = new ParallelOptions() { MaxDegreeOfParallelism = 100 };`. This kind of over-subscription drives up context switching and slows performance down.
  3. Optimize Parallel.ForEach using Partitioner.Create, as in the code below.

That's about all I can advise.

private static List<string> userList = new List<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoUser-workspace-db.txt"));
private static HashSet<string> enterpriseUserList = new HashSet<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoEnterpriseUser-local-db.txt"));

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static void CheckAddress(int id, string username)
{
    var userid = ToAddress(username);
    if (enterpriseUserList.Contains(userid))
    {
        // todo
    }
}

private static void RunParallel() // renamed so it doesn't hide the Parallel class
{
    // Range partitioning: each task handles a chunk of indices instead of
    // paying the per-item delegate overhead.
    var ranges = Partitioner.Create(0, userList.Count);
    Parallel.ForEach(ranges, range =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
        {
            CheckAddress(i, userList[i]);
        }
    });
}
gkardava