
I have a list of 15,000,000 usernames in a txt file, and I wrote a method that creates a brain wallet out of each one and checks whether its address matches any of a list of 600 addresses. It's pretty much like this:

private static List<string> userList = new List<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoUser-workspace-db.txt"));
private static List<string> enterpriseUserList = new List<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoEnterpriseUser-local-db.txt"));

foreach (var i in userList)
{
    var userid = ToAddress(i);
    if (enterpriseUserList.Contains(userid))
        Console.WriteLine("{0} {1}", i, userid);
}

private static string ToAddress(string username)
{
    return BitcoinAddress.GetBitcoinAdressEncodedStringFromPublicKey(new PrivateKey(Globals.ProdDumpKeyVersion, new SHA256Managed().ComputeHash(UTF8Encoding.UTF8.GetBytes(username), 0, UTF8Encoding.UTF8.GetBytes(username).Length), false).PublicKey);
}

The ToAddress method hashes the username to a SHA-256 digest, uses that digest as a private key, and encodes the corresponding public key as an address like this:

15hDBtLpQfcbrrAFupWjgN5ieHeEBd8mbu
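
In case it helps, here is roughly the same derivation sketched with the NBitcoin library (NBitcoin and the name ToAddressNBitcoin are stand-ins here; my real BitcoinAddress/PrivateKey helpers come from our own codebase):

private static string ToAddressNBitcoin(string username)
{
    // Brain wallet: the SHA-256 hash of the passphrase is used directly as the private key.
    byte[] hash;
    using (var sha = SHA256.Create())
        hash = sha.ComputeHash(Encoding.UTF8.GetBytes(username));

    var key = new NBitcoin.Key(hash); // 32-byte private key
    return key.PubKey.GetAddress(ScriptPubKeyType.Legacy, Network.Main).ToString();
}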

This code is awful; it runs really slowly, handling about 200 lines of data per second. So I tried to improve it using multithreading:

private static void CheckAddress(string username)
{
    var userid = ToAddress(username);
    if (enterpriseUserList.Contains(userid))
    {
        Console.WriteLine("{0} {1}", username, userid);
    }
}
private static void RunParallel() // renamed so it doesn't hide the Parallel class
{
    List<string> items = new List<string>(File.ReadLines(@"C:\Users\Erik\Desktop\InfernoUser-workspace-db.txt"));
    ParallelOptions check = new ParallelOptions() { MaxDegreeOfParallelism = 100 };
    Parallel.ForEach(items, check, line =>
    {
        CheckAddress(line);
    });
}

It didn't help much. Can anybody suggest how to improve this? Compare it to vanitygen running on a CPU, which can handle 400-500k addresses per second. How can there be such a big difference?

Huang Lee
    `Contains` does a linear search, your algorithm basically runs in O(N×M), it would be a lot faster if it could use an index of some sort. – Bart Friederichs Jul 02 '18 at 15:18
  • @Bart Friederichs can you be more specific? – Huang Lee Jul 02 '18 at 15:21
  • Instead of encoding the 15 million from user list can you decode the 600 from enterprise list and compare? – Jimmy Jul 02 '18 at 15:40
  • @Jimmy that's not how SHA256 works; you can't reverse it. I selected 600 random usernames from userList and converted them into addresses to make the enterprise list – Huang Lee Jul 02 '18 at 15:43
  • ok, missed the sha256 part – Jimmy Jul 02 '18 at 15:56
  • Should I question why you are in possession of `15000000` usernames? – Rand Random Jul 02 '18 at 16:12
  • @RandRandom I got it from my team project's database; it's just pseudo data – Huang Lee Jul 02 '18 at 16:16
  • Have you considered reading the file multithreaded? eg. https://stackoverflow.com/questions/17188357/read-large-txt-file-multithreaded – Rand Random Jul 02 '18 at 16:23
  • @RandRandom I think reading the file is pretty OK; the hardest part is converting multiple seeds into multiple addresses and comparing them. The whole operation for one single address takes me 200 milliseconds, which is super slow. I'm trying to recreate vanitygen in C#; it can generate & compare hundreds of thousands of addresses per second, which is what I intend to do – Huang Lee Jul 02 '18 at 16:33
  • @HuangLee you can perform the list Contains using a HashSet, something like this: `private static HashSet<string> enterpriseUserList = new HashSet<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoEnterpriseUser-local-db.txt"));` You can check the performance comparison here: https://stackoverflow.com/questions/150750/hashset-vs-list-performance – gkardava Jul 02 '18 at 18:49
  • @gkardava what if I generate the seeds and use them directly instead of loading the file? Would that make the program faster? – Huang Lee Jul 03 '18 at 02:47

3 Answers


You can try using a Dictionary keyed by userid, to avoid searching the whole list on every iteration:

var dict = new ConcurrentDictionary<string, string>(100, userList.Count);

userList.AsParallel().ForAll(item =>
{
    dict.AddOrUpdate(ToAddress(item), item, (key, value) => value);
});

enterpriseUserList.AsParallel().ForAll(x =>
{
    if (dict.ContainsKey(x))
        Console.WriteLine(dict[x]);
});
MikkaRin
  • Thanks. If you have any knowledge about cryptocurrency, do you know any method to create brain wallets faster? My code takes about 200 milliseconds to create one – Huang Lee Jul 02 '18 at 15:30
  • There's an error on AddOrUpdate; it says it can't take 2 arguments. How can I fix this? – Huang Lee Jul 02 '18 at 15:45
  • Second `.AsParallel()` is probably more wasteful than helpful. – Rand Random Jul 02 '18 at 16:07
  • @RandRandom can you recommend another solution? I just switched to MikkaRin's way and it makes my code process about 260 lines of usernames per second. I guess it's something – Huang Lee Jul 02 '18 at 16:12
  • 1
    @HuangLee - not arguing about the first `.AsParallel()` code block witch will give some benefits, but the second wont performe well - the `Console.WriteLine` part will synchronize and block the parallel execution - additionally nothing happens there that would need parallel execution. A normal `foreach` will perform better. - Depending on the number of calls to `Console.WriteLine` you should consider switching to a stringbuilder and call `Console.WriteLine(stringBuilder.ToString());` just once. – Rand Random Jul 02 '18 at 16:18
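
A minimal sketch of that batching suggestion (the name `sb` and the loop are mine; `dict` is the ConcurrentDictionary built in the answer above):

// Replace the second parallel block: a plain foreach that collects
// matches in a StringBuilder, then writes to the console once.
var sb = new StringBuilder();
foreach (var x in enterpriseUserList)
{
    string username;
    if (dict.TryGetValue(x, out username))
        sb.AppendLine(username);
}
Console.WriteLine(sb.ToString());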

When looking for inefficiencies, one of the major red flags is repeated function calls. You call GetBytes twice. Putting the result into a separate variable and calling it once should help somewhat.

private static string ToAddress(string username)
{
    var userNameAsBytes = UTF8Encoding.UTF8.GetBytes(username);
    return BitcoinAddress.GetBitcoinAdressEncodedStringFromPublicKey(new PrivateKey(Globals.ProdDumpKeyVersion, new SHA256Managed().ComputeHash(userNameAsBytes, 0, userNameAsBytes.Length), false).PublicKey);
}
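
Going one step further (my own extension of the same idea, not something this answer claims): the `new SHA256Managed()` allocation can be hoisted out too. Hash objects are not thread-safe, so with the parallel loop each thread needs its own instance, e.g. via `[ThreadStatic]`:

[ThreadStatic]
private static SHA256 sha256; // one hash object per thread; SHA256 instances are not thread-safe

private static string ToAddress(string username)
{
    if (sha256 == null)
        sha256 = SHA256.Create();

    var userNameAsBytes = UTF8Encoding.UTF8.GetBytes(username);
    var hash = sha256.ComputeHash(userNameAsBytes, 0, userNameAsBytes.Length);
    return BitcoinAddress.GetBitcoinAdressEncodedStringFromPublicKey(new PrivateKey(Globals.ProdDumpKeyVersion, hash, false).PublicKey);
}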
tinstaafl

You can perform a few optimizations here:

  1. Change the List to a HashSet<string>; it will dramatically speed up the Contains operation, which I am sure is the slowest part of this code base. Change `private static List<string> enterpriseUserList = new List<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoEnterpriseUser-local-db.txt"));` to `private static HashSet<string> enterpriseUserList = new HashSet<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoEnterpriseUser-local-db.txt"));`.
  2. Don't use `ParallelOptions check = new ParallelOptions() { MaxDegreeOfParallelism = 100 };`. This kind of over-subscription drives up context switching and slows performance down.
  3. Optimize Parallel.ForEach using Partitioner.Create, as in the code below.

That's about all I can advise.

private static List<string> userList = new List<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoUser-workspace-db.txt"));
private static HashSet<string> enterpriseUserList = new HashSet<string>(File.ReadAllLines(@"C:\Users\Erik\Desktop\InfernoEnterpriseUser-local-db.txt"));

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static void CheckAddress(int id, string username)
{
    var userid = ToAddress(username);
    if (enterpriseUserList.Contains(userid))
    {
        // todo
    }
}

private static void RunParallel() // renamed so it doesn't hide the Parallel class
{
    // Range partitioning: each task handles a chunk of indices instead of
    // paying the per-item delegate overhead.
    var ranges = Partitioner.Create(0, userList.Count);
    Parallel.ForEach(ranges, range =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
        {
            CheckAddress(i, userList[i]);
        }
    });
}
gkardava