
So I set up an SSH honeypot on an Ubuntu VM for fun, and wanted to sort the most frequently tried passwords and the most frequently tried user-pass combos.
I wrote something that worked, but took a fairly long time, even considering that the log file is 178,000+ lines.
So I decided to try multi-threading it. I tried using Parallel.ForEach.
That was messed up: it didn't write everything to the result files. I googled around and ended up finding the concurrent collections.
It kind of "works" now, but not really the way I want it to. It writes the data to the 2 files (most popular passwords.dat (mpp) and most popular combos.dat (mpc)), but they are neither in ascending nor descending order based on occurrences in their lists.
(I know that the sorting works, because it was fine with just single-threaded foreach loops.)
Here's the code I have so far (try not to judge, I'm still in high school, I know it probably looks messy, and I will try to tidy it up a bit if I can get it to work):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using System.Collections.Specialized;
using System.Threading;
using System.Collections.Concurrent;

namespace ConsoleApp1
{
    public class m
    {
        public static List<string> passwords = new List<string>();
        public static List<string> up = new List<string>();
        public static void mpcF()
        {

            var mpc = new ConcurrentBag<string>();

            var dictionary = up.GroupBy(str => str)
                                .ToDictionary(group => group.Key, group => group.Count());
            var items = from pair in dictionary
                        orderby pair.Value descending
                        select pair;
            Parallel.ForEach(items, item =>
            {
                mpc.Add((item.Key.PadRight(45) + " | " + item.Value/* + Environment.NewLine*/));
            });
            var result = string.Join(Environment.NewLine, mpc);
            File.WriteAllText("mpc.dat", result);
            Console.WriteLine("DUN-DUN-DUUNNN!!!!");
        }

        public static void mppF()
        {
            var mpp = new ConcurrentBag<string>();
            var dictionary = passwords.GroupBy(str => str)
                                .ToDictionary(group => group.Key, group => group.Count());
            var items = from pair in dictionary
                        orderby pair.Value descending
                        select pair;
            Parallel.ForEach(items, item =>
            {
                mpp.Add((item.Key.PadRight(45) + " | " + item.Value/* + Environment.NewLine*/));
            });
            var result = string.Join(Environment.NewLine, mpp);
            File.WriteAllText("mpp.dat", result);
            Console.WriteLine("DUN-DUN-DUUNNN!!!! (2)");
        }
    }

    class Program
    {

        static void read()
        {
            using (StreamReader sr = new StreamReader("ssh-honeypot.log"))
            {
                while (!sr.EndOfStream)
                {
                    string[] t = sr.ReadLine().Split(']')[1].Split(' ');
                    if (t[2] != "Error")
                    {
                        m.passwords.Add(t[3]);
                        m.up.Add(t[2] + " - " + t[3]);
                    }
                }

            }
        }

        static void print()
        {
            m.mpcF();
            m.mppF();
            /*Thread t1 = new Thread(new ThreadStart(m.mpcF));
            Thread t2 = new Thread(new ThreadStart(m.mppF));
            t1.Start();
            t2.Start();*/
        }

        static void Main(string[] args)
        {
            read();
            print();
            Console.ReadKey();
        }
    }
}
  • What are you trying to do in parallel? The parsing, sorting or the reconstruction bit? – Vivek Bernard Apr 19 '17 at 20:29
  • From someone who has been writing code for a long time, I know that the answer to most problems of this sort is almost never multi-threading. It's a powerful tool for the right kind of problem but you really need to know what you're doing. If you want to know WHY your program is slow then you can measure sections of code using the StopWatch class and see what's taking time: http://stackoverflow.com/questions/457605/how-to-measure-code-performance-in-net – Ray Fischer Apr 19 '17 at 22:27

1 Answer


The reason they're not in sorted order is due to this code:

    public static void mppF()
    {
        // ...
        Parallel.ForEach(items, item =>
        {
            mpp.Add(item.Key.PadRight(45) + " | " + item.Value);
        });
        // ...
    }

Parallel.ForEach is going to have multiple threads working on the list, each one adding things to the output collection (the ConcurrentBag). With multiple threads, there's no guarantee which thread is going to get which item, or in what order the threads will process those items. So what you've done is take a sorted collection (your items list) and scrambled it.
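If you really do want a parallel loop, one way to keep the order is to have each iteration write into its own slot of a preallocated array (indexed by position) instead of tossing results into a ConcurrentBag. A minimal sketch, with made-up sample data, not the asker's actual pipeline:

```csharp
using System;
using System.Threading.Tasks;

static class OrderedParallel
{
    // Formats each (key, count) pair in parallel but preserves the input
    // order: thread i writes only to lines[i], so no scrambling can occur.
    public static string[] Format((string Key, int Count)[] items)
    {
        var lines = new string[items.Length];
        Parallel.For(0, items.Length, i =>
        {
            lines[i] = items[i].Key.PadRight(45) + " | " + items[i].Count;
        });
        return lines;
    }
}
```

The output array can then be joined and written exactly as in the original code; the sorted order of `items` survives because position, not arrival time, decides where each line lands.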

It's hard to imagine that only 178,000 items would take "a fairly long time." You should be able to do this in a matter of a few seconds (perhaps less) in a single thread. For example, you could do this:

public static void mppF()
{
    var orderedByCount = passwords
        .GroupBy(str => str)                                 // group by password
        .Select(g => new { Key = g.Key, Count = g.Count() }) // select key and count
        .OrderByDescending(pair => pair.Count)               // sort, most frequent first
        .Select(pair => string.Format("{0,-45} | {1}", pair.Key, pair.Count)); // construct output string

    File.WriteAllLines("mpp.dat", orderedByCount);
    Console.WriteLine("DUN-DUN-DUUNNN!!!! (2)");
}

Your other function can be similarly optimized.
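Applying the same pattern to mpcF gives something like the sketch below (self-contained here for illustration; in the asker's program it would just replace the method body and use the existing `up` field):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public class m
{
    public static List<string> up = new List<string>();

    public static void mpcF()
    {
        var orderedByCount = up
            .GroupBy(str => str)                                 // group by "user - pass" combo
            .Select(g => new { Key = g.Key, Count = g.Count() }) // key and count
            .OrderByDescending(pair => pair.Count)               // most frequent first
            .Select(pair => string.Format("{0,-45} | {1}", pair.Key, pair.Count));

        File.WriteAllLines("mpc.dat", orderedByCount);
        Console.WriteLine("DUN-DUN-DUUNNN!!!!");
    }
}
```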

It's possible that you can parallelize that by adding an AsParallel call to the query, like this:

    var orderedByCount = passwords
        .AsParallel()
        .GroupBy(str => str)                                 // group by password
        .Select(g => new { Key = g.Key, Count = g.Count() }) // select key and count
        .OrderByDescending(pair => pair.Count)               // sort, most frequent first
        .Select(pair => string.Format("{0,-45} | {1}", pair.Key, pair.Count)); // construct output string

If I'm reading the Parallel LINQ documentation correctly, that should preserve order. Still, it's quite possible that with only 178,000 items, the sequential query will execute more quickly.
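As the Stopwatch suggestion in the comments implies, the cheapest way to settle "sequential vs. PLINQ" is to time both on your own data. A rough sketch (the data here is synthetic, just to exercise the two queries):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class TimingDemo
{
    // Runs the grouping/sorting query sequentially and with AsParallel,
    // returning the elapsed milliseconds for each so they can be compared.
    public static (long Sequential, long Parallel) TimeBoth(string[] passwords)
    {
        var sw = Stopwatch.StartNew();
        var seq = passwords
            .GroupBy(s => s)
            .OrderByDescending(g => g.Count())
            .Select(g => g.Key)
            .ToArray();                      // force full evaluation
        var seqMs = sw.ElapsedMilliseconds;

        sw.Restart();
        var par = passwords
            .AsParallel()
            .GroupBy(s => s)
            .OrderByDescending(g => g.Count())
            .Select(g => g.Key)
            .ToArray();
        var parMs = sw.ElapsedMilliseconds;

        return (seqMs, parMs);
    }
}
```

With only a couple hundred thousand short strings, don't be surprised if the sequential number wins; the parallel version pays for partitioning and merging.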

Jim Mischel
  • Thank you! I'll definitely try it! I had a thought today, that I'll try to alter the script of the honeypot, so it doesn't write to the log file, but sends the data to a mysql db instead. I'll try both methods today! – Balázs Hasprai Apr 20 '17 at 13:16
  • Don't count on the database being faster than a sequential file. – Jim Mischel Apr 20 '17 at 14:26