
I have a string that I must parse in order to set a class's variables correctly. The string is badly structured, but I cannot change it. I have tried parsing it, but I don't have a good way to do it without issues. The string is a set of attributes and parameters used to launch an exe from the command line.

I have laid it out to make it easier to read, but note that it is one continuous string.

Here are the rules for reading this string. Every non-dll command is a single key=value pair. A dll element starts with 'dll=' and is followed by zero or more key=value pairs or standalone values, separated by spaces, for example: dll=oneMoreDll, andItsParam=value anotherParam=value lastParam=value value

Input string

time=value1 size=value2 dll=aDllName dll=anotherDllName, someParam=ParamValue dll=yetAnotherDll, someOhterParam=anotherParamValue aStandAloneValue dll=oneMoreDll, andItsParam=value anotherParam=value lastParam=value

I want to parse this string into the format below. I was thinking of storing each line as an element of a string array.

I have tried splitting by spaces and then by 'dll', but my regex isn't up to scratch, or it's impossible (I'm sure it's not). Help!

Desired output elements, to be stored in a String array

time=value1 
size=value2 
dll=aDllName 
dll=anotherDllName, someParam=ParamValue  
dll=yetAnotherDll, someOhterParam=anotherParamValue aStandAloneValue
dll=oneMoreDll, andItsParam=value anotherParam=value lastParam=value
Fearghal
  • Why not split by spaces, and then further split the result by `=`? – Rob Jan 27 '16 at 11:49
  • Splitting by spaces fragments the dll strings that contain spaces. I want to keep each dll's details together; some have params, some don't. – Fearghal Jan 27 '16 at 11:50
  • What is your question? Show expected input and output and **explain it**. **Why** do `dll=anotherDllName, someParam=ParamValue` belong together, `andItsParam=value anotherParam=value lastParam=value` also, but `attribute1=value1` and `attribute2=value2` not? – CodeCaster Jan 27 '16 at 11:52
  • Split with [`Regex.Split(input, @"(?=\b(?:attribute\d+|dll)=)");`](http://regexstorm.net/tester?p=(%3f%3d%5cb(%3f%3aattribute%5cd%2b%7cdll)%3d)&i=attribute1%3dvalue1+attribute2%3dvalue2+dll%3daDllName+dll%3danotherDllName%2c+someParam%3dParamValue+dll%3dyetAnotherDll%2c+someOhterParam%3danotherParamValue+aStandAloneValue+dll%3doneMoreDll%2c+andItsParam%3dvalue+anotherParam%3dvalue+lastParam%3dvalue) - is that what you need? See the *Split* tab at the bottom. – Wiktor Stribiżew Jan 27 '16 at 11:54
  • I have stated the input and output. Attributes 1 and 2 don't belong together for reasons in the logic of the code; that isn't relevant, as I cannot change this string or how it is used. I can give you an example string, but its values won't add to the problem statement; I feel the input and output as stated are sufficient. Hold on a sec and I'll get you an example. – Fearghal Jan 27 '16 at 11:56
  • Except we would have seen dll names could have spaces in them – PaulF Jan 27 '16 at 11:56
  • Hi Wiktor Stribiżew, I'm not sure, but that looks like a more complete solution than what I was attempting in regex :) – Fearghal Jan 27 '16 at 11:57
  • I already had the input and output in the question, but I have now marked them clearly. – Fearghal Jan 27 '16 at 11:57
  • PaulF, I mean the dll elements have spaces in them, not the names themselves. Look at the desired output: can you achieve this? – Fearghal Jan 27 '16 at 12:01
  • _"The attribute 1 and 2 dont belong together for reasons in the logic of the code, not relevant"_ - it's very relevant, as this is required for a proper solution. Look, it can be as easy as writing a couple of sentences, like _"The string contains zero or more `attribute=value` pairs, followed by zero or more `dll=some_dll_name[, with some optional=attributevalues]`"_. – CodeCaster Jan 27 '16 at 12:03
  • I'm not sure what else I can say; the desired output is stated. I have changed 'attribute1' and 'attribute2' to the random words 'time' and 'size', but again, they are distinct values used by the cmd for purposes I can't influence. They have no bearing on each other, so they must be separated as shown. – Fearghal Jan 27 '16 at 12:05
  • My question is whether you can write down, in your question, in human language, what determines an attribute and a "dll line". For example after the first `dll=`, is it valid to have more random `attribute=value` "lines"? Is it valid for a "dll parameter" to have the format `dll=`? – CodeCaster Jan 27 '16 at 12:06
  • OK, I will do that, apologies. – Fearghal Jan 27 '16 at 12:07
  • OK, done; the rules are now in the question. – Fearghal Jan 27 '16 at 12:13

3 Answers


The following should work, at least for the sample case.

  1. Split the string by ' '
  2. Split each sub-string by '='. If there's no '=', we simply take the left side.

We're now left with a structure that looks something like this:

{ left = time, right = value1 }, { left = size, right = value2 }, { left = aStandAloneValue }, etc.

Now, we need to group each item by the preceding 'dll'. I'm using an extension method taken from another answer (https://stackoverflow.com/a/4682163/563532) to help with that.

Essentially, it keeps adding items to the current group until the predicate returns false, at which point it starts a new group. In our case, we want to start a new group whenever we hit a 'dll' entry, and also for every entry that comes before the first 'dll' entry.

The rest is simply formatting the output (which may not be needed in your case).

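using System;
using System.Collections.Generic;
using System.Linq;

var inStr = "time=value1 size=value2 dll=aDllName dll=anotherDllName, someParam=ParamValue dll=yetAnotherDll, someOhterParam=anotherParamValue aStandAloneValue dll=oneMoreDll, andItsParam=value anotherParam=value lastParam=value";

// Becomes false at the first "dll" token. The predicate below mutates it,
// so the query must be enumerated only once (hence the .ToArray() at the end).
bool isBeforeAnyDll = true;

var result = inStr.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(r => {
        // Split on the first '=' only, so a value containing '=' stays intact.
        // Tokens without '=' are standalone values.
        var split = r.Split(new[] { '=' }, 2);
        if (split.Length == 1)
            return new { left = split[0], right = (string)null };

        var left = split[0];
        var right = split[1];
        return new { left, right };
    })
    // Returning false starts a new group: at every "dll" token,
    // and for every token that appears before the first "dll".
    .GroupAdjacentBy((l, r) =>  {
        return r.left == "dll" 
             ? isBeforeAnyDll = false
             : !isBeforeAnyDll;
    })
    // Re-join each group into one line of output.
    .Select(g => string.Join(" ", 
        g.Select(gg => { 
            if (gg.right == null)
                return gg.left;
            return string.Format("{0}={1}", gg.left, gg.right);
        })))
    .ToArray();

foreach (var line in result)
    Console.WriteLine(line);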


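// Adapted from https://stackoverflow.com/a/4682163/563532
// Yields runs of consecutive items; a new run starts whenever
// predicate(previous, current) returns false.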
public static class LinqExtensions
{
    public static IEnumerable<IEnumerable<T>> GroupAdjacentBy<T>(
        this IEnumerable<T> source, Func<T, T, bool> predicate)
    {
        using (var e = source.GetEnumerator())
        {
            if (e.MoveNext())
            {
                var list = new List<T> { e.Current };
                var pred = e.Current;
                while (e.MoveNext())
                {
                    if (predicate(pred, e.Current))
                    {
                        list.Add(e.Current);
                    }
                    else
                    {
                        yield return list;
                        list = new List<T> { e.Current };
                    }
                    pred = e.Current;
                }
                yield return list;
            }
        }
    }
}

Output:

time=value1 
size=value2 
dll=aDllName 
dll=anotherDllName, someParam=ParamValue 
dll=yetAnotherDll, someOhterParam=anotherParamValue aStandAloneValue 
dll=oneMoreDll, andItsParam=value anotherParam=value lastParam=value 

The data is already correctly grouped after .GroupAdjacentBy(); the code that follows only formats the output.
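The grouping rule this answer relies on (start a new line at every 'dll' token, and at every token before the first 'dll') can also be written as a plain loop, without the extension method or the captured isBeforeAnyDll flag. This is a minimal sketch, assuming tokens are separated by spaces; DllLineParser and Group are hypothetical names, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: the same grouping rule as an explicit loop.
public static class DllLineParser
{
    public static string[] Group(string input)
    {
        var lines = new List<List<string>>();
        bool seenDll = false;

        foreach (var token in input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            bool isDll = token.StartsWith("dll=", StringComparison.Ordinal);
            if (isDll)
                seenDll = true;

            // New line for every "dll=" token, and for every token that
            // appears before the first "dll=" (e.g. time=..., size=...).
            if (isDll || !seenDll)
                lines.Add(new List<string>());

            // Otherwise the token is a parameter of the current dll line.
            lines[lines.Count - 1].Add(token);
        }

        return lines.Select(l => string.Join(" ", l)).ToArray();
    }
}
```

Because the tokens are re-joined with a single space, runs of several spaces in the input are collapsed in the output.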

Rob
  • Can you explain the _logic_ this code implements, so as to help OP verify whether it does what they want and so later visitors who stumble upon this can understand it? – CodeCaster Jan 27 '16 at 12:04
  • Wow, that's brilliant... nearly there. I just tested it, and my code puts the 'dll' parts on their correct lines but doesn't separate time and size; they appear on one continuous line. I can take it from here if you prefer, just thought I'd let you know. – Fearghal Jan 27 '16 at 12:24
  • @Fearghal I've updated it since you've tested, it now will put time & size on separate lines :) – Rob Jan 27 '16 at 12:25
  • One more thing: how can we output a String[] instead of an IEnumerable? – Fearghal Jan 27 '16 at 12:27
  • @Fearghal Simply tack on a `.ToArray()` at the very end – Rob Jan 27 '16 at 12:28
  • Mixed emotions. This is a great answer and does the job I stated; however, when I put in my entire real-life string, it doesn't split correctly. My bad, I guess; for some reason there's a difference even though I can't see it. – Fearghal Jan 27 '16 at 12:35

Why not split by Environment.NewLine, then by ',', then split each part on the first '=' sign, taking the left part as the variable name and the right part as the variable value?
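Splitting a piece on the *first* '=' only (so a value that itself contains '=' stays intact) can be done with IndexOf. This is a minimal sketch; KeyValueSplitter and SplitOnFirstEquals are hypothetical names, and treating a piece without '=' as a standalone value with a null value part is an assumption based on the question's aStandAloneValue case.

```csharp
using System;

// Hypothetical helper for the "split on the first '='" step.
public static class KeyValueSplitter
{
    // Returns (name, value) split at the first '='.
    // A piece with no '=' is returned as (piece, null).
    public static (string Name, string Value) SplitOnFirstEquals(string piece)
    {
        int idx = piece.IndexOf('=');
        if (idx < 0)
            return (piece, null);
        return (piece.Substring(0, idx), piece.Substring(idx + 1));
    }
}
```

For example, SplitOnFirstEquals("path=a=b") yields the name path and the value a=b.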

dburner

You can use the following regex approach using Regex.Matches:

using System;
using System.Linq;
using System.Text.RegularExpressions;
public class Test
{
    public static void Main()
    {
        var log = "time=value1 size=value2 dll=aDllName dll=anotherDllName, someParam=ParamValue dll=yetAnotherDll, someOhterParam=anotherParamValue aStandAloneValue dll=oneMoreDll, andItsParam=value anotherParam=value lastParam=value";
        var res = Regex.Matches(log, @"\bdll=(?:(?!\bdll=).)*|\w+=\w+")
                 .Cast<Match>()
                 .Select(p => p.Value)
                 .ToList();
        Console.WriteLine(string.Join("\n",res));
    }
}

See IDEONE demo and a regex demo

The regex matches 2 alternatives:

  • \bdll= - a whole word dll= followed by...
  • (?:(?!\bdll=).)* - zero or more characters, each of which does not start a whole-word dll= (so the match stops right before the next dll=)
  • | - or....
  • \w+=\w+ - one or more word characters followed by = followed by one or more word characters.
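The \w+=\w+ branch only matches keys and values made of letters, digits and underscores. One possible variant, an assumption rather than part of this answer, swaps it for \S+=\S+ (any non-space characters) and trims the trailing space that the greedy dll= branch captures. DllRegexParser is a hypothetical name. Standalone values that appear before the first dll= are still not matched by either branch.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical variant of the Regex.Matches approach.
public static class DllRegexParser
{
    // Branch 1: "dll=" plus everything up to (not including) the next whole-word "dll=".
    // Branch 2: any key=value pair without spaces (assumed replacement for \w+=\w+).
    private static readonly Regex Pattern =
        new Regex(@"\bdll=(?:(?!\bdll=).)*|\S+=\S+");

    public static string[] Parse(string input)
    {
        return Pattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value.Trim()) // branch 1 also captures the space before the next dll=
            .ToArray();
    }
}
```

With this variant, an input such as time=1.5 keeps the whole value 1.5, whereas \w+=\w+ would stop at the dot.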
Wiktor Stribiżew
  • Sorry Wiktor, in my effort to simplify I gave you a lever that isn't really available. I updated the question with a more realistic value: instead of attribute1 and attribute2, I have arbitrary strings, e.g. 'time', 'egg' or 'salad'. – Fearghal Jan 27 '16 at 12:03
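The Regex.Split suggestion in the comments under the question, (?=\b(?:attribute\d+|dll)=), depends on the attribute names, which the comment above says are arbitrary. A sketch that avoids this, assuming every dll= after the first token is preceded by whitespace and that tokens before the first dll= are separated by spaces, splits only in front of dll= and then breaks the leading chunk into single key=value tokens. DllSplitParser is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: lookahead split that does not need the attribute names.
public static class DllSplitParser
{
    public static string[] Parse(string input)
    {
        // Split at the whitespace in front of each "dll=". The lookahead is
        // zero-width, so "dll=" stays at the start of each piece.
        string[] pieces = Regex.Split(input.Trim(), @"\s+(?=dll=)");

        var result = new List<string>();
        foreach (var piece in pieces)
        {
            if (piece.StartsWith("dll=", StringComparison.Ordinal))
                result.Add(piece); // a whole dll element with its parameters
            else
                // Text before the first "dll=": one key=value pair per token.
                result.AddRange(piece.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
        }
        return result.ToArray();
    }
}
```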