
I have a stream of partially binary data, and I want to match when a certain bit is set in a byte in a certain position in the string.

This is an existing system in .NET using System.Text.RegularExpressions, configured with a number of patterns; when certain patterns are matched, the match triggers an action.

I'm interfacing to a device where one of the indicators is only available within a bitfield.

The only alternative I can see is to match a whole equivalence class of all the bytes which have that bit set.

This is a Mettler-Toledo scale interface.

The stream looks like this:

STX
SWA
SWB
SWC
WEIGHT (6 bytes ASCII)
TARE (6 bytes ASCII)
0x0D (CR)
(Optional checksum)

SWA, SWB and SWC are status-word bytes, and I'm interested in bit 3 of SWB (mask 0x08, counting bit 0 as the least significant bit).

They always set bit 5 to 1 in all these status words, so a status word with no other bits set is a space (0x20). So in practice, with no other status bits coming through, SWB alternates between ( (0x28, 00101000) and SPACE (0x20, 00100000). In actuality, the scale is also likely to set bits 0 and 4 in other states, which I don't care about.
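A minimal C# sketch of the bit arithmetic above, assuming (as described) that bit 5 is always set and that only bits 0, 3 and 4 ever vary in SWB. The class and member names are illustrative, not part of the existing system.

```csharp
using System;
using System.Collections.Generic;

static class StatusWordBits
{
    // Masks count bit 0 as the least significant bit, as in the question.
    public const byte AlwaysSet = 0x20; // bit 5: an "empty" status word is a space
    public const byte Bit3      = 0x08; // the indicator of interest in SWB

    // True when bit 3 is set in the given status byte.
    public static bool HasBit3(byte swb) => (swb & Bit3) != 0;

    // Every SWB value possible under the assumption that bit 5 is always set
    // and only bits 0, 3 and 4 ever change.
    public static IEnumerable<byte> PossibleSwbValues()
    {
        foreach (int bit0 in new[] { 0, 0x01 })
            foreach (int bit3 in new[] { 0, 0x08 })
                foreach (int bit4 in new[] { 0, 0x10 })
                    yield return (byte)(AlwaysSet | bit0 | bit3 | bit4);
    }
}
```

Here 0x20 | 0x08 is 0x28, which is the ( character, and four of the eight possible values (0x28, 0x29, 0x38, 0x39) have bit 3 set.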

So I could match a pattern like ..[\(all other equivalent characters]..{6}.{6}\r\0, spelling out in the character class every byte that has bit 3 set.
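For illustration, that equivalence class can be generated instead of typed by hand. This sketch (hypothetical helper names) lists every byte value 0x00-0xFF with the chosen bit set, writes each as \uXXXX so characters like ( ] or - need no special escaping, and drops the class into the frame pattern. It assumes each byte reaches the regex as one char with the same code, and it leaves the optional checksum unmatched.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class SwbPattern
{
    // Character class of every byte value 0x00-0xFF whose `bit` is set
    // (bit 0 = least significant). For any single bit this is 128 members.
    public static string ClassForBit(int bit)
    {
        var sb = new StringBuilder("[");
        for (int b = 0; b < 256; b++)
            if ((b & (1 << bit)) != 0)
                sb.Append("\\u").Append(b.ToString("X4"));
        return sb.Append(']').ToString();
    }

    // STX, SWA, SWB (constrained), SWC, 6-byte weight, 6-byte tare, CR.
    // Singleline lets '.' match any byte value, including 0x0A.
    public static Regex FrameWithSwbBit(int bit) =>
        new Regex(@"\x02." + ClassForBit(bit) + @"..{6}.{6}\r", RegexOptions.Singleline);
}
```

The generated class for bit 3 is 770 characters long, which shows why listing the whole equivalence class feels clumsy even though it works.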

Sergey Kalinichenko
Cade Roux

4 Answers


When it comes to regular expressions, a character is an indivisible atomic unit, so you need to create a character class in order to match bits within a character.

There are two ways to include or exclude a group of characters in a character class: by listing them individually, as in [asdfg], or by specifying a range, as in [a-z].

Of the 256 possible byte values, exactly 128 have any given bit set, so in the worst case your class would list 128 individual characters. For higher-order bits, however, the characters with that bit set form runs of consecutive codes, so you can use ranges to group them together.

For example, counting bits from 1 (so the question's bit 3, mask 0x08, is bit 4 here), matching bit 8 (0x80) is

[\u0080-\u00FF]

matching bit 7 (0x40) is

[\u0040-\u007F\u00C0-\u00FF]

matching bit 6 (0x20) is

[\u0020-\u003F\u0060-\u007F\u00A0-\u00BF\u00E0-\u00FF]

and so on.
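The range pattern generalizes. A hedged sketch (illustrative names) that emits the ranges for any bit, numbered here from 0 as in the question, so this answer's bit 8 is bit 7 below. Codes with a given bit set come in runs of 2^bit consecutive values, starting at 2^bit and repeating every 2^(bit+1), and each run becomes one range.

```csharp
using System;
using System.Text;

static class BitRanges
{
    // Character class for byte values 0x00-0xFF whose `bit` is set,
    // with bit 0 as the least significant bit.
    public static string RangeClassForBit(int bit)
    {
        if (bit < 0 || bit > 7) throw new ArgumentOutOfRangeException(nameof(bit));
        int runLength = 1 << bit;
        var sb = new StringBuilder("[");
        for (int start = runLength; start < 256; start += 2 * runLength)
        {
            int end = start + runLength - 1;
            sb.Append($"\\u{start:X4}");
            if (end != start) sb.Append($"-\\u{end:X4}"); // bit 0 yields single codes
        }
        return sb.Append(']').ToString();
    }
}
```

For the question's bit 3 (0x08) this gives 16 ranges of 8 codes each, starting [\u0008-\u000F\u0018-\u001F...; bit 0 degenerates to the 128-member worst case.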

Sergey Kalinichenko

If I understand correctly, the only possible values for SWB are (in binary) 001xx00x, and you just need a regex to distinguish 001x000x (bit 3 = 0) from 001x100x (bit 3 = 1). Is that correct? If so, you can use this character class to detect when bit 3 = 0:

[\u0020\u0021\u0030\u0031]

and this one to detect when bit 3 = 1:

[\u0028\u0029\u0038\u0039]

If SWB could take more possible values, it might be worth doing something cleverer, but as it is, I don't think there's a need.
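Dropped into the full frame, these two classes give one pattern per state, which fits a system where each matched pattern triggers an action. A sketch assuming the frame layout from the question (STX, three status bytes, two 6-byte fields, CR) and ignoring the optional checksum; the class name is made up.

```csharp
using System.Text.RegularExpressions;

static class SwbFramePatterns
{
    // Only SWB (the third character) is constrained; the rest is "any character".
    const string Prefix = @"\x02.";        // STX, SWA
    const string Suffix = @"..{6}.{6}\r";  // SWC, weight, tare, CR

    // SWB = 001x100x: bit 3 set, bits 0 and 4 free.
    public static readonly Regex Bit3Set =
        new Regex(Prefix + @"[\u0028\u0029\u0038\u0039]" + Suffix, RegexOptions.Singleline);

    // SWB = 001x000x: bit 3 clear, bits 0 and 4 free.
    public static readonly Regex Bit3Clear =
        new Regex(Prefix + @"[\u0020\u0021\u0030\u0031]" + Suffix, RegexOptions.Singleline);
}
```

If bits 1, 2, 6 or 7 ever show up in SWB, neither pattern matches, so such a frame is silently ignored rather than misclassified.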

ruakh
  • That's correct. And I would have to add other equivalent bytes if other bits were likely to be set. – Cade Roux Aug 07 '12 at 17:36
  • In actuality, I'm going to have to recompile this thing with some alternate logic/configuration for this scale which is not dependent upon simple regex and try to figure out how to refactor this to a consistent configuration later. – Cade Roux Aug 07 '12 at 17:38
  • @CadeRoux: Re: creating an alternate non-regex configuration: Yeah, I think that's the right choice. :-) – ruakh Aug 08 '12 at 12:25

Unless I misunderstand you, you're looking to apply a regular expression to something other than a string (in your example above, a bitfield).

Look at this thread, which links to a method for applying regular expression matching to a stream. You can then supply your data to the matcher correctly, i.e.,
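Whatever the matching mechanism, the bytes have to reach the .NET regex as exactly one char per byte. One way to do that is to decode with ISO-8859-1, which maps every byte 0x00-0xFF to the char with the same code; this decoding choice is an assumption, not taken from the linked thread, and the helper names are hypothetical. Decoding with Encoding.ASCII instead would turn every byte above 0x7F into ?, so high-bit classes could never match.

```csharp
using System.Text;
using System.Text.RegularExpressions;

static class ByteRegex
{
    // Latin-1 decoding: byte value n becomes char n, for all 256 values.
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    public static string ToByteString(byte[] data) => Latin1.GetString(data);

    // Runs an existing pattern against raw bytes.
    public static bool IsMatch(Regex regex, byte[] data) => regex.IsMatch(ToByteString(data));
}
```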

Ani

You've got a stream of short, fixed-length records coming from a slow-speed input device. Using regular expressions to read/parse this seems like using a hammer to drive screws.

Why not just read the data with a BinaryReader into a custom class and process it as objects? Easier to understand, easier to maintain.

Something like this:

    static void Main( string[] args )
    {
        // OpenStream() and ProcessInterestingReading() are placeholders for your own code
        using ( Stream       s      = OpenStream() )
        using ( BinaryReader reader = new BinaryReader( s , Encoding.ASCII ) )
        {
            foreach ( ScaleReading reading in ScaleReading.ReadInstances(reader) )
            {
                if ( !reading.IsValid ) continue ; // let's just skip invalid data, shall we?
                bool isInteresting = (reading.StatusB & 0x08) == 0x08 ; // bit 3 of SWB
                if ( isInteresting )
                {
                    ProcessInterestingReading(reading) ;
                }
            }
        }

        return;
    }

where ScaleReading looks something like this:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text;

    class ScaleReading
    {
        // STX + SWA + SWB + SWC + 6-byte weight + 6-byte tare + CR
        private const int FrameLength = 17 ;

        private ScaleReading( byte[] data , int checkSum )
        {
            this.Data             = data                            ;
            this.CheckSum         = checkSum                        ;
            this.ComputedCheckSum = ComputeCheckSumFromData( data ) ;

            this.STX     = data[0] ;
            this.StatusA = data[1] ;
            this.StatusB = data[2] ;
            this.StatusC = data[3] ;
            this.Weight  = ToInteger( data, 4, 6 ) ;
            this.Tare    = ToInteger( data, 10,6 ) ;
            this.CR      = data[16] ;
        }

        private static int ToInteger( byte[] data , int offset , int length )
        {
            char[] chs   = Encoding.ASCII.GetChars( data , offset , length ) ;
            string s     = new String( chs ) ;
            int    value = int.Parse( s ) ;

            return value ;
        }

        // ASSUMPTION: a Toledo-style checksum, i.e. the two's complement of the
        // low 7 bits of the sum of every byte from STX through CR.
        // Check this against the scale's manual before relying on it.
        private static int ComputeCheckSumFromData( byte[] data )
        {
            int sum = 0 ;
            foreach ( byte b in data ) sum += b ;
            return ( -sum ) & 0x7F ;
        }

        public bool IsValid
        {
            get
            {
                bool isValid = ComputedCheckSum == CheckSum
                            && STX              == '\x0002'  // expected STX char is actually STX
                            && CR               == '\r'      // expected CR  char is actually CR
                            ;
                return isValid ;
            }
        }

        public byte[] Data             { get ; private set ; }
        public int    ComputedCheckSum { get ; private set ; }
        public int    CheckSum         { get ; private set ; }

        public byte STX     { get ; private set ; } // should always be 0x02
        public byte StatusA { get ; private set ; } // might want to make each status word an enum
        public byte StatusB { get ; private set ; } // might want to make each status word an enum
        public byte StatusC { get ; private set ; } // might want to make each status word an enum
        public int  Weight  { get ; private set ; }
        public int  Tare    { get ; private set ; }
        public byte CR      { get ; private set ; }

        public static ScaleReading ReadInstance( BinaryReader reader )
        {
            ScaleReading instance = null;
            byte[]       data     = reader.ReadBytes( FrameLength );

            if ( data.Length > 0 )
            {
                if ( data.Length != FrameLength ) throw new InvalidDataException() ;

                // ASSUMPTION: the optional checksum is enabled and is a single byte
                // following CR (ReadInt32 would wrongly consume 4 bytes here).
                int checkSum = reader.ReadByte() ;
                instance     = new ScaleReading( data , checkSum );
            }

            return instance;
        }

        public static IEnumerable<ScaleReading> ReadInstances( BinaryReader reader )
        {
            for ( ScaleReading instance = ReadInstance(reader) ; instance != null ; instance = ReadInstance(reader) )
            {
                yield return instance ;
            }
        }
    }
Nicholas Carey
  • Because I didn't want to have to change yet another quirkily written legacy program when I've got other work to do. So yes, now it has a special mode to do basically what you've written, and hopefully this source code is otherwise the correct version and this build will test fine. – Cade Roux Aug 07 '12 at 19:37