My program reads from a device via a serial port and gets back this string: 'IC'#$0088#$0080'Ô'#$0080#$0080. I need to extract the 5 hex values and convert them to binary: #$0088 = 10001000, #$0080 = 10000000, Ô = 11010100.

I can convert the $80 and $88 values, but I am having difficulty extracting them from the whole string. The Ô ($D4) I can neither extract nor convert. An extended character like Ô could appear at any or all of the five positions.

The read methods in my serial component are:

function Read(var Buffer; Count: Integer): Integer;
function ReadStr(var Str: string; Count: Integer): Integer;
function ReadAsync(var Buffer; Count: Integer; var AsyncPtr: PAsync): Integer;
function ReadStrAsync(var Str: AnsiString; Count: Integer; var AsyncPtr: PAsync): Integer;

Can you give me an example of reading binary?

Mike
  • You need to know the format. If you don't have a spec, then you can't parse this – David Heffernan Oct 25 '15 at 18:46
  • AFAICT, it's more of a display issue than format. If XE2 displayed Ô as #$00D4 it would be better. Any way to force that? – Mike Oct 25 '15 at 23:01
  • Mike, what is the communication protocol called and do you have a specification or description for it? Or, can you provide a link to such documentation? What is the communication component/library that you are using? I agree fully with @David that you should not deal with the messages as strings at all, really! I have both designed and implemented comm protocols for dedicated systems, and I can assure you that treating messages as arrays of bytes is 99% the right thing to do. Please provide docs or a link to docs, and I will be glad to help you. – Tom Brunberg Oct 27 '15 at 00:03
  • Tom, the doc says "Response format. IC;abcde where abcde are 8-bit ASCII characters used as collections of flags." IC is the command sent to the device to read, and is returned. The serial port library is by Dejan Crnila, V4.11 updated by Brian Gochnauer Oct 2010. There are about 150 reads to be made, and this is the only one that fails if I use ReadStr. – Mike Oct 27 '15 at 17:18
  • Mike, your original post did not indicate a semicolon (;) after 'IC'. Which one is correct? What is the number of characters received? 8?(or 7?) Final question, Which Read function are you using? the second? If yes, your problem stems from the fact that 8-bit ASCII is sent, but the read function is converting to WideChar. ASCII characters over $7F (actually Ansi, not ASCII) are subject to conversion depending on default code page in use. Therefore, the lower byte of the `Ô` character may or may not be correct. I assume all other messages are pure ASCII? Do you need bit values or bytes? – Tom Brunberg Oct 27 '15 at 18:13
  • I will post an answer after you have responded to above comment. Be precise. I also find it strange that you don't give any reference to the protocol. Is it proprietary, or why not? – Tom Brunberg Oct 27 '15 at 18:15
  • The OP (no ';') is correct, so the character count is 7. I've been using ReadStr. Yes, all other messages are ASCII. I need bit values. I don't know that there is a name for the protocol. It's for an amateur radio transceiver. You can read about it at http://www.elecraft.com/K2_Manual_Download_Page.htm#K3 Look for K3S/K3/KX3 Programmers Manual. – Mike Oct 27 '15 at 22:19
  • The IC command is on p.16. – Mike Oct 27 '15 at 22:25
  • Doesn't make sense to ask for bit or byte values. And 8 bit ASCII is wrong. ASCII is 7 bit. – David Heffernan Oct 27 '15 at 22:56
  • Googling 8 bit ASCII turns up a bunch of tables. In any case, the bytes returned can convert to 8 bits. Bit 7 isn't a flag, it's always 1. – Mike Oct 29 '15 at 10:12
  • OK, if you think you know better, and that ASCII is not a 7 bit encoding, then I cannot help you. If you don't want to learn, I'm not interested. – David Heffernan Oct 29 '15 at 21:00
  • No need for the 'tude dude. I appreciate your help. I never said I knew better. Here's an interesting discussion about it. http://stackoverflow.com/questions/14690159/is-ascii-code-7-bit-or-8-bit – Mike Oct 31 '15 at 00:17
  • I guess I lump ASCII and Extended ASCII (E.G. ISO-8859-1) in the same basket. – Mike Oct 31 '15 at 00:47

2 Answers

It looks like the real problem is that you are treating binary data as though it were UTF-16 encoded text.

Whatever is feeding you this data is not feeding you UTF-16 encoded text. What the device is really sending is a byte array. Treat it as such rather than as text. Then you can pick out the five values you are interested in by index.

So, declare an array of bytes:

var
  Data: TArray<Byte>; // dynamic array

or

var
  Data: TBytes; // shorthand for the same

or

var
  Data: array [0..N-1] of Byte; // fixed length array

And then read into those arrays. To pick out values, use Data[i].
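
A minimal sketch of that approach, assuming the 7-byte reply described in the question. `FakeRead` is a hypothetical stand-in with the same signature as the component's `Read` method (so the example runs without a device), and `ByteToBin` is a helper introduced only for illustration:

```pascal
program ReadBinaryDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical stand-in for ComPort.Read: same signature as the Read method
// quoted in the question, but it only copies a canned reply into Buffer.
function FakeRead(var Buffer; Count: Integer): Integer;
const
  Reply: array[0..6] of Byte = ($49, $43, $88, $80, $D4, $80, $80); // 'I','C', then 5 flag bytes
begin
  if Count > Length(Reply) then
    Count := Length(Reply);
  Move(Reply, Buffer, Count);
  Result := Count;
end;

// Render one byte as eight '0'/'1' characters, most significant bit first.
function ByteToBin(B: Byte): string;
var
  i: Integer;
begin
  SetLength(Result, 8);
  for i := 0 to 7 do
    if (B shr i) and 1 = 1 then
      Result[8 - i] := '1'
    else
      Result[8 - i] := '0';
end;

var
  Data: TBytes;
  Got, k: Integer;
begin
  SetLength(Data, 7);
  // Pass Data[0], not Data: with an untyped var parameter, passing the
  // dynamic array variable itself would overwrite its pointer, not its contents.
  Got := FakeRead(Data[0], Length(Data));
  // Bytes 0 and 1 echo the 'IC' command; the flag bytes start at index 2.
  for k := 2 to Got - 1 do
    Writeln(Format('Byte %d = $%.2x = %s', [k, Data[k], ByteToBin(Data[k])]));
end.
```

With the real component, the call would presumably become `ComPort.Read(Data[0], Length(Data))`. No string or character type is involved at any point, so no code page conversion can alter the values.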

Note that I am using a significant amount of guesswork here, based on the question and your comments. Don't take my word for it. My guessing could be wrong. Consult the specification of the communication protocol for the device. And learn carefully the difference between text and binary.

David Heffernan
  • I think you're on the right track. The return from the device is read by Comport into a string. The 'IC;' part is the command that is mirrored back, the rest seems to be binary, so I'm unsure how to treat it. – Mike Oct 26 '15 at 18:54
  • Don't put it into a string in the first place. – David Heffernan Oct 26 '15 at 18:55
  • I can extract each piece of interest using tmp := copy(inData,4,1); which gives me #$0088. tmp and inData are strings, but the value of tmp is not a string. length(tmp) returns zero, and pos doesn't work on it. x := ord('Ô'); gives me a value I can convert to D4. x:= ord(tmp); doesn't. – Mike Oct 26 '15 at 19:08
  • You don't have a clear picture of what you are doing here. You aren't at all clear on the difference between text and binary. Don't use text for binary data. – David Heffernan Oct 26 '15 at 19:38
  • You're right and I'm struggling to get educated. The spec on what I'm trying to read is "Response format. IC;abcde where abcde are 8-bit ASCII characters used as collections of flags.' I added to the OP, please take a look. – Mike Oct 27 '15 at 02:16
  • You need to call `Read` or `ReadAsync`. – David Heffernan Oct 27 '15 at 06:32
  • Read using what for buffer? I tried it using an array of ansichar and got 'I', 'C', 'ˆ', '€', 'Ô', '€', '€', #0 – Mike Oct 27 '15 at 22:47
  • It's binary. Read into a byte array. I already said all of this yesterday in my answer. – David Heffernan Oct 27 '15 at 22:55
  • Note that *8-bit ASCII characters* is meaningless. There's no such thing. ASCII is a 7 bit text encoding. – David Heffernan Oct 27 '15 at 23:04

As I wrote earlier in the comments, the problem with the message in your question is that it consists partly of non-ASCII characters. The ASCII range is $00 to $7F and maps to the same characters as Unicode U+0000 to U+007F, so no conversion takes place (apart from the added leading zero byte). Ansi characters ($80 to $FF), on the other hand, are converted according to the code page in use, in order to keep the same glyph in both encodings. For example, AnsiChar $80 (the Euro sign in CP1252) is converted to U+20AC, and AnsiChar $88 (ˆ) to U+02C6. The bit pattern of the lower byte no longer matches the original byte.

Ref: https://msdn.microsoft.com/en-us/library/cc195054.aspx

The following code shows the results of two tests, using Char vs. AnsiChar:

procedure TMainForm.Button2Click(Sender: TObject);
const
  // Swap the commented and uncommented declarations (here and for c below)
  // to run the Char test instead of the AnsiChar test.
  Buffer: array[0..7] of AnsiChar = ('I','C', #$88, #$80, #$D4, #$80, #$80, ';');
//  Buffer: array[0..7] of Char = ('I','C', #$88, #$80, #$D4, #$80, #$80, ';');
  BinChars: array[0..1] of Char = ('0','1');
var
  i, k: integer;
  c: AnsiChar;
//  c: Char;
  s: string;
begin
  // Indexes 2..6 hold the five flag characters.
  for k := 2 to 6 do
  begin
    c := Buffer[k];
    SetLength(s, 8);
    // Build the bit string from the low 8 bits, most significant bit first.
    for i := 0 to 7 do
      s[8-i] := BinChars[(ord(c) shr i) and 1];
    Memo1.Lines.Add(format('Character %d in binary format: %s',[k, s]));
  end;
end;

Using Char (UTF-16 WideChar)

AnsiChar #$88 is converted to U+02C6 
AnsiChar #$80 is converted to U+20AC 
AnsiChar #$D4 is converted to U+00D4 !

Lower byte gives

Character 2 in binary format: 11000110 
Character 3 in binary format: 10101100 
Character 4 in binary format: 11010100
Character 5 in binary format: 10101100 
Character 6 in binary format: 10101100

Using AnsiChar

Character 2 in binary format: 10001000
Character 3 in binary format: 10000000
Character 4 in binary format: 11010100
Character 5 in binary format: 10000000
Character 6 in binary format: 10000000
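
Since the byte values are preserved in this case, individual flags can also be tested directly with a bit mask instead of building a '0'/'1' string first. The sketch below is only an illustration: `BitIsSet` is a hypothetical helper, and which flag each bit represents has to come from the device's documentation.

```pascal
program FlagBitsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// True when bit BitNo (0 = least significant) of B is set.
function BitIsSet(B: Byte; BitNo: Integer): Boolean;
begin
  Result := (B and (1 shl BitNo)) <> 0;
end;

const
  FlagByte: Byte = $D4; // the third flag byte from the question, 11010100
var
  n: Integer;
begin
  // Walk from bit 7 down to bit 0 and report each one.
  for n := 7 downto 0 do
    Writeln('bit ', n, ': ', BitIsSet(FlagByte, n));
end.
```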

Unfortunately a conversion from Unicode to Ansi (even if originally converted from Ansi to Unicode) is lossy and will fail.

I really don't see any easy solution with the information available.

Tom Brunberg
  • The last conversion is correct. I tried a ComPort.Read(buffer, 7); where buffer is an array of ANSIChar. I got 'I', 'C', 'ˆ', '€', 'Ô', '€', '€', #0 I'll do my best to supply any info you need. – Mike Oct 27 '15 at 22:23
  • Oh honestly, why are you all so obsessed with text?! – David Heffernan Oct 27 '15 at 22:54
  • It's just so frustrating. It seems that the entire Delphi community, when they see an array of bytes, feel compelled to somehow force it into a string. – David Heffernan Oct 27 '15 at 23:01
  • 1. I think most people learn that first and have no need to go further. 2. If you print binary to a screen or a report, not a lot of people understand it [-o>. Anyhow, sorry for your frustration, but grateful for your patience and expertise. I learned something. Problem solved. – Mike Oct 28 '15 at 14:22