1

First of all, sorry for my bad English.

Well, I want to read the piece hashes from a torrent file. Currently I'm using the bencode library https://github.com/hyPiRion/java-bencode to decode the file, but my problem comes when I want to convert the pieces string to a byte array. The torrent file is encoded in UTF-8, but if I do

 byte[] bytepieces = piecestring.getBytes("UTF-8");

It gives me, well, nothing really usable.

On the other hand, to compare, I went the other way and built the string instead of extracting the bytes: I read the first piece of my file and calculated its SHA-1. If I convert the resulting 20-byte array to a string, that string does match the first part of the big pieces string. But if I try to turn that generated string back into the 20 bytes that created it, I can't. How can I do this?

Little example:

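import java.io.FileInputStream;
import java.security.MessageDigest;

FileInputStream fin = new FileInputStream("miFile");
byte[] array = new byte[512 * 1024]; // one piece of 512 KiB
fin.read(array, 0, 512 * 1024); // read the first piece
MessageDigest md = MessageDigest.getInstance("SHA-1");
byte[] sha1byte = md.digest(array); // 20-byte SHA-1 digest of the piece
String s = new String(sha1byte, "UTF-8"); // decode the digest bytes as UTF-8 text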

After doing this, sha1byte.length is 20, which is correct for a SHA-1 hash. But if I call s.getBytes("UTF-8").length, in my example I get... 33! What? I want to get my 20 bytes back from the generated string. How can I do this?

Well thanks :P

Avinash
  • Why are you converting binary data to String in the first place? String is not a container for binary data. – user207421 Mar 27 '17 at 21:21
  • If you want to present/store binary data as text, use [Base-64](https://en.wikipedia.org/wiki/Base64) encoding, not a [UTF-8](https://en.wikipedia.org/wiki/UTF-8) *character set*. In Java 8, use [`java.util.Base64`](https://docs.oracle.com/javase/8/docs/api/java/util/Base64.html). – Andreas Mar 27 '17 at 21:48
  • Aside: Check out [the documentation for `InputStream.read`](https://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#read(byte[],%20int,%20int)). You'll notice it is not required to completely fill the given array, even if the end of stream is not yet reached. Here are [some alternatives that are guaranteed to read the entire stream](http://stackoverflow.com/questions/1264709/convert-inputstream-to-byte-array-in-java). – dnault Mar 27 '17 at 21:58
  • I'm storing binary data as strings because the bencode format in .torrent files stores that binary data as strings, and I want to turn those strings into the hash byte array. I know about Base64, but the file is formatted as UTF-8. The other option I have is to re-read the whole .torrent file byte by byte, but for that I would have to rewrite the entire library. – sanslash332 Mar 28 '17 at 00:27

2 Answers

0

"I'm storing binary data as strings, because the BEncode format in .torrent files stores that binary data as string"

Bencode "strings" are sequences of bytes, not sequences of Unicode code points. A language's representation of bytes (byte[] or ByteBuffer in Java) is therefore the appropriate type, and the data should be interpreted as a UTF-8 string only in the cases where it actually contains something meant to be human-readable.

So you should use a bencoding library that supports extraction of the raw bytes.
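To illustrate why the round trip in the question fails, here is a minimal sketch. The class and method names (ByteStringRoundTrip, viaUtf8, viaLatin1, splitPieces) are made up for this example and are not part of any bencode library. It shows that decoding arbitrary bytes as UTF-8 and encoding them again does not give the original bytes back, while ISO-8859-1 maps every byte to exactly one char and so survives the round trip. Note that ISO-8859-1 only helps if you control the decoding step yourself; if a library has already decoded the bytes as UTF-8, the information is gone.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class ByteStringRoundTrip {

    // Decode as UTF-8 and encode again. Malformed sequences are replaced
    // with U+FFFD while decoding, so this is lossy for arbitrary binary data.
    static byte[] viaUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Decode as ISO-8859-1 and encode again. Each byte maps to one char
    // (0..255) and back, so the original bytes are preserved.
    static byte[] viaLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1).getBytes(StandardCharsets.ISO_8859_1);
    }

    // Split the raw "pieces" value of a torrent into 20-byte SHA-1 hashes.
    static byte[][] splitPieces(byte[] pieces) {
        if (pieces.length % 20 != 0) {
            throw new IllegalArgumentException("pieces length must be a multiple of 20");
        }
        byte[][] hashes = new byte[pieces.length / 20][];
        for (int i = 0; i < hashes.length; i++) {
            hashes[i] = Arrays.copyOfRange(pieces, i * 20, (i + 1) * 20);
        }
        return hashes;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // SHA-1 of empty input, used here only as sample binary data.
        byte[] sha1 = MessageDigest.getInstance("SHA-1").digest(new byte[0]);
        System.out.println("original length:    " + sha1.length);
        System.out.println("UTF-8 round trip:   " + Arrays.equals(sha1, viaUtf8(sha1)));
        System.out.println("Latin-1 round trip: " + Arrays.equals(sha1, viaLatin1(sha1)));
    }
}
```

Once the pieces value is available as a byte[], splitPieces gives one 20-byte array per piece, which can be compared with Arrays.equals against the digest computed from the file.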

the8472
  • Yes, that's the idea, but which library can do this? I've tested various libraries, but most of them extract the data as the equivalent type: long, list, dict, and string. The library I linked lets you extract an element as an object, but converting that object to a byte array with a byte output stream and an object writer doesn't work as expected. – sanslash332 Mar 28 '17 at 20:05
  • Asking for library recommendations is off-topic on SO. – the8472 Mar 28 '17 at 22:01
0

Thanks, guys, for your answers. I found a solution using https://github.com/bedeho/bencodej

The library always loads the bencode data as byte arrays wrapped in custom classes, so it keeps a 1:1 mapping with the byte strings. Thanks for everything.
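To show what keeping the data as bytes means in practice, here is a rough sketch of reading a single bencoded byte string (the format is a decimal length, a colon, then that many raw bytes) without ever going through String. This is not the bencodej API; the class BencodeByteString and the method readByteString are hypothetical names for this example, and the sketch ignores integer overflow for very large lengths.

```java
import java.util.Arrays;

public class BencodeByteString {

    // Reads one bencoded byte string ("<decimal length>:<raw bytes>")
    // that starts at the given offset and returns its payload as raw bytes.
    static byte[] readByteString(byte[] data, int offset) {
        int pos = offset;
        if (pos >= data.length || data[pos] < '0' || data[pos] > '9') {
            throw new IllegalArgumentException("expected a length digit at offset " + offset);
        }
        int length = 0;
        // Parse the ASCII decimal length prefix.
        while (pos < data.length && data[pos] >= '0' && data[pos] <= '9') {
            length = length * 10 + (data[pos] - '0');
            pos++;
        }
        if (pos >= data.length || data[pos] != ':') {
            throw new IllegalArgumentException("expected ':' after the length");
        }
        pos++; // skip the colon
        if (length > data.length - pos) {
            throw new IllegalArgumentException("byte string is truncated");
        }
        // Copy the payload untouched; no charset is involved at any point.
        return Arrays.copyOfRange(data, pos, pos + length);
    }

    public static void main(String[] args) {
        // "2:" followed by two bytes that are not valid UTF-8 text.
        byte[] encoded = {'2', ':', (byte) 0xDA, 0x39};
        byte[] value = readByteString(encoded, 0);
        System.out.println(value.length);
    }
}
```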