1

I strings in the format of name:key:dataLength:data and these strings can often be chained together. for example "aNum:n:4:9879aBool:b:1:taString:s:2:Hi" this would map to an object something like:

{
  aNum: 9879,
  aBool: true,
  aString: "Hi"
}

I have a method for parsing a string in this format but I'm not sure whether it's use of substring is the most efficient way of pprocessing the string, is there a more efficient way of processing strings in this fashion (repeatedly chopping off the front section):

Map<string, dynamic> fromString(String s){
  Map<String, dynamic> _internal = new Map();
  int start = 0;
  while(start < s.length){
    int end;
    List<String> parts = new List<String>(); //0 is name, 1 is key, 2 is data length, 3 is data
    for(var i = 0; i < 4; i++){
      end = i < 3 ? s.indexOf(':') : num.parse(parts[2]);
      parts[i] = s.substring(start, end);
      start = i < 3 ? end + 1 : end;
    }
    var tranType = _tranTypesByKey[parts[1]]; //this is just a map to an object which has a function that can convert the data section of the string into an object
    _internal[parts[0]] = tranType._fromStr(parts[3]);
  }
  return _internal;
}
Daniel Robinson
  • 13,806
  • 18
  • 64
  • 112
  • Can you please provide information about the expected size of such strings or how many such strings you want to process at once. It depends a lot of such information what strategy makes sense or if it makes sense at all to invest time for optimizations. – Günter Zöchbauer Feb 27 '14 at 12:29
  • the data part of the string could be absolutely anything in terms of size and content, and the chaining of these objects could also be enormous, I wouldnt expect them to be my personal reason for using them will only use smal simple objects, but the classes should support objects with any number of properties. – Daniel Robinson Feb 27 '14 at 12:32
  • Then you also might consider to do async processing using streams and do the parsing with a [Finite State Machine](http://en.wikipedia.org/wiki/Finite-state_machine), so you don't need to have more than one copy in memory at the same time. You could also already start processing the data while the data is still beeing recieved. Probably slower for small chunks but might be faster for huge data. – Günter Zöchbauer Feb 27 '14 at 12:38
  • see also http://stackoverflow.com/questions/13221168/ – Günter Zöchbauer Feb 27 '14 at 12:44
  • Why don't you just use json or ProtocolBuffers and existing parsers? (of course only if you can decide which format to use for encoded data) – Günter Zöchbauer Feb 27 '14 at 12:52
  • do those things allow you to register your own types so that they are deserialized on the client side as their native type? json as far as I know only supports `strings` `bools` `ints` `lists` and `maps` as far as I know. never heard of protocol buffers I'll have to read up on them. I just wanted a simple way of sending type info over in string so it can be transparently converted back into the native object on the client side regardless of what type the user sends, even their own custom data types. – Daniel Robinson Feb 27 '14 at 13:10
  • JSON can be easily en/decoded between String (JSON) and Map representation. The values can than easily be assigned to properties of any object. There is also support to directly map between JSON/custom object. Your example doesn't seem to be something else than numbers, Strings and maps/lists of them anyway. Protocol buffer provides efficient encoding for data with fixed structure. You describe the data Structure in text files and then you can generate code for classes and serialization/deserialization from this description. – Günter Zöchbauer Feb 27 '14 at 13:17
  • what if I have `var toSend = {aDateTime: new DateTime.now()}; JSON.encode(toSend);` this throws an error because `DateTime` does not implement `toJson` if I have a custom class `Foo`and I implement `toJson` then great I can turn it to json but on the other end how does the deserializer work out that that particular arbitrary string is a `Foo` and how does it turn that string back into a `Foo`? – Daniel Robinson Feb 27 '14 at 13:23
  • [How to convert an object containing DateTime fields to JSON in Dart?](http://stackoverflow.com/questions/16537114/). Custom object de/serialization [Add JSON serializer to every model class?](http://stackoverflow.com/questions/20024298) If you go for speed you should avoid deserialization that is base on reflection. But it's easy to add a toJson/fromJson implementation to you class. – Günter Zöchbauer Feb 27 '14 at 13:34
  • Im familiar with the first link and I agree with the answers final comment "I am not really happy with any of them :)". the second option too seems to only concern itself with turning objects into JSON and no JSON into objects, the answer there makes a customer into `{"Id": 1, "Name": "Foo"}"` but then when the client recieves that string how does it know that its a Customer object and not a `Person` or a `MonsterTruckWithLasers`? my small class aims to turn any object into a string and in that string contain the info required to deserialize it back into a copy of the original object. – Daniel Robinson Feb 27 '14 at 13:47
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/48585/discussion-between-0xor1-and-gunter-zochbauer) – Daniel Robinson Feb 27 '14 at 13:47

2 Answers2

1

I would try s.split(':') and process the resulting list. If you do a lot of such operations you should consider creating benchmarks tests, try different techniques and compare them.

If you would still need this line

 s = i < 3 ? s.substring(idx + 1) : s.substring(idx);

I would avoid creating a new substring in each iteration but instead just keep track of the next position.

Günter Zöchbauer
  • 623,577
  • 216
  • 2,003
  • 1,567
  • the only reason I didnt go with `split(':')` is because I can't guarantee that the data part of the string won't have `:` in it so I would end up haivng to join those bits back up again. nice tip on the keeping track of the next position rather than repeatedly overwriting the original string though. – Daniel Robinson Feb 27 '14 at 12:31
  • I see. You delimiter is then `:x:y:` where `x` is some type information and `y` is the field identifier. You could try regex for this. I heard Dart regex isn't very fast yet but I would consider and benchmar it. – Günter Zöchbauer Feb 27 '14 at 12:34
1

You have to decide how important performance is relative to readability and maintainability of the code.

That said, you should not be cutting off the head of the string repeatedly. That is guaranteed to be inefficient - it'll take time that is quadratic in the number of records in your string, just creating those tail strings.

For parsing each field, you can avoid doing substrings on the length and type fields. For the length field, you can build the number yourself:

int index = ...;
// index points to first digit of length.
int length = 0;
int charCode = source.codeUnitAt(index++);
while (charCode != CHAR_COLON) {
   length = 10 * length + charCode - 0x30;
   charCode = source.codeUnitAt(index++);
}
// index points to the first character of content.

Since lengths are usually small integers (less than 2<<31), this is likely to be more efficient than creating a substring and calling int.parse.

The type field is a single ASCII character, so you could use codeUnitAt to get its ASCII value instead of creating a single-character string (and then your content interpretation lookup will need to switch on character code instead of character string).

For parsing content, you could pass the source string, start index and length instead of creating a substring. Then the boolean parser can also just read the code unit instead of the singleton character string, the string parser can just make the substring, and the number parser will likely have to make a substring too and call double.parse.

It would be convenient if Dart had a double.parseSubstring(source, [int from = 0, int to]) that could parse a substring as a double without creating the substring.

lrn
  • 64,680
  • 7
  • 105
  • 121