I am building a custom dataset to train and test the attention_ocr and street models, but I am unclear on what the function encode_utf8_string described here does. What is the purpose of the padding, and how is the null character used when building the padded and unpadded char id arrays?

Given the following charset, length (5), and null char (3):

{'a':0, 'b':1, 'c':2}

Are these the correct padded and unpadded results (note the space in the second text)?

'bc': padded: [1,2,3,3,3], unpadded: [1,2]
'a a': padded: [0,3,0,3,3], unpadded: [0,3,0]
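
To make the question concrete, here is a minimal sketch of the encoding as I currently understand it. The helper `encode_with_padding` is hypothetical and is not the actual encode_utf8_string. It assumes that any character missing from the charset (the space, in this example) falls back to the null id, and that padding fills the remaining positions with the null id up to the fixed length. Both of those are assumptions I would like confirmed or corrected.

```python
def encode_with_padding(text, length, charset, null_id):
    """Hypothetical helper: map text to padded and unpadded char id lists."""
    if len(text) > length:
        raise ValueError("text is longer than the fixed label length")
    # Assumption: characters not in the charset (e.g. the space) become null_id.
    unpadded = [charset.get(ch, null_id) for ch in text]
    # Assumption: the padded label is filled with null_id up to `length`.
    padded = unpadded + [null_id] * (length - len(unpadded))
    return padded, unpadded


charset = {'a': 0, 'b': 1, 'c': 2}
for text in ('bc', 'a a'):
    padded, unpadded = encode_with_padding(text, 5, charset, 3)
    print(repr(text), 'padded:', padded, 'unpadded:', unpadded)
```

Under these assumptions the sketch reproduces the two results listed above, but that only shows the results are self-consistent, not that they match what the models expect.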
Roger
  • Do you want to convert your dataset to the FSNS dataset format? In this paper `null` is `133` and `space` is `0`. Check: https://github.com/tensorflow/models/blob/master/attention_ocr/python/datasets/testdata/fsns/charset_size%3D134.txt – Vijay Mariappan Jul 16 '17 at 17:20
  • Yes, I have images (each in folders by text/label) and a file with all the expected text from the images. I am looking to combine those exactly as the FSNS dataset was created. I see the space now (0), but I am not sure what null is used for, or how padding should be done. Why is there both a padded and an unpadded label (class)? – Roger Jul 16 '17 at 18:47
