Chars, padding and nulls for custom dataset

Asked Jul 16 '17 at 16:27

I am building a custom dataset to train and test the attention_ocr and street models, but I am unclear on what the encode_utf8_string function described here does. What is the purpose of the padding, and of the null character, in building the two character-id arrays (padded and unpadded)?

Given the following charset, a maximum length of 5, and a null character id of 3:

{'a': 0, 'b': 1, 'c': 2}

Are these the correct padded and unpadded results? (Note the space in 'a a'. It is not in the charset, so I mapped it to the null id.)

'bc': padded: [1,2,3,3,3], unpadded: [1,2]
'a a': padded: [0,3,0,3,3], unpadded: [0,3,0]
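The scheme described in the question can be sketched in Python as below. This is a hypothetical re-implementation written for illustration (`encode_label` and its `unknown_id` parameter are made up here), not the encode_utf8_string code the question refers to. It reproduces the two expected results above, and treats a character missing from the charset as an error unless a substitute id is passed explicitly.

```python
def encode_label(text, length, charset, null_char_id, unknown_id=None):
    """Hypothetical encoder mirroring the padded/unpadded scheme in the question.

    unpadded: one id per character of `text`.
    padded:   the same ids followed by `null_char_id` up to `length`.
    Characters missing from `charset` raise KeyError unless `unknown_id`
    is given (an assumption; the question maps ' ' to the null id).
    """
    if len(text) > length:
        raise ValueError(f"{text!r} is longer than the maximum length {length}")
    unpadded = []
    for ch in text:
        if ch in charset:
            unpadded.append(charset[ch])
        elif unknown_id is not None:
            unpadded.append(unknown_id)
        else:
            raise KeyError(f"character {ch!r} is not in the charset")
    # Build a new list so padded and unpadded do not share storage.
    padded = unpadded + [null_char_id] * (length - len(unpadded))
    return padded, unpadded


charset = {'a': 0, 'b': 1, 'c': 2}
print(encode_label('bc', 5, charset, null_char_id=3))
# ([1, 2, 3, 3, 3], [1, 2])
print(encode_label('a a', 5, charset, null_char_id=3, unknown_id=3))
# ([0, 3, 0, 3, 3], [0, 3, 0])
```

With `unknown_id=3`, the space inside 'a a' becomes indistinguishable from padding: a decoder that stops at the first null would read [0, 3, 0, 3, 3] back as 'a'. Giving the space its own charset id avoids that ambiguity.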

edited Jul 16 '17 at 16:59

Roger


Do you want to convert your dataset to the FSNS dataset format? In the FSNS charset, `null` is `133` and `space` is `0`. Check: https://github.com/tensorflow/models/blob/master/attention_ocr/python/datasets/testdata/fsns/charset_size%3D134.txt – Vijay Mariappan Jul 16 '17 at 17:20
Yes. I have images (grouped into folders by text/label) and a file with the expected text for every image. I want to combine them exactly the way the FSNS dataset was created. I see the space now (0), but I'm not sure what the null is used for or how the padding should be done. Why are there both a padded and an unpadded label (class)? – Roger Jul 16 '17 at 18:47
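The sketch below shows how the pieces discussed in these comments could fit together. It assumes a charset file with one `<id>\t<char>` pair per line and the null entry written as `<nul>`; that format, the toy sample, and all function names are assumptions for illustration, so check them against the actual charset file. One plausible reading of why both labels exist (also an assumption, not confirmed in this thread): the padded label has the same length for every example, which suits fixed-shape storage and batching, while the unpadded label keeps each text's real length. Here space has its own id and null is reserved for padding, so the first null can safely mark the end of the text.

```python
import re

# Assumed charset-file format (verify against your own file): one
# "<id>\t<char>" pair per line, with the null entry written as "<nul>".
LINE_RE = re.compile(r'(\d+)\t(.+)')


def read_charset_text(text, null_marker='<nul>'):
    """Parse charset lines into ({char: id}, null_id)."""
    char_to_id, null_id = {}, None
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip blank or malformed lines
        code, char = int(m.group(1)), m.group(2)
        if char == null_marker:
            null_id = code
        else:
            char_to_id[char] = code
    return char_to_id, null_id


def encode(text, max_len, char_to_id, null_id):
    """Return (padded, unpadded) id lists for `text`."""
    unpadded = [char_to_id[c] for c in text]  # KeyError if a char is missing
    if len(unpadded) > max_len:
        raise ValueError(f"{text!r} is longer than {max_len}")
    return unpadded + [null_id] * (max_len - len(unpadded)), unpadded


def decode_padded(padded, char_to_id, null_id):
    """Invert the padded form: the first null marks the end of the text."""
    id_to_char = {v: k for k, v in char_to_id.items()}
    out = []
    for i in padded:
        if i == null_id:
            break
        out.append(id_to_char[i])
    return ''.join(out)


# Made-up toy charset in the assumed format: space gets its own id (0),
# and the null id (4) is used only for padding.
sample = "0\t \n1\ta\n2\tb\n3\tc\n4\t<nul>\n"
char_to_id, null_id = read_charset_text(sample)

for text in ['bc', 'a a']:
    padded, unpadded = encode(text, 5, char_to_id, null_id)
    print(text, padded, unpadded, repr(decode_padded(padded, char_to_id, null_id)))
# bc [2, 3, 4, 4, 4] [2, 3] 'bc'
# a a [1, 0, 1, 4, 4] [1, 0, 1] 'a a'
```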
