
I'm working on this project based on TensorFlow.

I want to train an OCR model with attention_ocr on my own dataset, but I don't know how to store my images and ground truth in the same format as the FSNS dataset.

Is there anybody who also works on this project or knows how to solve this problem?

Jianbo Wang
  • Were you able to get this working? Can you share the script you used to prepare your own data? – Roger Jul 16 '17 at 16:47
  • Sorry, I have not managed to reproduce the process of preparing a dataset from my own images. I got stuck on this problem: https://stackoverflow.com/questions/45093932/invalidargumenterror-when-traing-attention-ocr-assign-requires-shapes-of-both – Jianbo Wang Jul 17 '17 at 07:40
  • Your project link is dead. – zabop Feb 22 '19 at 11:02
  • @JianboWang Can you please share the dummy FSNS dataset which you created? – Dexter Apr 19 '20 at 09:11
  • @JianboWang Do you have a tutorial to create the FSNS dataset? – isaac.af95 Aug 23 '21 at 23:01

2 Answers


The data format for storing training/test data is defined in the FSNS paper, https://arxiv.org/pdf/1702.03970.pdf (Table 4).

To store tfrecord files with tf.Example protos you can use tf.python_io.TFRecordWriter. There is a nice tutorial, an existing answer on Stack Overflow, and a short gist.
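For completeness, here is a minimal sketch of the writing loop, assuming a list examples of tf.train.Example protos built as shown further below (the file name is only an illustration; FSNS itself uses sharded files such as train-00000-of-00512):

import tensorflow as tf

with tf.python_io.TFRecordWriter('train-00000-of-00001') as writer:
    for example in examples:
        # Each record is one serialized tf.Example proto.
        writer.write(example.SerializeToString())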

Assume you have a NumPy ndarray img which has num_of_views images stored side-by-side (see Fig. 3 in the paper), and the corresponding text in a variable text. You will need to define some function to convert a unicode string into a list of character ids, padded to a fixed length and unpadded as well. For example:

char_ids_padded, char_ids_unpadded = encode_utf8_string(
   text='abc', 
   charset={'a':0, 'b':1, 'c':2},
   length=5,
   null_char_id=3)

The result should be:

char_ids_padded = [0,1,2,3,3]
char_ids_unpadded = [0,1,2]
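Note that encode_utf8_string is not a TensorFlow function; you need to define it yourself. A minimal sketch, assuming charset maps every character of text to an id (characters missing from charset would raise a KeyError):

def encode_utf8_string(text, charset, length, null_char_id):
    # Map each character to its id, then pad with the null id up to `length`.
    char_ids_unpadded = [charset[c] for c in text]
    char_ids_padded = char_ids_unpadded + [null_char_id] * (length - len(char_ids_unpadded))
    return char_ids_padded, char_ids_unpadded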

If you use the functions _int64_feature and _bytes_feature defined in the gist, you can create an FSNS-compatible tf.Example proto using the following snippet:

char_ids_padded, char_ids_unpadded = encode_utf8_string(
   text, charset, length, null_char_id)
example = tf.train.Example(features=tf.train.Features(
  feature={
    'image/format': _bytes_feature("PNG"),
    'image/encoded': _bytes_feature(img.tostring()),
    'image/class': _int64_feature(char_ids_padded),
    'image/unpadded_class': _int64_feature(char_ids_unpadded),
    'height': _int64_feature(img.shape[0]),
    'width': _int64_feature(img.shape[1]),
    # Use integer division: _int64_feature needs an int, and each of the
    # num_of_views views has the same original width.
    'orig_width': _int64_feature(img.shape[1] // num_of_views),
    'image/text': _bytes_feature(text)
  }
))
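To sanity-check a written record, you can parse it back (a sketch, assuming the records were written to train-00000-of-00001 as above):

# Read the first record back and inspect a field.
parsed = tf.train.Example.FromString(
    next(tf.python_io.tf_record_iterator('train-00000-of-00001')))
print(parsed.features.feature['image/text'].bytes_list.value)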
Alexander Gorban
  • Hi Gorban, thanks a lot for your response. Two more questions: 1. Does it work on Chinese datasets? 2. If one image contains several ground truths (multiple text segments), does it also work? – Jianbo Wang Jun 11 '17 at 08:16
  • 1. It works with any language. 2. It doesn't work for multiple unrelated text regions. – Alexander Gorban Jun 26 '17 at 21:46

You should not use the following line from the answer above directly:

'image/encoded': _bytes_feature(img.tostring()),

In my code, I wrote this instead:

import cv2

# Compress the image to actual JPEG bytes rather than storing raw pixel data.
_, jpegVector = cv2.imencode('.jpeg', img)
imgStr = jpegVector.tostring()

# ...and then, inside the feature dict:
'image/encoded': _bytes_feature(imgStr)
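A caveat that is my assumption rather than part of this answer: if you store JPEG bytes, it seems safer to also keep the declared format consistent with the actual encoding, i.e. change the corresponding entry in the snippet above:

'image/format': _bytes_feature('JPEG')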
shouhuxianjian