There are pre-made tools for that; look in the TensorFlow models repository.
Their approach in essence is:
- Parse the xml annotation files and flatten the data structure within them.
- Produce a tfrecord that combines the annotations and images; this is arguably the best way.
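
The flattening step can be sketched with the standard library alone. This is only an illustration: the XML layout below (Pascal VOC-like tags such as `object` and `bndbox`) and the helper name `flatten_annotation` are assumptions, so adjust the tag names to whatever your annotation files actually contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in a Pascal VOC-like layout (an assumption;
# the tag names in your own files may differ).
SAMPLE_XML = """
<annotation>
  <filename>img_001.jpg</filename>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object>
    <name>dog</name>
    <bndbox><xmin>300</xmin><ymin>50</ymin><xmax>400</xmax><ymax>250</ymax></bndbox>
  </object>
</annotation>
"""


def flatten_annotation(xml_text):
    """Turn the nested XML tree into a flat dict of lists (one list per key)."""
    root = ET.fromstring(xml_text)
    flat = {
        "filename": [root.findtext("filename")],
        "width": [int(root.findtext("size/width"))],
        "height": [int(root.findtext("size/height"))],
        "label": [],
        "xmin": [], "ymin": [], "xmax": [], "ymax": [],
    }
    # Each <object> contributes one entry to every per-box list,
    # so index i across the lists describes the i-th bounding box.
    for obj in root.iter("object"):
        flat["label"].append(obj.findtext("name"))
        for corner in ("xmin", "ymin", "xmax", "ymax"):
            flat[corner].append(int(obj.findtext("bndbox/" + corner)))
    return flat


flat = flatten_annotation(SAMPLE_XML)
print(flat["label"], flat["xmin"])
```

Keeping one list per key (rather than a list of box objects) matches the shape a tfrecord example expects: string keys mapping to arrays of primitive values.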
For the sake of training, you can implement your own converter that takes a pair (xml, image) and saves it into a tfrecord example.
Tfrecord is TensorFlow's format for storing data. Every tfrecord file is basically a list of examples, and every example is an object that holds data in key : value pairs, where the key is a string and the value is an array of primitive types (int, string, float).
So, first you flatten your xml annotation to match the constraints of the tfrecord file, then you use TensorFlow's TFRecordWriter to save the data into a file.
Check the TensorFlow API docs - it will pay off.
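
As a rough sketch of the write step, assuming TensorFlow 2.x with eager execution: the `flat` dict, the placeholder image bytes, and the helper names `to_feature` and `make_example` below are made up for illustration and are not taken from the models repository. Strings are stored as bytes, since a feature holds a list of bytes, int64, or float values.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical flattened annotation (assumed values, standing in for
# the output of your own XML parser).
flat = {
    "filename": ["img_001.jpg"],
    "label": ["cat", "dog"],
    "xmin": [10, 300],
    "xmax": [110, 400],
}
# Placeholder bytes; in practice read the real file with open(path, "rb").read().
image_bytes = b"raw image bytes go here"


def to_feature(values):
    """Wrap a non-empty list of str/bytes, int, or float in a tf.train.Feature."""
    first = values[0]
    if isinstance(first, (str, bytes)):
        encoded = [v.encode("utf-8") if isinstance(v, str) else v for v in values]
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=encoded))
    if isinstance(first, int):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=values))
    return tf.train.Feature(float_list=tf.train.FloatList(value=values))


def make_example(flat, image_bytes):
    """Combine the flat annotation and the image into one tf.train.Example."""
    features = {key: to_feature(vals) for key, vals in flat.items()}
    features["image"] = to_feature([image_bytes])
    return tf.train.Example(features=tf.train.Features(feature=features))


example = make_example(flat, image_bytes)

# Write a single serialized example to a temporary tfrecord file.
path = os.path.join(tempfile.mkdtemp(), "data.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())

# Read the file back to confirm the round trip.
for raw in tf.data.TFRecordDataset(path):
    parsed = tf.train.Example.FromString(raw.numpy())
    print(parsed.features.feature["label"].bytes_list.value)
```

For a real dataset you would call `writer.write` once per (xml, image) pair inside the same `with` block, so that one file holds many examples.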