I have a tfrecord file that is about 8 GB. I want to split it into 4 files of about 2 GB each. How can I do this directly? Can I do this in TensorFlow? Is there any application for splitting tfrecord data?

  • This looks like a potential duplicate of [Split .tfrecords file into many .tfrecords files](https://stackoverflow.com/questions/54519309/split-tfrecords-file-into-many-tfrecords-files) – xdhmoore Jan 22 '21 at 02:07

1 Answer

I don't know of a way to specify the resulting size of a tfrecord file directly. However, you can limit the number of examples written to each tfrecord file. This is not exactly what you're asking for, but it gets the job done in a similar way.

Here's example code showing how I dealt with this situation in the past (see full code here):

(fragment_size is the number of examples, here one per video, written to one tfrecord file)

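import os
import tensorflow as tf

# data, num_videos, num_images, fragment_size, destination_path, name, and the
# _bytes_feature / _int64_feature helpers are defined in the full code.
writer = None

for video_count in range(num_videos):

    # Start a new tfrecord file every fragment_size videos.
    if video_count % fragment_size == 0:
        if writer is not None:
            writer.close()
        # current_batch_number must differ for each file, otherwise an
        # earlier file with the same name is overwritten.
        filename = os.path.join(destination_path, name + str(
            current_batch_number) + '_of_' + str(
            total_batch_number) + '.tfrecords')
        print('Writing', filename)
        # This is tf.python_io.TFRecordWriter in TensorFlow 1.x.
        writer = tf.io.TFRecordWriter(filename)

    # Build one Example per video that holds all of its frames.
    feature = {}
    for image_count in range(num_images):
        path = 'blob' + '/' + str(image_count)
        image = data[video_count, image_count, :, :, :]
        image = image.astype(color_depth)
        image_raw = image.tobytes()  # tostring() is deprecated in NumPy

        feature[path] = _bytes_feature(image_raw)
        feature['height'] = _int64_feature(height)
        feature['width'] = _int64_feature(width)
        feature['depth'] = _int64_feature(num_channels)

    example = tf.train.Example(features=tf.train.Features(feature=feature))
    writer.write(example.SerializeToString())

# Close the last file.
if writer is not None:
    writer.close()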
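
If the 8 GB file already exists and the original data is not at hand, the same "start a new file every so often" idea can be applied to the file itself. The following is only a sketch under stated assumptions, not something from the code above: it assumes the file is an uncompressed tfrecord file, where each record is stored as an 8-byte little-endian length, a 4-byte CRC of the length, the data, and a 4-byte CRC of the data. Records are copied byte-for-byte and the CRCs are not verified. The function name `split_tfrecord`, its parameters, and the output file naming are hypothetical.

```python
import os
import struct

# Framing of one record in an uncompressed tfrecord file:
#   uint64 length | uint32 CRC of length | data | uint32 CRC of data
HEADER_BYTES = 12  # 8-byte length + 4-byte length CRC
FOOTER_BYTES = 4   # 4-byte data CRC


def split_tfrecord(src_path, dst_dir, max_bytes, prefix="part"):
    """Copy whole records from src_path into files of at most max_bytes each.

    Hypothetical helper. A single record larger than max_bytes still gets a
    file of its own. Returns the list of output paths in order.
    """
    os.makedirs(dst_dir, exist_ok=True)
    out_paths = []
    out = None
    written = 0
    try:
        with open(src_path, "rb") as src:
            while True:
                header = src.read(HEADER_BYTES)
                if not header:
                    break  # clean end of file
                if len(header) < HEADER_BYTES:
                    raise ValueError("truncated record header")
                (length,) = struct.unpack("<Q", header[:8])
                body = src.read(length + FOOTER_BYTES)
                if len(body) < length + FOOTER_BYTES:
                    raise ValueError("truncated record body")
                record_size = len(header) + len(body)
                # Open a new output file when this record would exceed the limit.
                if out is None or (written > 0 and written + record_size > max_bytes):
                    if out is not None:
                        out.close()
                    path = os.path.join(
                        dst_dir, f"{prefix}-{len(out_paths):05d}.tfrecords")
                    out_paths.append(path)
                    out = open(path, "wb")
                    written = 0
                out.write(header)
                out.write(body)
                written += record_size
    finally:
        if out is not None:
            out.close()
    return out_paths
```

For the case in the question, a call might look like `split_tfrecord('data.tfrecords', 'out', max_bytes=2 * 1024**3)`, where the file and directory names are placeholders. If the file is compressed, or if you prefer to stay within TensorFlow, a similar loop could read records with `tf.data.TFRecordDataset` and write them with `tf.io.TFRecordWriter` instead.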
whiletrue