Get CSV from Tensorflow summaries

Question

I have some very large TensorFlow summaries. If I plot them in TensorBoard, I can download CSV files from there.

However, plotting them in TensorBoard takes a very long time. I found in the docs that there is a way to read a summary directly in Python: summary_iterator, which can be used as follows:

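import tensorflow as tf

path_to_events_file = "events.out.tfevents.example"  # hypothetical path, replace with yours
# tf.train.summary_iterator in TF 1.x; tf.compat.v1.train.summary_iterator also works in TF 2.x
for e in tf.compat.v1.train.summary_iterator(path_to_events_file):
    print(e)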

Can I use this method to create CSV files directly? If so, how can I do this? This would save a lot of time.

score 2 · Answer 1 · edited Jun 27 '19 at 20:11

One possible way of doing it would be like this:

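from tensorboard.backend.event_processing import event_accumulator
import pandas as pd
import sys

def create_csv(inpath, outpath):
    # size_guidance: how many events of each type to keep in memory.
    # 0 means "keep all", so every scalar is loaded; the other types keep only 1.
    sg = {event_accumulator.COMPRESSED_HISTOGRAMS: 1,
          event_accumulator.IMAGES: 1,
          event_accumulator.AUDIO: 1,
          event_accumulator.SCALARS: 0,
          event_accumulator.HISTOGRAMS: 1}
    ea = event_accumulator.EventAccumulator(inpath, size_guidance=sg)
    ea.Reload()  # load the events from disk
    scalar_tags = ea.Tags()['scalars']
    columns = {}
    for tag in scalar_tags:
        events = ea.Scalars(tag)  # list of ScalarEvent(wall_time, step, value)
        # Use a list comprehension: map() returns a lazy iterator in Python 3,
        # and np.array(map(...)) would not produce an array of the values.
        # A Series lets tags with different numbers of events be NaN-padded.
        columns[tag] = pd.Series([e.value for e in events])
    df = pd.DataFrame(columns)
    df.to_csv(outpath)

if __name__ == '__main__':
    # Usage: python script.py <events file or log directory> <output.csv>
    args = sys.argv
    inpath = args[1]
    outpath = args[2]
    create_csv(inpath, outpath)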

Please note that this code loads the entire event file into memory, so it is best run on a machine with plenty of RAM, such as a cluster node. For information about the sg (size_guidance) argument of the EventAccumulator, see this SO question.
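If memory is the bottleneck, a streaming variant built on the question's summary_iterator is one option. The following is a minimal sketch, not the answer's method, under stated assumptions: the scalars were logged TF 1.x style, so each one sits in the simple_value field of a Summary.Value (TF 2.x tf.summary.scalar stores scalars as tensors instead, and this sketch skips those), and the helper names scalar_rows and write_csv are hypothetical. It writes a long-format CSV with one step,tag,value row per scalar, so only one event is held in memory at a time.

```python
import csv
import sys
from typing import Iterable, Iterator, TextIO, Tuple


def scalar_rows(events: Iterable) -> Iterator[Tuple[int, str, float]]:
    """Yield (step, tag, value) for every TF1-style scalar in a stream of Event protos."""
    for event in events:
        # Events without a summary (e.g. the file_version event) have an empty value list.
        for value in event.summary.value:
            # Assumption: scalars live in the `simple_value` member of the `value` oneof.
            if value.WhichOneof("value") == "simple_value":
                yield event.step, value.tag, value.simple_value


def write_csv(rows: Iterable[Tuple[int, str, float]], out: TextIO) -> int:
    """Write rows in long format (one line per step/tag pair); return the row count."""
    writer = csv.writer(out)
    writer.writerow(["step", "tag", "value"])
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Imported here so the helpers above can be used without TensorFlow installed.
    import tensorflow as tf

    with open(sys.argv[2], "w", newline="") as f:
        events = tf.compat.v1.train.summary_iterator(sys.argv[1])
        write_csv(scalar_rows(events), f)
```

Long format avoids aligning tags with different step counts while streaming; it can be pivoted to one column per tag afterwards if needed.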

An additional improvement might be to store not only the value of each scalar but also its step.
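One way to include the step is a wide CSV with one row per step and one column per tag. Below is a minimal plain-Python sketch, assuming the scalars have already been collected as (step, value) pairs per tag, for example from the EventAccumulator's ScalarEvent objects; write_wide_csv is a hypothetical helper name.

```python
import csv
from typing import Dict, List, TextIO, Tuple


def write_wide_csv(series: Dict[str, List[Tuple[int, float]]], out: TextIO) -> None:
    """Write one row per step and one column per tag; missing values are left empty."""
    tags = sorted(series)
    # Map each tag to {step: value}; a later pair for the same step overwrites an earlier one.
    by_tag = {tag: dict(pairs) for tag, pairs in series.items()}
    # Union of all steps seen in any tag, in ascending order.
    steps = sorted({step for values in by_tag.values() for step in values})
    writer = csv.writer(out)
    writer.writerow(["step"] + tags)
    for step in steps:
        writer.writerow([step] + [by_tag[tag].get(step, "") for tag in tags])
```

With the answer's ea, the input could be built as {tag: [(e.step, e.value) for e in ea.Scalars(tag)] for tag in ea.Tags()['scalars']}.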

Note: the code snippet was updated for recent versions of TF. For TF < 1.1, use the following import instead:

from tensorflow.tensorboard.backend.event_processing import event_accumulator

great idea! btw as of TF 1.1 the import is `from tensorboard.backend.event_processing import event_accumulator`. I took the liberty to update your code accordingly — WestCoastProjects, Jun 27 '19 at 20:11
