My goal is to lift a few values from a text file and generate a plot using matplotlib...
I have several large (~100MB) text log files generated by a Python script that calls TensorFlow. I save the terminal output from running the script like this:
python my_script.py 2>&1 | tee mylog.txt
Here's a snippet from the text file that I'm trying to parse and turn into a dictionary:
Epoch 00001: saving model to /root/data-cache/data/tmp/models/ota-cfo-full_20200626-173916_01_0.05056382_0.99.h5
5938/5938 [==============================] - 4312s 726ms/step - loss: 0.1190 - accuracy: 0.9583 - val_loss: 0.0506 - val_accuracy: 0.9854
I'm specifically trying to extract the epoch number (0001), the time in seconds (4312), loss (0.1190), accuracy (0.9583), val_loss (0.0506), and val_accuracy (0.9854) for all 100 epochs so I can make a plot with matplotlib.
The log file is full of other junk that I don't want, like:
Epoch 1/100
1/5938 [..............................] - ETA: 0s - loss: 1.7893 - accuracy: 0.31252020-06-26 17:39:45.253972: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1479] CUPTI activity buffer flushed
2020-06-26 17:39:45.255588: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:216] GpuTracer has collected 179 callback api events and 179 activity events.
2020-06-26 17:39:45.276306: I tensorflow/core/profiler/rpc/client/save_profile.cc:168] Creating directory: /root/data-cache/data/tmp/20200626-173933/train/plugins/profile/2020_06_26_17_39_45
2020-06-26 17:39:45.284235: I tensorflow/core/profiler/rpc/client/save_profile.cc:174] Dumped gzipped tool data for trace.json.gz to /root/data-cache/data/tmp/20200626-173933/train/plugins/profile/2020_06_26_17_39_45/ddfc870f32d1.trace.json.gz
2020-06-26 17:39:45.286639: I tensorflow/core/profiler/utils/event_span.cc:288] Generation of step-events took 0.049 ms
2020-06-26 17:39:45.288257: I tensorflow/python/profiler/internal/profiler_wrapper.cc:87] Creating directory: /root/data-cache/data/tmp/20200626-173933/train/plugins/profile/2020_06_26_17_39_45Dumped tool data for overview_page.pb to /root/data-cache/data/tmp/20200626-173933/train/plugins/profile/2020_06_26_17_39_45/ddfc870f32d1.overview_page.pb
Dumped tool data for input_pipeline.pb to /root/data-cache/data/tmp/20200626-173933/train/plugins/profile/2020_06_26_17_39_45/ddfc870f32d1.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to /root/data-cache/data/tmp/20200626-173933/train/plugins/profile/2020_06_26_17_39_45/ddfc870f32d1.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to /root/data-cache/data/tmp/20200626-173933/train/plugins/profile/2020_06_26_17_39_45/ddfc870f32d1.kernel_stats.pb
2/5938 [..............................] - ETA: 6:24 - loss: 1.7824 - accuracy: 0.2656
3/5938 [..............................] - ETA: 17:03 - loss: 1.7562 - accuracy: 0.2396
4/5938 [..............................] - ETA: 22:27 - loss: 1.7368 - accuracy: 0.2344
5/5938 [..............................] - ETA: 22:55 - loss: 1.7387 - accuracy: 0.2375
6/5938 [..............................] - ETA: 24:16 - loss: 1.7175 - accuracy: 0.2656
7/5938 [..............................] - ETA: 24:34 - loss: 1.6885 - accuracy: 0.2812
Epoch 54/100
1500/1500 [==============================] - ETA: 0s - loss: 0.0088 - accuracy: 0.9984
Epoch 00054: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k-clean_20200701-205945_54_0.0054215402_1.00.h5
1500/1500 [==============================] - 942s 628ms/step - loss: 0.0088 - accuracy: 0.9984 - val_loss: 0.0054 - val_accuracy: 0.9993
2020-07-02 14:42:29.102025: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 674 of 1000
2020-07-02 14:42:33.511163: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:233] Shuffle buffer filled.
Epoch 55/100
1500/1500 [==============================] - ETA: 0s - loss: 0.0136 - accuracy: 0.9978
Epoch 00055: saving model to /root/data-cache/data/tmp/models/ota-cfo-10k-clean_20200701-205945_55_0.0036424326_1.00.h5
1500/1500 [==============================] - 948s 632ms/step - loss: 0.0136 - accuracy: 0.9978 - val_loss: 0.0036 - val_accuracy: 0.9990
2020-07-02 14:58:32.042963: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:184] Filling up shuffle buffer (this may take a while): 690 of 1000
2020-07-02 14:58:36.302518: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:233] Shuffle buffer filled.
This almost works, but it doesn't capture the epoch number:
import re

# Matches the four metrics on the end-of-epoch summary line, but misses the
# epoch number, which lives on the preceding "Epoch NNNNN: saving model" line.
regular_exp = re.compile(
    r'(?P<loss>\d+\.\d+)\s+-\s+accuracy:\s*(?P<accuracy>\d+\.\d+)'
    r'\s+-\s+val_loss:\s*(?P<val_loss>\d+\.\d+)'
    r'\s*-\s*val_accuracy:\s*(?P<val_accuracy>\d+\.\d+)', re.M)

with open(log_file, 'r') as file:
    results = [match.groupdict() for match in regular_exp.finditer(file.read())]
I've also tried just reading the file in directly, but it's littered with these weird \x08 characters:
from pprint import pprint as pp

log_file = 'mylog.txt'
with open(log_file, 'r') as text_file:
    lines = text_file.readlines()
pp(lines)
' 291/1500 [====>.........................] - ETA: 12:19 - loss: 0.7179 - '
'accuracy: '
'0.7163\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\n',
' 292/1500 [====>.........................] - ETA: 12:18 - loss: 0.7164 - '
'accuracy: '
'0.7168\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\n',
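Those \x08 bytes are ASCII backspace characters, which (as far as I can tell) Keras emits to redraw its progress bar in place. Since I only care about the final text, I've been thinking a small helper like this (name is mine, untested against the full 100MB file) could delete them before any regex runs:

```python
def strip_backspaces(text):
    # \x08 is ASCII backspace; Keras writes runs of them to rewind the
    # progress bar. Deleting them, and normalizing carriage returns to
    # newlines, leaves plain text a line-oriented regex can handle.
    return text.replace('\x08', '').replace('\r', '\n')
```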
Can someone help me construct a regular expression in Python that will let me build a list of dictionaries of these values?
My goal is something like this:
[{'iteration': '00', 'seconds': '1802', 'loss': '0.3430', 'accuracy': '0.8753', 'val_loss': '0.1110', 'val_accuracy': '0.9670', 'epoch_num': '00002', 'epoch_file': '/root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_02_0.069291627_0.98.h5'},
 {'iteration': '1500/1500', 'seconds': '1679', 'loss': '0.0849', 'accuracy': '0.9739', 'val_loss': '0.0693', 'val_accuracy': '0.9807', 'epoch_num': '00003', 'epoch_file': '/root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_03_0.055876694_0.98.h5'},
 {'iteration': '1500/1500', 'seconds': '1674', 'loss': '0.0742', 'accuracy': '0.9791', 'val_loss': '0.0559', 'val_accuracy': '0.9845', 'epoch_num': '00004', 'epoch_file': '/root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_04_0.053867317_0.99.h5'},
 {'iteration': '1500/1500', 'seconds': '1671', 'loss': '0.0565', 'accuracy': '0.9841', 'val_loss': '0.0539', 'val_accuracy': '0.9850', 'epoch_num': '00005', 'epoch_file': '/root/data-cache/data/tmp/models/ota-cfo-10k_20200527-001913_05_0.053266536_0.99.h5'}]
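For what it's worth, the direction I've been sketching is to pair each "Epoch NNNNN: saving model to ..." line with the summary line that follows it in a single two-line pattern. The helper names here are mine, and I've only tried it on a toy snippet, not the full log:

```python
import re

# Two-line pattern: the "saving model" line gives epoch_num and epoch_file,
# and the summary line that follows gives iteration, seconds, and metrics.
EPOCH_RE = re.compile(
    r'Epoch (?P<epoch_num>\d+): saving model to (?P<epoch_file>\S+)\s+'
    r'(?P<iteration>\d+/\d+) \[=+\] - (?P<seconds>\d+)s'
    r'.*?loss: (?P<loss>\d+\.\d+) - accuracy: (?P<accuracy>\d+\.\d+)'
    r' - val_loss: (?P<val_loss>\d+\.\d+) - val_accuracy: (?P<val_accuracy>\d+\.\d+)')

def parse_log(text):
    # Delete the backspace characters Keras uses to redraw its progress
    # bar, then collect one dict per completed epoch.
    text = text.replace('\x08', '')
    return [m.groupdict() for m in EPOCH_RE.finditer(text)]
```

On a two-line snippet like the ones above, this seems to produce dicts in the shape I'm after, but I'm not sure how robust it is against the interleaved TensorFlow logging junk.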