
I'm currently running an Ubuntu 18.04.5 LTS instance on Google Cloud.

There's some code I'm trying to run, and for some reason TensorFlow won't use all of the instance's eight CPUs.

The purpose of the code is to train an autoencoder, which benchmarks at around 10 minutes per epoch on a host system.

On the cloud instance, training runs very slowly and sometimes crashes the instance altogether.

I tried changing environment variables and reconfiguring the session (see the code below). I also tried other values (1, 2, 8) for the threading-related settings, but none of that helped.

I was hoping that adjusting some of these settings would help. Are there any suggestions, or perhaps posts that I missed in my initial search?

import os
import tensorflow as tf
# Threading-related environment variables for the BLAS / OpenMP backends
os.environ['THEANO_FLAGS'] = 'optimizer=None'
os.environ['MKL_NUM_THREADS'] = '16'
os.environ['GOTO_NUM_THREADS'] = '16'
os.environ['OMP_NUM_THREADS'] = '16'
os.environ['openmp'] = 'True'
from pathlib import Path
import glob
import sys

import theano
print(theano.config.blas.ldflags)  # show which BLAS flags Theano picked up

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

from sklearn.metrics import mean_squared_error
from math import sqrt
from PIL import Image

import matplotlib.pyplot as plt
from skimage.io import imread, imshow, imsave
from skimage.filters import threshold_mean, threshold_minimum, threshold_otsu, threshold_local
from keras.preprocessing.image import load_img, array_to_img, img_to_array
from keras.models import Sequential, Model
from keras.layers import Dense, Conv2D, MaxPooling2D, UpSampling2D, Flatten, Input
from keras.optimizers import SGD, Adam, Adadelta, Adagrad
from keras import backend as K
from sklearn.model_selection import train_test_split
np.random.seed(111)

from keras import models

# Configure TensorFlow's thread pools for the session
config = tf.ConfigProto()
config.inter_op_parallelism_threads = 16
config.intra_op_parallelism_threads = 16
sess = tf.Session(config=config)  # created here, but never passed to Keras
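
One thing I'm unsure about: the configured session above is created but never handed to Keras. Below is a minimal sketch of what I think the missing step might look like, assuming TF 1.x with standalone Keras (the thread counts are placeholders matching my eight CPUs; I haven't confirmed this is the right approach):

import tensorflow as tf
from keras import backend as K

# Build the threading config and register the session with the Keras backend,
# so Keras does not silently create its own default session.
config = tf.ConfigProto(
    inter_op_parallelism_threads=8,  # placeholder: parallelism across independent ops
    intra_op_parallelism_threads=8,  # placeholder: threads within a single op
)
K.set_session(tf.Session(config=config))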
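
In case my TensorFlow version turns out to be 2.x rather than 1.x, the equivalent threading knobs appear to live under tf.config.threading. This is only a sketch with placeholder values, and as far as I understand these calls have to happen before any TensorFlow ops are created:

import tensorflow as tf

# Set the thread pool sizes before any TensorFlow ops are created
tf.config.threading.set_inter_op_parallelism_threads(8)  # placeholder value
tf.config.threading.set_intra_op_parallelism_threads(8)  # placeholder value
print(tf.config.threading.get_inter_op_parallelism_threads())
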
  • Are you sure there is no GPU that `tensorflow` binds to? Which TensorFlow version do you have? How do you define the model and run it? Do you mean 8 CPUs or 8 cores? Have you seen this possibly related question: https://stackoverflow.com/questions/45985641/can-tensorflow-run-with-multiple-cpus-no-gpus ? Please provide more information about the model architecture and training procedure and update your question. – Proko Jan 19 '21 at 21:18
  • How exactly are you concluding that it is not running on multiple cores/CPUs? – Dr. Snoopy Jan 19 '21 at 22:45

0 Answers