I'm still new to Keras and Python, and I'm getting an error I can't understand. The error is:

Traceback (most recent call last):
  File "/Users/N/PycharmProjects/hw2/hw2_1.py", line 35, in <module>
    model.fit(trainingInp, trainingOut, epochs=10, batch_size=1)
  File "/Library/Python/2.7/site-packages/keras/models.py", line 893, in fit
    initial_epoch=initial_epoch)
  File "/Library/Python/2.7/site-packages/keras/engine/training.py", line 1555, in fit
    batch_size=batch_size)
  File "/Library/Python/2.7/site-packages/keras/engine/training.py", line 1409, in _standardize_user_data
    exception_prefix='input')
  File "/Library/Python/2.7/site-packages/keras/engine/training.py", line 126, in _standardize_input_data
    array = arrays[i]
UnboundLocalError: local variable 'arrays' referenced before assignment

It happens in model.fit(). My model is like so:

model = Sequential()
model.add(Dense(3, activation='sigmoid', input_dim=8))
model.add(Dropout(0.5))
model.add(Dense(10, activation='sigmoid'))

model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['accuracy'])
print trainingInp
print trainingOut
model.fit(trainingInp, trainingOut, epochs=10, batch_size=1)

I print my data to make sure I'm not passing empty data in, and it prints correctly just before going into model.fit().

I'm not quite sure how to fix it as I don't really know what the problem is. It seems like the problem is batch_size, but I thought a batch size of 1 is allowed.

Here is how I get my data. I am guaranteed that the data doesn't have any empty values.

#read and categorize data
data = pandas.read_csv('cars.data.txt', delim_whitespace=True, header=None)

#turn class into an integer
enc = pandas.factorize(data['class'])
data["enc"] = enc[0]


#split the data set and make class into a matrix of outputs
trainingSet, testingSet = train_test_split(data, test_size=0.3)

trainingInp = trainingSet.iloc[:,1:9]
trainingOut = keras.utils.to_categorical(trainingSet['enc'], num_classes=10)

testingInp = testingSet.iloc[:,1:9]
testingOut = keras.utils.to_categorical(testingSet['enc'], num_classes=10)
NatBat
  • Show the code from where you declare your trainingInp and trainingOut variables. Show us the data also if possible. – pissall Oct 29 '17 at 11:21
  • Okay, I've added my variable declarations. Thank you for your help. – NatBat Oct 29 '17 at 11:27
  • Just a hunch, can you try increasing the batch size to the power of 2 or something? Or a batch size of 32? – pissall Oct 29 '17 at 11:31
  • Looking at my variable declarations was actually the answer. I was under the impression that train_test_split returned arrays, but it still returns a pandas DataFrame. When I passed trainingInp.values instead, the issue went away. Thank you for your help. I guess I need to pay closer attention to my declarations. – NatBat Oct 29 '17 at 11:35
  • Should've noticed. – pissall Oct 29 '17 at 11:42
  • Possible duplicate of [UnboundLocalError: local variable … referenced before assignment](https://stackoverflow.com/questions/17097273/unboundlocalerror-local-variable-referenced-before-assignment) – ivan_pozdeev Oct 29 '17 at 12:12
  • @NatBat could you include some sample input in the question (something that could be used as values of `trainingInp` and `trainingOut`)? I'm having a hard time in the pull request -- whatever input I try for a test case, it fails dimension checks. – ivan_pozdeev Nov 07 '17 at 17:43

2 Answers

Looks like a bug in Keras.

In engine/training.py,

elif data.__class__.__name__ == 'DataFrame':
    # test if data is a DataFrame, without pandas installed
    data = data.values

should be

elif data.__class__.__name__ == 'DataFrame':
    # test if data is a DataFrame, without pandas installed
    arrays = data.values

Created a pull request.


Here's how I got it:

UnboundLocalError means that a local variable was read before anything was assigned to it, which is virtually always a programming error. The block that contains the faulty line does not check any conditions before using the variable, so the code assumes that it is always defined by that point.

Searching for "arrays" upwards from the faulty line shows that it is defined in the branches of a large if block, so every branch is expected to assign it. And indeed they all do, except this one. Execution having taken this branch is therefore the only way the variable could have ended up undefined.

Now, all that is left is to find out what the intended code in that branch should be. Seeing that

  • all other branches end with arrays = <something> by itself, and this line looks just like it, and
  • reassigning data is a pointless operation here:
    • data has different types in different branches (e.g. in one branch arrays = data and in another arrays = [data])
    • data is not reassigned in any of the other branches, so the if block is not meant to convert it to some common representation, and it is most probably not used further on

the code's author most likely made a typo, and arrays = data.values is what they must have intended the line to be. Looking up pandas.DataFrame.values confirms that it is a NumPy array holding the frame's rows (an array of arrays), so assigning it directly to something called "arrays" looks legitimate.
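
To make the reasoning above concrete, here is a minimal sketch that reproduces the same failure mode without Keras or pandas installed. The `DataFrame` class is a hypothetical stand-in that only carries a `.values` attribute, and `standardize_buggy` / `standardize_fixed` are simplified illustrations of the branch structure, not the actual Keras source.

```python
class DataFrame(object):
    """Hypothetical stand-in for pandas.DataFrame; only .values matters here."""

    def __init__(self, rows):
        self.values = rows


def standardize_buggy(data):
    # Every branch is expected to assign 'arrays'.
    if isinstance(data, list):
        arrays = data
    elif data.__class__.__name__ == 'DataFrame':
        data = data.values  # typo: 'arrays' is never assigned in this branch
    else:
        arrays = [data]
    return arrays  # raises UnboundLocalError when the DataFrame branch ran


def standardize_fixed(data):
    if isinstance(data, list):
        arrays = data
    elif data.__class__.__name__ == 'DataFrame':
        arrays = data.values  # the assignment the other branches suggest
    else:
        arrays = [data]
    return arrays


frame = DataFrame([[1, 2], [3, 4]])

try:
    standardize_buggy(frame)
    buggy_error = None
except UnboundLocalError as exc:
    buggy_error = exc

fixed_result = standardize_fixed(frame)
```

Only the DataFrame branch fails in the buggy version; lists and other inputs go through the branches that do assign `arrays`, which is why the error shows up only for DataFrame input.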

ivan_pozdeev

This error can appear when you pass a pandas Series or DataFrame to a Keras estimator. Convert the data to plain arrays with .values first:

df_train_x = df_train_x.values
df_test_x = df_test_x.values

Then

estimator.fit(df_train_x , df_train_y)
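
If the same training code has to accept both pandas objects and plain arrays, the conversion can be wrapped in a small helper. The sketch below is only an illustration: `unwrap_values` and `FakeFrame` are hypothetical names, and the helper assumes that any object exposing a non-callable `.values` attribute (as pandas Series and DataFrame do) should be replaced by that attribute.

```python
def unwrap_values(data):
    """Return data.values for pandas-like objects, otherwise data unchanged."""
    values = getattr(data, "values", None)
    # dict.values is a method, not an array, so callables are left alone.
    if values is None or callable(values):
        return data
    return values


class FakeFrame(object):
    """Hypothetical stand-in for a DataFrame so the sketch runs without pandas."""

    def __init__(self, rows):
        self.values = rows


train_x = unwrap_values(FakeFrame([[0.1, 0.2], [0.3, 0.4]]))
train_y = unwrap_values([1, 0])  # already a plain list, returned as-is
```

With real data, the unwrapped `train_x` and `train_y` would then be passed to `estimator.fit` in place of the pandas objects.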
ivarni
Jai Janyani