TensorFlow 2.1 - make_csv_dataset - ValueError: Received a feature column from TensorFlow v1, but this is a TensorFlow v2 Estimator

Question

I'm having a hard time figuring out what's going on here (as many people seem to be with TF 2.1). Below are my problem and the fixes I've already tried, with code.

I'm trying to use AdaNet to start a TensorFlow Estimator training session, by creating a tf.data.Dataset from an imported .csv file. I'm running:

Python 3.6
Windows 10
tensorflow==2.1.0
pandas==0.25.1
numpy==1.16.5

This error...

ValueError: Received a feature column from TensorFlow v1, but this is a TensorFlow v2 Estimator. Please either use v2 feature columns (accessible via tf.feature_column.* in TF 2.x) with this Estimator, or switch to a v1 Estimator for use with v1 feature columns (accessible via tf.compat.v1.estimator.* and tf.compat.v1.feature_column.*, respectively.

...is produced by the code below. I'm posting all of it, with comments, because I don't know which part raises the error. (Yes, rebuilding the list of column names at each step is annoying, but that's how I'm keeping it for now.)

import numpy as np
import pandas as pd
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
import warnings
warnings.filterwarnings("once")
import adanet
import tensorflow as tf
from tensorflow.estimator import BinaryClassHead, MultiClassHead


# This will be a binary classification problem
head = BinaryClassHead()


# Import the dataset we're going to train with, just to get a list of the column names
# we want our estimator to reference
df = pd.read_csv('./datasets/call_restored_df_' + str(4) + '_' + 'SPY' + '.csv')
df = df.set_index(['Date'])
df['class'] = df['class'].astype('int32')

# Create a list of all the column names
feature_columns = list(df.columns)

# Remove the columns we aren't going to use during training
feature_columns.remove('Ticker')
feature_columns.remove('DailyChange')
feature_columns.remove('DailyHighChange')
feature_columns.remove('DailyLowChange')


# Adanet estimator
# Learn to ensemble linear and DNN models.
estimator = adanet.AutoEnsembleEstimator(
    head=head,
    candidate_pool=lambda config: {
        "linear":
            tf.estimator.LinearEstimator(
                head=head,
                feature_columns=feature_columns,
                config=config,
                optimizer='Adagrad'),
        "dnn":
            tf.estimator.DNNEstimator(
                head=head,
                feature_columns=feature_columns,
                config=config,
                optimizer='Adagrad',
                hidden_units=[1000, 500, 100])},
    max_iteration_steps=50)


# Input builders
# Define our train function called by the estimator during training to return
# a tf.data.Dataset (x, y) tuple
def input_fn_train():
    # Do the same thing to collect a list of usable column names
    df = pd.read_csv('./datasets/call_restored_df_' + str(4) + '_' + 'SPY' + '.csv')
    df = df.set_index(['Date'])
    df['class'] = df['class'].astype('int32')
    feature_columns_list = list(df.columns)
    feature_columns_list.remove('Ticker')
    feature_columns_list.remove('DailyChange')
    feature_columns_list.remove('DailyHighChange')
    feature_columns_list.remove('DailyLowChange')

    # Make our tf.data.Dataset from the same .csv file as before
    df = tf.data.experimental.make_csv_dataset(
      './datasets/call_restored_df_' + str(4) + '_' + 'SPY' + '.csv',
      batch_size=32,
      label_name="class",
      select_columns=feature_columns_list)

    df_batches = (
      df.cache().repeat().shuffle(500)
      .prefetch(tf.data.experimental.AUTOTUNE))
    return df_batches

# Get the estimator to train ...
estimator.train(input_fn=input_fn_train, steps=100)

So given that error, I replaced every instance of tf. with tf.compat.v1. in the above code, and got this error:

ValueError: Items of feature_columns must be a _FeatureColumn. Given (type <class 'str'>): Close_Resistance.

After more searching, I found that each entry in feature_columns has to be a feature column object (such as a numeric column) rather than a plain string. So I reverted from tf.compat.v1. back to tf. and added these two loops to convert both lists of column names to numeric columns:

...
feature_columns.remove('DailyLowChange')


# Make all the feature columns numeric type for TF 2.1 for some reason
new_feature_list = []
for i in feature_columns:
    new_feature_list.append(tf.feature_column.numeric_column(i))

# Adanet estimator
# Learn to ensemble linear and DNN models.
estimator = adanet.AutoEnsembleEstimator(
...

and

...
feature_columns_list.remove('DailyLowChange')

# Make all the feature columns numeric type for TF 2.1 for some reason
new_feature_columns_list = []
for i in feature_columns_list:
    new_feature_columns_list.append(tf.feature_column.numeric_column(i))

# Make our tf.data.Dataset from the same .csv file as before
df = tf.data.experimental.make_csv_dataset(
...

...and now get this error:

TypeError: not all arguments converted during string formatting
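
One possible source of this message, offered as a guess rather than a confirmed diagnosis: select_columns is documented to take column names or integer indices, while the second loop would hand it feature column objects. As far as I can tell those objects are namedtuple-based, and any tuple passed to the right of a %-format with a single %s placeholder raises exactly this TypeError in plain Python. The sketch below reproduces that behaviour without TensorFlow; FakeColumn and format_name are hypothetical stand-ins, not TensorFlow code.

```python
from collections import namedtuple

# Hypothetical stand-in for a feature column object (NOT TensorFlow's class).
FakeColumn = namedtuple("FakeColumn", ["key", "shape", "dtype"])


def format_name(value):
    # A single %s placeholder: fine for a string, but a tuple on the right
    # side of % is unpacked as several arguments.
    return "Value '%s' is not a valid column name." % value


# A plain string formats without trouble.
ok = format_name("Close_Resistance")

# A namedtuple with three fields supplies three arguments for one placeholder.
try:
    format_name(FakeColumn("Close_Resistance", (1,), "float32"))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```

If this is what happens here, the list given to select_columns would need to stay a list of strings, with the numeric column objects going only to the estimators.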

So I'm at a loss as to what to do. I want to get this working with TF 2.1, but I keep running into failures. I see that this post has a solution, but my .csv file has too many columns to define each one as a numeric type by hand, so I need an approach that works dynamically no matter how many columns are loaded. Any help is appreciated. Thanks.
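
To make "dynamic" concrete, here is a minimal sketch of deriving the column lists from the header row alone, using only the standard library. The sample header is made up for illustration (the real file has many more columns), and split_columns is a hypothetical helper, not part of TensorFlow or AdaNet.

```python
import csv
import io

# Columns dropped in the question, plus 'Date', which the question uses as
# the index. 'class' is the label column.
EXCLUDED = {"Date", "Ticker", "DailyChange", "DailyHighChange", "DailyLowChange"}
LABEL = "class"


def split_columns(csv_file, excluded=EXCLUDED, label=LABEL):
    """Read only the header row and return (selected, feature_names).

    selected      -- every kept column name, label included
    feature_names -- the kept column names without the label
    """
    header = next(csv.reader(csv_file))
    selected = [name for name in header if name not in excluded]
    feature_names = [name for name in selected if name != label]
    return selected, feature_names


# Hypothetical sample data standing in for the real .csv file.
sample = io.StringIO(
    "Date,Ticker,Close_Resistance,DailyChange,DailyHighChange,DailyLowChange,class\n"
    "2020-01-02,SPY,0.5,0.1,0.2,-0.1,1\n"
)
selected, feature_names = split_columns(sample)
print(selected)
print(feature_names)
```

Under this assumption, selected (plain strings) is the kind of list select_columns expects, and feature_names is what would be wrapped one by one in tf.feature_column.numeric_column for the estimators. Keeping the label out of feature_names is presumably needed, since the label is split off from the features by label_name.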
