
When I import scikit-learn before importing tensorflow, I don't have any issues. Running this block of code produces an output of 1.7766212763101197e-12:

import numpy as np
np.random.seed(123)
import numpy.random as rand
from sklearn.decomposition import PCA
import tensorflow as tf

X = rand.randn(100,15)
X = X - X.mean(axis=0)
mod = PCA()
w = mod.fit_transform(X)
h = mod.components_
print(np.sum(np.abs(X-np.dot(w,h))))

However, if I import tensorflow before importing scikit-learn, my code no longer works correctly. When I run this code block

import tensorflow as tf
import numpy as np
np.random.seed(123)
import numpy.random as rand
from sklearn.decomposition import PCA

X = rand.randn(100,15)
X = X - X.mean(axis=0)
mod = PCA()
w = mod.fit_transform(X)
h = mod.components_
print(np.sum(np.abs(X-np.dot(w,h))))

I get an output of 130091393261440.25.

Why is that? My versions for the packages are:

numpy - 1.13.1

sklearn - 0.19.0

tensorflow - 1.3.0
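Not something the question itself confirms, but a result that changes with import order alone often points at two different BLAS/LAPACK builds being loaded into the same process (TensorFlow links its own Eigen/MKL-related libraries). A minimal sketch to record which BLAS/LAPACK configuration NumPy was built against, worth including when reporting the issue:

```python
import numpy as np

# Print the BLAS/LAPACK build configuration NumPy was compiled against.
# This won't show a conflicting library loaded later by TensorFlow,
# but it is a useful first datapoint for a bug report.
np.__config__.show()
print(np.__version__)
```

Running this once before and once after `import tensorflow as tf` costs nothing and makes the environment reproducible for others.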

Guillaume Chevalier
  • I am also not able to reproduce your issue with given code snippet on both python2 and python3. Are you able to duplicate the issue on multiple runs? If yes, post more info about your system. – Vivek Kumar Mar 02 '18 at 07:57
  • Yeah this is a consistent problem. It is also an issue for other sklearn packages (ICA,NMF). I use anaconda. [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux. –  Mar 02 '18 at 15:10
  • Is there a way to somehow save the runs and post it here? –  Mar 02 '18 at 15:11

2 Answers


Import order should not normally affect output: Python modules are self-contained, apart from their dependencies and any shared native libraries they load.

I was unable to reproduce your error, and get an output of 1.7951539777252834e-12 for both code blocks.

This is an interesting problem and I am curious to see if others can provide a better response for why you are seeing this issue.
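One way to narrow this down (a diagnostic sketch, not part of the original question) is to redo the PCA reconstruction with NumPy's own SVD instead of scikit-learn. If this version also produces a huge error only when tensorflow is imported first, the problem lies below scikit-learn, in the LAPACK routines:

```python
import numpy as np

rng = np.random.RandomState(123)
X = rng.randn(100, 15)
X = X - X.mean(axis=0)

# PCA via SVD: X = U @ diag(S) @ Vt, so scores w = U * S and components h = Vt.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
w = U * S
err = np.sum(np.abs(X - np.dot(w, Vt)))
print(err)  # on a healthy LAPACK this is on the order of 1e-12
```

Comparing this value across the two import orders isolates whether `PCA.fit_transform` or the underlying linear algebra is at fault.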


Note: the present answer addresses the question in the title, for readers looking to use TensorFlow within scikit-learn; it does not address the import-order error described above.

You can use TensorFlow within Scikit-Learn pipelines using Neuraxle.

Neuraxle is an extension of Scikit-Learn to make it more compatible with all deep learning libraries.

Problem: You Can’t Parallelize or Save Pipelines Using Steps that Can’t be Serialized “as-is” by Joblib (e.g., a TensorFlow step)

Here, a “step” means a transformer or estimator in a scikit-learn Pipeline.

This problem only surfaces after some time using Scikit-Learn, at the point of no return: you’ve coded your entire production pipeline, but once you’ve trained it and selected the best model, you realize that what you’ve just coded can’t be serialized.

This means that once trained, your pipeline can’t be saved to disk, because one of its steps imports from a library coded in another language and/or uses GPU resources. Your code smells weird and you start panicking over what was a full year of research and development.
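TensorFlow isn’t required to see this failure mode. A minimal sketch using a stand-in resource (a thread lock, which, like a TF session, holds native state that pickle can’t capture — the step class below is hypothetical) hits the same error joblib would:

```python
import pickle
import threading

class TensorFlowLikeStep:
    """Hypothetical pipeline step holding an unpicklable native resource."""
    def __init__(self):
        # Stands in for e.g. a TensorFlow session or GPU handle.
        self.resource = threading.Lock()

    def transform(self, X):
        return X

try:
    pickle.dumps(TensorFlowLikeStep())
except TypeError as e:
    print("cannot serialize step:", e)
```

Joblib falls back on pickle for objects like this, which is why such a step breaks both model saving and process-based parallelism.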

Solution with Code Examples:

Here is a full project example from A to Z where TensorFlow is used with Neuraxle as if it were used with Scikit-Learn.

Here is another practical example where TensorFlow is used within a scikit-learn-like pipeline.

The trick is performed by using Neuraxle-TensorFlow, which makes use of Neuraxle's savers.


Read also: https://stackoverflow.com/a/60557192/2476920

Guillaume Chevalier