In the documentation for the PCA class in scikit-learn, there is a copy argument that is True by default.
The documentation says this about the argument:
"If False, data passed to fit are overwritten and running fit(X).transform(X) will not yield the expected results, use fit_transform(X) instead."
I'm not sure what this is saying, however, because how would the function overwrite the input X? When you call .fit(X), the function should just be calculating the PCA vectors and updating the internal state of the PCA object, right?
So even if you set copy to False, the .fit(X) method should still return the object self, as it says in the documentation, so shouldn't fit(X).transform(X) still work?
So what is it copying (or not copying) when this argument is set to False?
Additionally, when would I want to set it to False?
Edit: I ran fit and transform together and separately and got different results, even though the copy parameter was the same (True) for both.
from sklearn.decomposition import PCA
import numpy as np

X = np.arange(20).reshape((5, 4))

# Case 1: fit and transform called separately
print("Separate")
XT = X.copy()
pcaT = PCA(n_components=2, copy=True)
print("Original: ", XT)
results = pcaT.fit(XT).transform(XT)
print("New: ", XT)
print("Results: ", results)

# Case 2: fit and transform in a single call
print("\nCombined")
XF = X.copy()
pcaF = PCA(n_components=2, copy=True)
print("Original: ", XF)
results = pcaF.fit_transform(XF)
print("New: ", XF)
print("Results: ", results)
########## Results
Separate
Original: [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]]
New: [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]]
Results: [[ 1.60000000e+01 -2.66453526e-15]
[ 8.00000000e+00 -1.33226763e-15]
[ 0.00000000e+00 0.00000000e+00]
[ -8.00000000e+00 1.33226763e-15]
[ -1.60000000e+01 2.66453526e-15]]
Combined
Original: [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]]
New: [[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]]
Results: [[ 1.60000000e+01 1.44100598e-15]
[ 8.00000000e+00 -4.80335326e-16]
[ -0.00000000e+00 0.00000000e+00]
[ -8.00000000e+00 4.80335326e-16]
[ -1.60000000e+01 9.60670651e-16]]
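To make the documentation note concrete, here is a minimal NumPy-only sketch of one mechanism that could produce the behavior it describes. ToyPCA is a hypothetical stand-in written for illustration, not scikit-learn's implementation. It assumes that fit centers the data in place when copy=False, and that transform always subtracts the stored mean from whatever array it is given. Under those assumptions, fit(X).transform(X) ends up subtracting the mean twice. It uses a float array, since in-place centering of an integer array would not be possible without a conversion.

```python
import numpy as np


class ToyPCA:
    """Hypothetical stand-in for illustration; not scikit-learn's code."""

    def __init__(self, n_components, copy=True):
        self.n_components = n_components
        self.copy = copy

    def fit(self, X):
        # With copy=False, work directly on the caller's array.
        Xc = X.copy() if self.copy else X
        self.mean_ = Xc.mean(axis=0)
        Xc -= self.mean_  # in-place centering; mutates X when copy=False
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        return self

    def transform(self, X):
        # Assumes X has not been centered yet.
        return (X - self.mean_) @ self.components_.T


X = np.arange(20, dtype=float).reshape(5, 4)

# copy=True: the input is left alone, so fit(X).transform(X) is consistent.
X_safe = X.copy()
expected = ToyPCA(n_components=2, copy=True).fit(X_safe).transform(X_safe)

# copy=False: fit overwrites the input, so transform sees centered data
# and subtracts the mean a second time.
X_unsafe = X.copy()
shifted = ToyPCA(n_components=2, copy=False).fit(X_unsafe).transform(X_unsafe)

input_untouched = np.array_equal(X_safe, X)
input_overwritten = not np.array_equal(X_unsafe, X)
results_match = np.allclose(expected, shifted)

print("copy=True left the input untouched:", input_untouched)
print("copy=False overwrote the input:", input_overwritten)
print("fit(X).transform(X) agrees between the two:", results_match)
```

In this toy version, the only thing copy=False saves is one extra allocation of the input array, which might matter only when X is large compared with available memory.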