  1. I am a bit confused about what exactly every function within this pipeline does. Can someone explain how this pipeline works? I know roughly how, but some clarification would be immensely helpful.

  2. Why is capital 'X' used in def transform(self, X)?

  3. What's the point of get_feature_names and __init__ specifically?

Code:

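    import pandas as pd
    from sklearn.base import BaseEstimator, TransformerMixin

    class custom_fico(BaseEstimator, TransformerMixin):

        def __init__(self):
            # Name of the single column this transformer outputs
            self.feature_names = ['fico']

        def fit(self, x, y=None):
            # Nothing is learned from the data, so just return self
            return self

        def transform(self, X):
            # Split ranges such as '700-710' into two float columns
            k = X['FICO.Range'].str.split('-', expand=True).astype(float)
            # Midpoint of each range
            fico = 0.5 * (k[0] + k[1])
            return pd.DataFrame({'fico': fico})

        def get_feature_names(self):
            return self.feature_names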
AndrewGB
  • 1
    [Check this for __init__](https://www.geeksforgeeks.org/__init__-in-python/). Also, the parameter name can be anything, as long as it is a valid identifier. –  Jun 18 '21 at 03:25
  • 1
    `X` is used for the matrix of training data, the name is used by convention in Machine Learning but not really anywhere else. `get_feature_names()` is probably used by whatever code uses this custom_fico object. – Jan Wilamowski Jun 18 '21 at 03:27
  • But why did we use a capital `X` in `def transform(self,X)` and a lowercase `x` in `def fit(self,x,y=None)`? And `__init__` is a constructor, right? – pratik kandalgaonkar Jun 18 '21 at 03:41
  • 1
    `x` vs `X` is inconsistent and I don't know what the purpose of the fit() method is since it doesn't actually fit the data and the parameters are unused. The code seems incomplete but again, it depends on where this object is actually used. – Jan Wilamowski Jun 18 '21 at 04:03
  • 1. Actually, this code is one part of a larger data-preprocessing pipeline. We use this sub-pipeline to handle the column 'FICO.Range', whose values are ranges, e.g. 700-710, 720-726. We split each value at '-', take the average of the two numbers, and put the single averaged values, e.g. 705, 723, in a new column named 'fico'. – pratik kandalgaonkar Jun 18 '21 at 06:23
  • 2. The fit() method does nothing here. We don't have to save any averages to reuse on other data; we just compute the average of each incoming range and convert it to a number. That's why fit() is left empty. – pratik kandalgaonkar Jun 18 '21 at 06:26
  • Questions on SO should be limited in scope. (1) is too broad itself probably. (2) is (essentially) asked already [here](https://stackoverflow.com/q/52237376/10495893). The second half of (3) is asked [here](https://stackoverflow.com/q/625083/10495893) already. – Ben Reiniger Jun 18 '21 at 18:14

1 Answer


1- Try this link. It is a helpful walkthrough of how pipelines work: https://medium.com/@shivangisareen/pipelining-in-python-7edd2382f67d

2- It is not necessary to use a capital X. Any valid parameter name would work; the author simply chose capital X, which is the usual machine-learning convention for the matrix of input data. The lowercase x in fit() is just an inconsistency and has no special meaning.

3- Lastly, the __init__ method is similar to constructors in C++ and Java. Constructors initialize the object's state: they assign values to the data members of the class when an object of the class is created. Here, __init__ stores ['fico'] in self.feature_names, and get_feature_names simply returns that list so that other code can ask which column the transformer produces.
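To make the three points concrete, here is a minimal, dependency-free sketch of the same idea. It is not the asker's code and not scikit-learn's implementation: `SketchTransformerMixin` and `FicoMidpoint` are hypothetical names, the mixin is a simplified stand-in for `TransformerMixin` that keeps only the "fit, then transform" chaining, and plain lists of strings stand in for the pandas column.

```python
class SketchTransformerMixin:
    """Hypothetical, simplified stand-in for sklearn's TransformerMixin."""

    def fit_transform(self, X, y=None):
        # fit() returns self, so transform() can be chained directly onto it.
        return self.fit(X, y).transform(X)


class FicoMidpoint(SketchTransformerMixin):
    def __init__(self):
        # Runs once when the object is created; stores the output column name.
        self.feature_names = ['fico']

    def fit(self, X, y=None):
        # Nothing to learn: the midpoint of a range needs no stored state.
        return self

    def transform(self, rows):
        # The parameter name is arbitrary; "X" is only a convention.
        midpoints = []
        for value in rows:
            low, high = value.split('-')
            midpoints.append(0.5 * (float(low) + float(high)))
        return midpoints

    def get_feature_names(self):
        # Lets calling code ask which column(s) this step produces.
        return self.feature_names


step = FicoMidpoint()
result = step.fit_transform(['700-710', '720-726'])
print(result)                    # [705.0, 723.0]
print(step.get_feature_names())  # ['fico']
```

A pipeline does essentially this for each step in turn: it calls fit and transform on the step and passes the output to the next step, which is why fit has to exist and return self even when it learns nothing.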

If you need any further help, the community is here for you!