Questions tagged [feature-extraction]

In pattern recognition and image processing, feature extraction is a special form of dimensionality reduction: it transforms the input data into a set of features. If the extracted features are carefully chosen, the feature set is expected to capture the relevant information in the input data, so that the desired task can be performed on this reduced representation instead of the full-size input.

Feature extraction involves reducing the amount of resources required to describe a large set of data accurately. When analyzing complex data, one of the major problems stems from the number of variables involved. Analysis with a large number of variables generally requires a large amount of memory and computation power, and it may cause a classification algorithm to overfit the training sample and generalize poorly to new samples. Feature extraction is a general term for methods of constructing combinations of the variables to get around these problems while still describing the data with sufficient accuracy.

The best results are achieved when an expert constructs a set of application-dependent features. If no such expert knowledge is available, general dimensionality reduction techniques may help.

Source: Wikipedia

1664 questions

82 votes, 9 answers

The easiest way for getting feature names after running SelectKBest in Scikit Learn

I'm trying to conduct a supervised machine-learning experiment using the SelectKBest feature of scikit-learn, but I'm not sure how to create a new dataframe after finding the best features: Let's assume I would like to conduct the experiment…
Aviade

54 votes, 2 answers

What is the difference between feature detection and descriptor extraction?

Does anyone know the difference between feature detection and descriptor extraction in OpenCV 2.3? I understand that the latter is required for matching using DescriptorMatcher. If that's the case, what is FeatureDetection used for?

53 votes, 5 answers

Feature Selection and Reduction for Text Classification

I am currently working on a project, a simple sentiment analyzer with 2 classes in one case and 3 classes in another. I am using a corpus that is pretty rich in terms of unique words (around 200,000). I used the bag-of-words method for feature…
clancularius

48 votes, 2 answers

What is a feature descriptor in image processing (algorithm or description)?

I often get confused by the meaning of the term descriptor in the context of image features. Is a descriptor the description of the local neighborhood of a point (e.g. a float vector), or is a descriptor the algorithm that outputs the description?…

41 votes, 5 answers

Linear Regression :: Normalization (Vs) Standardization

I am using linear regression to predict data, but I get completely different results when I normalize versus standardize the variables. Normalization = (x - xmin) / (xmax - xmin)   Z-score standardization = (x - xmean) / xstd   a) Also,…
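
The two rescalings named in this question can be sketched in plain Python. This is a minimal illustration, not the asker's code: the function names and the sample data are made up here, and the standardization assumes the population standard deviation (some libraries divide by n - 1 instead).

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values to the range [0, 1]: (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("all values are identical; min-max scaling is undefined")
    return [(x - lo) / span for x in values]


def z_score_standardize(values):
    """Shift to mean 0 and scale to unit standard deviation: (x - mean) / std."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("all values are identical; z-score is undefined")
    return [(x - mu) / sigma for x in values]


# Hypothetical sample data, used only for illustration.
data = [2.0, 4.0, 6.0, 8.0]
normalized = min_max_normalize(data)
standardized = z_score_standardize(data)
```

Both transforms are affine rescalings of each variable, so an unregularized least-squares fit with an intercept would be expected to give the same predictions either way; only the coefficients change scale.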

38 votes, 4 answers

Extracting HoG Features using OpenCV

I am trying to extract features using OpenCV's HoG API, but I can't seem to find the API that allows me to do that. What I am trying to do is to extract features using HoG from my whole dataset (a set number of positive and negative images), then…

34 votes, 7 answers

Issue with OneHotEncoder for categorical features

I want to encode 3 categorical features out of the 10 features in my dataset. I use preprocessing from sklearn.preprocessing to do so, as follows: from sklearn import preprocessing cat_features = ['color', 'director_name', 'actor_2_name'] enc =…
Medo
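
What one-hot encoding does to the three categorical columns named in this question can be sketched without any library. This is only an illustration of the idea, not scikit-learn's implementation: the helper names, the row values, and the extra `duration` column are hypothetical, and unseen categories are assumed to encode as all zeros.

```python
def fit_categories(rows, columns):
    """Collect the sorted distinct values of each categorical column."""
    return {col: sorted({row[col] for row in rows}) for col in columns}


def one_hot(row, categories):
    """Encode one row as a flat 0/1 list, one slot per known category value."""
    encoded = []
    for col, values in categories.items():
        # A value not seen during fitting yields all zeros for that column.
        encoded.extend(1 if row[col] == value else 0 for value in values)
    return encoded


# Column names come from the question; the row values are made up.
cat_features = ['color', 'director_name', 'actor_2_name']
rows = [
    {"color": "Color", "director_name": "d1", "actor_2_name": "a1", "duration": 120},
    {"color": "Black and White", "director_name": "d2", "actor_2_name": "a2", "duration": 95},
    {"color": "Color", "director_name": "d1", "actor_2_name": "a3", "duration": 101},
]

categories = fit_categories(rows, cat_features)
encoded = [one_hot(row, categories) for row in rows]
```

Only the listed categorical columns are expanded; numeric columns such as the hypothetical `duration` would be passed through separately.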

34 votes, 2 answers

Convolutional Neural Network (CNN) for Audio

I have been following the tutorials on DeepLearning.net to learn how to implement a convolutional neural network that extracts features from images. The tutorials are well explained and easy to understand and follow. I want to extend the same CNN to…

33 votes, 7 answers

Are there any fast alternatives to SURF and SIFT for scale-invariant feature extraction?

SURF is patented, as is SIFT. ORB and BRIEF are not patented, but their features are not scale-invariant, seriously limiting their usefulness in complex scenarios. Are there any feature extractors that can extract scale-invariant features as fast as…
Diego

32 votes, 2 answers

Which OCR Engine is better: Tesseract or OCRopus?

I have tried Tesseract with iPhone and assessed its accuracy to be 70% without image preprocessing. I also noticed that it might be poor in extracting digits. I have heard about OCRopus OCR engine: which is better, Tesseract or OCRopus, in terms of…
Ahmed Hussein

27 votes, 3 answers

What does the distance attribute in DMatches mean?

I have a short question: when I do feature matching in OpenCV, what does the distance attribute of the DMatches in a MatOfMatches mean? I know that I have to filter out matches with a larger distance because they aren't as good as those with a smaller distance. But…
stetro

25 votes, 3 answers

scikit-learn TfidfVectorizer meaning?

I was reading about the TfidfVectorizer implementation in scikit-learn, and I don't understand what the output of the method is. For example: new_docs = ['He watches basketball and baseball', 'Julie likes to play basketball', 'Jane loves to play…
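
The kind of output this question asks about can be approximated with a small pure-Python sketch. It is not scikit-learn's code: it assumes the smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, and the L2 row normalization that scikit-learn documents as defaults, uses a simplified tokenizer, and returns dense lists where the real class returns a sparse matrix. Only the two complete documents from the excerpt are used; the `tokenize` and `tfidf` names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep tokens of two or more word characters."""
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf(docs):
    """Return (vocabulary, rows), one L2-normalized TF-IDF row per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({token for doc in tokenized for token in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(token for doc in tokenized for token in set(doc))
    # Smoothed inverse document frequency (assumed default behaviour).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocab, rows


docs = ['He watches basketball and baseball', 'Julie likes to play basketball']
vocab, matrix = tfidf(docs)
```

Each row is one document and each column one vocabulary term. A term that occurs in every document ("basketball" here) gets a lower weight than a term that occurs in only one ("baseball").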

21 votes, 5 answers

How are HoG features represented graphically?

I'm implementing the Histogram of Oriented Gradient features from "Histograms of oriented gradients for human detection" and I'd like to visualise the result. All papers on these features use a standard visualisation, but I can't find any…

21 votes, 4 answers

Why do we maximize variance during Principal Component Analysis?

I'm reading about PCA and saw that the objective is to maximize the variance. I don't quite understand why. Any explanation of other related topics would be helpful.
karthik A
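
One way to see what "maximizing the variance" means in this question is a two-dimensional sketch in plain Python. It is an illustration under stated assumptions, not a full PCA implementation: the points are made up, the function names are hypothetical, and the closed-form angle applies only to the 2x2 covariance case.

```python
import math


def covariance_2d(points):
    """Return (sxx, syy, sxy), the population covariance terms of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return sxx, syy, sxy


def projected_variance(cov, angle):
    """Variance of the data projected onto the unit vector at this angle."""
    sxx, syy, sxy = cov
    c, s = math.cos(angle), math.sin(angle)
    return c * c * sxx + 2 * c * s * sxy + s * s * syy


def principal_angle(cov):
    """Angle of the first principal axis for a 2x2 covariance matrix."""
    sxx, syy, sxy = cov
    return 0.5 * math.atan2(2 * sxy, sxx - syy)


# Hypothetical, roughly diagonal point cloud.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
cov = covariance_2d(points)
best = principal_angle(cov)
best_var = projected_variance(cov, best)

# Compare against every whole-degree direction in the half circle.
sweep = [projected_variance(cov, math.radians(d)) for d in range(180)]
```

The total variance `sxx + syy` is fixed, and it splits into the variance along a direction plus the variance perpendicular to it. Picking the direction with the largest projected variance therefore also minimizes the mean squared distance from the points to that axis, which is one common motivation for the objective.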

19 votes, 2 answers

Getting feature names from within a FeatureUnion + Pipeline

I am using a FeatureUnion to join features found from the title and description of events: union = FeatureUnion( transformer_list=[ # Pipeline for pulling features from the event's title ('title', Pipeline([ ('selector',…
Huey