Questions tagged [dimension-reduction]

38 questions
55
votes
7 answers

Mapping N-dimensional value to a point on Hilbert curve

I have a huge set of N-dimensional points (tens of millions; N is close to 100). I need to map these points to a single dimension while preserving spatial locality. I want to use a Hilbert space-filling curve to do it. For each point I want to pick…
Alexander Gladysh
  • 39,865
  • 32
  • 103
  • 160
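As a rough illustration of the mapping this question asks about, the sketch below covers only the well-known two-dimensional case: it converts a grid cell to its position along a Hilbert curve, so that sorting by that position keeps neighbouring cells close together. The roughly 100-dimensional case in the question needs a generalized N-dimensional algorithm, which this sketch does not cover. The function name `hilbert_index_2d` and the sample points are hypothetical, chosen for illustration.

```python
def hilbert_index_2d(side, x, y):
    """Return the Hilbert-curve index of cell (x, y) in a side x side grid.

    Assumes side is a power of two and 0 <= x, y < side.
    """
    d = 0
    s = side // 2
    while s > 0:
        # Which quadrant of the current sub-square does the cell fall in?
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the next sub-square is in the standard orientation.
        if ry == 0:
            if rx == 1:
                x = side - 1 - x
                y = side - 1 - y
            x, y = y, x
        s //= 2
    return d


# Illustrative data: sort a few cells of a 4 x 4 grid by curve position.
points = [(3, 0), (2, 2), (0, 0), (1, 0)]
ordered = sorted(points, key=lambda p: hilbert_index_2d(4, p[0], p[1]))
```

Real-valued coordinates would first have to be quantized to integer grid cells, which is a separate design choice not shown here.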
6
votes
1 answer

In natural language processing (NLP), how do you make an efficient dimension reduction?

In NLP, the feature dimension is almost always very large. For example, for one project at hand, the dimension of the features is almost 20 thousand (p = 20,000), and each feature is a 0-1 integer indicating whether a specific word or…
4
votes
1 answer

Why does scikit-learn's TruncatedSVD use the 'randomized' algorithm by default?

I used TruncatedSVD on a 30000 by 40000 term-document matrix to reduce it to 3000 dimensions. With 'randomized' (n_iter=10), the variance ratio is about 0.5; with 'arpack', the variance ratio is about 0.9. Variance ratio…
Kyeongpil
  • 43
  • 8
4
votes
3 answers

Dimension Reduction

I'm trying to reduce a high-dimensional dataset to 2-D. However, I don't have access to the whole dataset upfront. So, I'd like to generate a function that takes an N-dimensional vector and returns a 2-dimensional vector, such that if I give it to…
PlexLuthor
  • 578
  • 2
  • 8
  • 16
3
votes
2 answers

Confirmatory Factor Analysis in Python

Is there a package to perform Confirmatory Factor Analysis in Python? I have found a few that can perform Exploratory Factor Analysis in Python (scikit-learn, factor_analyzer, etc.), but I have yet to find a package that does CFA.
3
votes
1 answer

How to reduce the dimensionality of a vector

I have a set of vectors. I'm working on ways to reduce an n-dimensional vector to a single value (1-D), say (x1, x2, ..., xn) → y. This single value needs to be the characteristic value of the vector. Each unique vector produces a unique output…
marc
  • 949
  • 14
  • 33
2
votes
1 answer

Global operator along a single dimension in Keras?

Let's say I have a dataset comprising greyscale videos. The length and size of each video can vary, so I am representing the data in three dimensions via the following shape: (batch size, time, y, x, channels) = (None, None, None, None, 1). I want to…
Anthony
  • 341
  • 2
  • 11
2
votes
1 answer

Optimal perplexity for t-SNE with larger datasets (>300k data points)

I am using t-SNE to make a 2D projection for visualization from a higher-dimensional dataset (in this case 30 dimensions) and I have a question about the perplexity hyperparameter. It's been a while since I used t-SNE and had previously only used it on…
2
votes
0 answers

The Curse of High Dimension and Distance

To extract features from video frames (2 samples/sec), I use the Keras framework in Python and load VGG16 with an input size of (150,150,3) and an output size of (4,4,512). After the feature extraction step, I want to cluster the frame features with Hierarchical…
2
votes
1 answer

Using features without applying PCA

Suppose there are 8 features in the dataset. I use PCA and find that 99% of the information is in the first 3 features, using the cumulative sum of the explained variance ratio. Then why do I need to fit and transform these 3 features using…
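One way to look at the question above: the explained variance ratio belongs to principal components, which are directions that mix the original features, rather than to the first few columns of the dataset. The pure-Python sketch below uses made-up data and hypothetical helper names (`covariance_matrix`, `first_component`) and finds only the leading component by power iteration; it is an illustration under those assumptions, not how any particular library implements PCA.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length feature rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iters=200):
    """Leading eigenvector and eigenvalue of cov via power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue


# Made-up data: the second feature is roughly twice the first,
# and the third is small noise.
rows = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 5.9, 0.6],
        [4.0, 8.0, 0.5], [5.0, 10.1, 0.4]]
cov = covariance_matrix(rows)
direction, variance = first_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
ratio = variance / total_variance  # share of variance along `direction`
```

Here `direction` has sizeable weights on both of the first two features, so the component that carries most of the variance is a combination of columns, and projecting onto it is what the transform step does.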
2
votes
2 answers

Mahout binary data clustering

I have points with binary features:
id, feature 1, feature 2, ....
1, 0, 1, 0, 1, ...
2, 1, 1, 0, 1, ...
and the size of the matrix is about 20k * 200k, but it is sparse. I am using Mahout to cluster the data with the k-means algorithm and have the following…
1
vote
0 answers

I want to input a 3D array (custom data) to the sklearn PCA function

I'm trying to input custom data (MIDI vectors) into the PCA function of the sklearn library. Below is the current shape of my data.
data
[[[ 4. 56. ] # [rhythm1 melody1]
[ 2. 56. ] # [rhythm2 melody2]
[ 2. 55. ] # [rhythm3 melody3]
[…
1
vote
0 answers

How to use a function that changes during training with Keras

I tried to customize the loss function of my autoencoder. The loss function must take into account the result of another dimension reduction (LLE), and the data I pass to the function must be updated each time the loss is calculated, the…
1
vote
0 answers

Dimensional reduction through subspace clustering

I am trying to write a framework in Python to compare different dimension-reduction algorithms, and I'm looking for a tutorial or implementation that uses subspace clustering algorithms such as TSC, SSC, or SSC-OMP for this goal. There is some code…
1
vote
0 answers

Dimension reduction using PCA

Suppose I have an $n \times p$ data matrix $X$, $p \gg n$. To reduce the dimension of the data, I use principal component analysis as follows: I perform SVD and find matrices $U$ ($n \times r$) and $V$ ($r \times p$) such that $X=UDV$, where $D$ is a…