Matlab Toolbox for Dimensionality Reduction (v0.7 - Nov 2008)

The Matlab Toolbox for Dimensionality Reduction contains Matlab implementations of 32 techniques for dimensionality reduction. Many of the implementations were developed from scratch, whereas others are improved versions of software that was already available on the Web. The implementations in the toolbox are conservative in their use of memory. The toolbox is available for download here.


Currently, the Matlab Toolbox for Dimensionality Reduction contains the following techniques:

  1. Principal Component Analysis (PCA)

  2. Probabilistic PCA

  3. Factor Analysis (FA)

  4. Sammon mapping

  5. Linear Discriminant Analysis (LDA)

  6. Multidimensional scaling (MDS)

  7. Isomap

  8. Landmark Isomap

  9. Local Linear Embedding (LLE)

  10. Laplacian Eigenmaps

  11. Hessian LLE

  12. Local Tangent Space Alignment (LTSA)

  13. Conformal Eigenmaps (extension of LLE)

  14. Maximum Variance Unfolding (extension of LLE)

  15. Landmark MVU (LandmarkMVU)

  16. Fast Maximum Variance Unfolding (FastMVU)

  17. Kernel PCA

  18. Generalized Discriminant Analysis (GDA)

  19. Diffusion maps

  20. Stochastic Neighbor Embedding (SNE)

  21. Symmetric SNE (SymSNE)

  22. new: t-Distributed Stochastic Neighbor Embedding (t-SNE)

  23. Neighborhood Preserving Embedding (NPE)

  24. Locality Preserving Projection (LPP)

  25. Linear Local Tangent Space Alignment (LLTSA)

  26. Stochastic Proximity Embedding (SPE)

  27. Multilayer autoencoders (training by RBM + backpropagation or by an evolutionary algorithm)

  28. Local Linear Coordination (LLC)

  29. Manifold charting

  30. Coordinated Factor Analysis (CFA)

  31. new: Gaussian Process Latent Variable Model (GPLVM)


In addition to the techniques for dimensionality reduction, the toolbox contains implementations of 6 techniques for intrinsic dimensionality estimation, as well as functions for out-of-sample extension, prewhitening of data, and the generation of toy datasets.
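
As a rough illustration of these helper functions, the sketch below generates a toy dataset, prewhitens it, and estimates its intrinsic dimensionality. The function names generate_data, prewhiten, and intrinsic_dim, the dataset name 'swiss', and the estimator name 'MLE' are assumptions that are not stated on this page; please check them against Readme.txt before relying on them. The toolbox directory is assumed to be on the Matlab path.

```matlab
% Sketch only: the function names and string arguments below are assumptions
% (not stated on this page); verify them against Readme.txt.

% Toy dataset: 1000 points sampled from a Swiss roll.
% Rows are instances, columns are dimensions.
[X, labels] = generate_data('swiss', 1000);

% Prewhiten the data (decorrelate the dimensions and rescale their variance).
X_white = prewhiten(X);

% Estimate the intrinsic dimensionality of the data,
% here with a maximum likelihood estimator.
d_hat = intrinsic_dim(X, 'MLE');
fprintf('Estimated intrinsic dimensionality: %.2f\n', d_hat);
```

The estimate can then be rounded and used as the target dimensionality for one of the techniques listed above.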


The toolbox provides easy access to all these implementations. Basically, the only command you need to execute is:


      mapped_data = compute_mapping(data, method, no_dims, parameters)   % no_dims: number of target dimensions


The function assumes that the columns of the data are the dimensions and the rows are the instances. The function also accepts PRTools datasets. Information on how parameters for certain techniques should be specified can be obtained by typing HELP COMPUTE_MAPPING at the Matlab prompt. For more instructions on how to install and use the toolbox, please read the Readme.txt file.

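
A minimal sketch of such a call is shown below. It builds a small spiral-shaped dataset by hand so that it needs nothing beyond compute_mapping itself. The method strings 'PCA' and 'LLE', the second output argument, and the use of the fourth argument as the number of neighbors for LLE are assumptions; HELP COMPUTE_MAPPING lists the names and parameters that are actually accepted.

```matlab
% Sketch only: method names and parameter meanings are assumptions;
% consult HELP COMPUTE_MAPPING for the authoritative list.

% Hand-made 3-D dataset: a rolled-up sheet with 500 instances.
% Rows are instances, columns are dimensions.
n = 500;
t = linspace(1.5 * pi, 4.5 * pi, n)';
X = [t .* cos(t), 10 * rand(n, 1), t .* sin(t)];

no_dims = 2;                       % target dimensionality

% Linear technique, no extra parameters.
mapped_pca = compute_mapping(X, 'PCA', no_dims);

% Nonlinear technique; the fourth argument is assumed to be the
% number of nearest neighbors k. The second output is assumed to
% hold information about the learned mapping.
k = 12;
[mapped_lle, mapping] = compute_mapping(X, 'LLE', no_dims, k);

disp(size(mapped_pca));            % one row per instance, no_dims columns
```

Techniques that build a neighborhood graph may embed only the largest connected component of that graph, so the number of rows returned by such techniques can be smaller than the number of input instances.
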
You are free to use, modify, or redistribute this software in any way you want, but only for non-commercial purposes. The use of the toolbox is at your own risk; the author is not responsible for any damage resulting from errors in the software. I would appreciate it if you refer to the toolbox or its author in your papers.

=== Proceed to the download page ===


Information on the toolbox

For more information on the toolbox, we refer to the following publications:

  1. L.J.P. van der Maaten, E.O. Postma, and H.J. van den Herik. Dimensionality Reduction: A Comparative Review. Tilburg University Technical Report, TiCC-TR 2009-005, 2009. [ PDF ]

  2. L.J.P. van der Maaten and G.E. Hinton. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9(Nov):2579-2605, 2008. [ PDF ] [ Supplemental Material (24MB) ]


Known issues

  1. There is a bug in the computation of the Gaussian kernel in the diffusion maps implementation. (thanks to He Li for reporting this bug)

  2. There is a small bug in the implementations of LPP, NPE, and LLTSA. They try to select the wrong eigenvectors, which leads to an error. In the line where Matlab crashes, please replace ind(2:no_dims + 1) with ind(1:no_dims) to fix this bug! (thanks to Nicolas Rey for reporting this bug)

  3. The out-of-sample extension of Isomap may introduce undesired translations when embedding new data. You can fix this problem by executing mapping.DD = sqrt(mapping.DD); before running the OUT_OF_SAMPLE function.

  4. The toolbox only works properly on Matlab 2007a or newer versions of Matlab. If your Matlab version does not support BSXFUN, try using this code.

  5. The toolbox may not work properly when the data contains duplicates.

  6. Compiling dijkstra.cpp sometimes gives errors on Windows machines. Note that you only need to compile this file if the provided DLL does not work on your Windows system. The compilation can be performed successfully using the MinGW compiler. More information on setting the compiler used by MEX can be found in your Matlab documentation. On 64-bit Windows machines, replacing the dijkstra.cpp file with this adapted version (by Nicolas Rey) may help.
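
The workarounds for issues 3 and 5 can be sketched as follows. Random data stands in for real training and test data. The method string 'Isomap' and the argument order of out_of_sample are assumptions; the mapping.DD line is the fix quoted in issue 3. Removing duplicates with Matlab's unique is only one possible way to handle issue 5 and is not prescribed by the toolbox.

```matlab
% Sketch only: 'Isomap' and the out_of_sample argument order are assumptions.

% Stand-in data: rows are instances, columns are dimensions.
train_X = randn(300, 5);
train_X = [train_X; train_X(1:10, :)];   % deliberately append 10 duplicate rows
test_X  = randn(20, 5);

% Issue 5: drop duplicate instances before computing the mapping.
% Note that unique(..., 'rows') also sorts the rows, so any labels
% must be reordered with its second output if they are needed.
train_X = unique(train_X, 'rows');

% Learn an Isomap embedding and keep the mapping structure.
[mapped_train, mapping] = compute_mapping(train_X, 'Isomap', 2);

% Issue 3: apply the fix before embedding new data.
mapping.DD = sqrt(mapping.DD);
mapped_test = out_of_sample(test_X, mapping);
```

The fix should be applied only once per mapping structure; applying the square root a second time would change the distances again.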


Version history

  1. Version 0.7b:
       - Many small bugfixes and speed improvements.
       - Added out-of-sample extension for manifold charting.
       - Added first version of graphical user interface for the toolbox. The GUI was developed by Maxim Vedenev with the help of Susanth Vemulapalli and Maarten Huybrecht. I made some changes in the initial version of the GUI code.
       - Added implementation of Gaussian Process Latent Variable Model (GPLVM).
       - Removed Simple PCA as probabilistic PCA is more appropriate.

  2. Version 0.6b:
       - Resolved bug in LLE that was introduced with v0.5b.
       - Added implementation of t-SNE.
       - Resolved small bug in data generation function.
       - Improved RBM implementation in autoencoders (note that successful training of an RBM still depends on parameter settings such as weight_cost and learning rate that can only be set in the train_rbm.m code).
       - Added implementation of Sammon mapping.
       - Removed dependency on the Statistics toolbox in Laplacian Eigenmaps.
       - Resolved bug in implementation of SPE.
       - Various speed and memory improvements by exploiting Matlab's new BSXFUN functionality.

  3. Version 0.5b:
       - Resolved issues with unconnected neighborhood graph for LLE and Laplacian Eigenmaps (now works like Isomap).
       - Resolved bug in prewhitening of data.
       - Improved implementations of SNE and symmetric SNE.
       - Resolved two bugs in nearest neighbor intrinsic dimensionality estimator.
       - Replaced MDS implementation by implementation for classical MDS.

  4. Version 0.4b:
       - Added Symmetric SNE ('SymSNE') implementation.
       - Added Landmark MVU ('LandmarkMVU') implementation.
       - Added completely new implementation of autoencoders using RBM training.
       - Added out-of-sample extensions for (Landmark) Isomap, LLE, Laplacian Eigenmaps, Landmark MVU, and FastMVU.
       - Added new 'difficult' dataset to data generation function.
       - Improved implementations of NPE, LPP, and LLTSA.
       - Resolved issue with parameter parsing in manifold charting.
       - Resolved issue with adaptive neighborhood selection combined with target dimensionalities higher than 40.
       - The number of timesteps t can now be specified in diffusion maps.
       - Sped up the implementations of Kernel PCA and Kernel LDA for datasets with over 3,000 instances (by a factor of 5).
       - Resolved efficiency issue in the eigendecomposition performed by diffusion maps.
       - Speed improvement in neighbor search for datasets with over 2,000 datapoints (with assistance from James Monaco).
       - Speed improvement of Hessian LLE implementation.
       - The toolbox now works without using the Statistics Toolbox.
       - Data generation function now also returns the true underlying manifold.
       - Resolved issue that might occur when Isomap or FastMVU are employed on a PRTools dataset.

  5. Version 0.3b:
       - Improved PCA implementation for cases in which D > N.
       - Added implementation of probabilistic PCA (using EM algorithm).
       - Added implementation of manifold charting.
       - Added function for adaptive neighborhood selection (with assistance from Nathan Mekuz).
       - Various speed improvements (with assistance from Nathan Mekuz).
       - Added welcome message.
       - Added contents information for VER command.
       - Fixed issue with divisions by zero in intrinsic dimensionality estimators.
       - Removed implementation of ICA from the toolbox.

  6. Version 0.2b:
       - Resolved issues in LPP, NPE, LTSA, and Kernel PCA implementations.
       - Added implementation of LLTSA.
       - Added Conformal Eigenmaps (CCA) as a postprocessing step for LLE.
       - Added MVU as a postprocessing step for LLE.
       - Added function for prewhitening of data.
       - Added function for exact out-of-sample extensions for PCA, LDA, NPE, LPP, LLTSA, autoencoders, and Kernel PCA.
       - Added six techniques for intrinsic dimension estimation.

  7. Version 0.1b:
       - The initial release of the toolbox.


Datasets

To allow you to quickly run some experiments yourself, the datasets I used in the paper are available for download here. You can download all datasets in a single ZIP archive (21.1 MB), or download the datasets individually:

  1. Swiss roll dataset (0.8 MB)

  2. Twin peaks dataset (0.5 MB)

  3. Helix dataset (0.4 MB)

  4. 3D clusters dataset (0.1 MB)

  5. Intersecting dataset (0.4 MB)

  6. MNIST dataset (2.9 MB)

  7. COIL20 dataset (11.3 MB)

  8. Faces dataset (23.1 MB)

  9. ADA dataset (0.9 MB)

  10. GINA dataset (3.1 MB)

  11. HIVA dataset (1.2 MB)

  12. NOVA dataset (0.3 MB)

  13. SYLVA dataset (2.3 MB)


Bugs/questions/suggestions

Do you think you have found a bug? Do you have a question about anything you found on this website? Do you have suggestions to improve the software that is posted here, or a request for a new feature? Please send me an email!