.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/unsupervised_learning.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_unsupervised_learning.py:


Unsupervised learning
=====================

This tutorial demonstrates how to use TabICL for unsupervised tasks.

.. GENERATED FROM PYTHON SOURCE LINES 9-14

.. code-block:: Python

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_moons

    from tabicl import TabICLUnsupervised

.. GENERATED FROM PYTHON SOURCE LINES 15-25

``TabICLUnsupervised`` supports density estimation and outlier detection
through ``score_samples``, missing-value imputation through ``impute``,
and synthetic data generation through ``generate``.

.. note::

    Compared with :class:`tabicl.TabICLClassifier` and
    :class:`tabicl.TabICLRegressor`, :class:`tabicl.TabICLUnsupervised` is
    experimental and has not been evaluated on large benchmarks. Use it
    with caution.

.. GENERATED FROM PYTHON SOURCE LINES 28-33

Fit the model
-------------

We use the classic two-moons dataset with only 200 samples so that
inference stays fast.

.. GENERATED FROM PYTHON SOURCE LINES 33-36

.. code-block:: Python

    X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

.. GENERATED FROM PYTHON SOURCE LINES 37-39

As with ``TabICLClassifier`` and ``TabICLRegressor``, calling ``fit()``
does not train anything: it only stores the training data and loads the
shared model weights once.

.. GENERATED FROM PYTHON SOURCE LINES 39-54

.. code-block:: Python

    model = TabICLUnsupervised(
        n_estimators=4,
        categorical_features=[],
        device="cpu",
        random_state=42,
    )
    model.fit(X)

    # Shared axis limits so all 2D scatter plots are directly comparable.
    pad = 1.0
    xlim = (X[:, 0].min() - pad, X[:, 0].max() + pad)
    ylim = (X[:, 1].min() - pad, X[:, 1].max() + pad)

.. GENERATED FROM PYTHON SOURCE LINES 55-76

Outlier detection with ``score_samples()``
------------------------------------------

Density estimation, outlier detection, and data generation all rely on an
estimate of the joint probability density :math:`P(X_1, \ldots, X_d)`.
TabICL approximates this density by factorizing it with the chain rule:

.. math::

    P(X_1, \ldots, X_d) = \prod_{k=1}^{d} P(X_k \mid X_{<k}),

where :math:`X_{<k} = (X_1, \ldots, X_{k-1})` denotes the features that
precede :math:`X_k`.
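The chain-rule factorization can be checked numerically on a case where every factor has a closed form. The sketch below is independent of TabICL and is only an illustration under stated assumptions: it uses a hypothetical two-dimensional Gaussian with hand-picked parameters (``mu``, ``cov``, ``joint_log_density`` and ``chain_rule_log_density`` are illustrative names, and SciPy is an extra dependency the tutorial does not import). It compares :math:`\log P(X_1, X_2)` computed directly with :math:`\log P(X_1) + \log P(X_2 \mid X_1)`; it does not show how TabICL estimates the conditionals.

.. code-block:: Python

    import numpy as np
    from scipy.stats import multivariate_normal, norm

    # Hypothetical toy joint density (an assumption for illustration only):
    # a correlated 2D Gaussian, not the density TabICL learns.
    mu = np.array([0.5, 0.25])
    cov = np.array([[1.0, 0.6],
                    [0.6, 0.5]])

    def joint_log_density(x):
        """log P(X_1, X_2) evaluated directly from the joint Gaussian."""
        return multivariate_normal(mean=mu, cov=cov).logpdf(x)

    def chain_rule_log_density(x):
        """log P(X_1) + log P(X_2 | X_1): the chain rule with d = 2."""
        x = np.atleast_2d(x)
        # Marginal of the first feature: N(mu_1, Sigma_11).
        log_p1 = norm(loc=mu[0], scale=np.sqrt(cov[0, 0])).logpdf(x[:, 0])
        # Conditional of the second feature given the first (Gaussian conditioning).
        cond_mean = mu[1] + cov[1, 0] / cov[0, 0] * (x[:, 0] - mu[0])
        cond_var = cov[1, 1] - cov[1, 0] ** 2 / cov[0, 0]
        log_p2_given_1 = norm(loc=cond_mean, scale=np.sqrt(cond_var)).logpdf(x[:, 1])
        return log_p1 + log_p2_given_1

    rng = np.random.default_rng(0)
    X_toy = rng.multivariate_normal(mu, cov, size=5)
    X_toy = np.vstack([X_toy, [4.0, -3.0]])  # one point far from the bulk

    direct = joint_log_density(X_toy)
    factored = chain_rule_log_density(X_toy)
    print(np.allclose(direct, factored))  # True
    print(int(np.argmin(factored)))       # 5 (the appended far-away point)

The two computations agree, and the point placed far from the samples receives the lowest log-density, which is the intuition behind using density scores for outlier detection. On real data the conditionals are unknown, which is why a model is needed to estimate them.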