.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "tutorials/classification_2D_proba.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_tutorials_classification_2D_proba.py:


Probabilistic classification
===================================

This tutorial demonstrates how to use TabICL for classification and how to
interpret its probabilistic outputs.

.. GENERATED FROM PYTHON SOURCE LINES 9-18

.. code-block:: Python


    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.datasets import make_moons
    from sklearn.metrics import roc_auc_score
    from sklearn.calibration import CalibrationDisplay
    from tabicl import TabICLClassifier


.. GENERATED FROM PYTHON SOURCE LINES 19-25

Generate 2D classification data
--------------------------------

We generate a simple two-moon 2D dataset with fairly large noise. A 2D dataset
is convenient for visualisation, and the noise makes the classification problem
non-separable, which is a common situation in real-world applications.

.. GENERATED FROM PYTHON SOURCE LINES 25-32

.. code-block:: Python


    X, y = make_moons(n_samples=1000, noise=0.35, random_state=0)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )


.. GENERATED FROM PYTHON SOURCE LINES 33-39

Fit TabICL
-----------

The ``fit`` method just downloads the TabICL weights if they have not been
downloaded already, while ``predict_proba`` runs the forward pass of the model
and returns the predicted probabilities for each class.

.. GENERATED FROM PYTHON SOURCE LINES 39-46

.. code-block:: Python


    tabicl = TabICLClassifier()
    tabicl.fit(X_train, y_train)

    # Predict probabilities on test set
    y_proba = tabicl.predict_proba(X_test)


.. GENERATED FROM PYTHON SOURCE LINES 47-53

Plot predicted probabilities
----------------------------

Since the problem is 2D, we can qualitatively assess the model's probabilistic
predictions by plotting the decision boundary induced by the predicted
probabilities.

.. GENERATED FROM PYTHON SOURCE LINES 53-87

.. code-block:: Python


    fig, ax = plt.subplots(figsize=(5, 4), constrained_layout=True)

    # Create a mesh to plot decision boundaries
    h = 0.2
    offset = 0.5
    x_min, x_max = X[:, 0].min() - offset, X[:, 0].max() + offset
    y_min, y_max = X[:, 1].min() - offset, X[:, 1].max() + offset
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    # Predict probabilities on mesh
    Z = tabicl.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
    Z = Z.reshape(xx.shape)

    # Plot decision boundary and margins
    ax.contourf(xx, yy, Z, levels=20, cmap="RdYlBu_r", alpha=0.8)
    ax.contour(xx, yy, Z, levels=[0.5], colors="black", linewidths=2)

    # Plot test data
    scatter = ax.scatter(
        X_test[:, 0],
        X_test[:, 1],
        c=y_test,
        cmap="RdYlBu_r",
        edgecolors="k",
        s=50,
        alpha=0.8,
    )

    ax.set(xlabel="Feature 1", ylabel="Feature 2")
    ax.set_title("TabICL predicted class probabilities (2D)")
    plt.colorbar(scatter, ax=ax, label="Probability of class 1")
    plt.show()



.. image-sg:: /tutorials/images/sphx_glr_classification_2D_proba_001.png
   :alt: TabICL predicted class probabilities (2D)
   :srcset: /tutorials/images/sphx_glr_classification_2D_proba_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 88-100

Test data points are coloured by their true label. The black contour line shows
the decision boundary at a probability threshold of 0.5. The colour shading
indicates the estimated probability for class 1.

It is interesting to observe that the model is less confident (probability
closer to 0.5) in the noisy regions of the dataset close to the decision
boundary.
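
To put a number on this notion of confidence, one option is to summarise each
predicted probability by the probability of the predicted class, or by the
entropy of the predicted distribution. The snippet below is only a minimal
sketch: ``proba_class_1`` is a small hand-written array standing in for
``tabicl.predict_proba(...)[:, 1]``, and ``binary_entropy`` is an illustrative
helper, not part of TabICL or scikit-learn.

.. code-block:: Python

    import numpy as np

    # Hypothetical class-1 probabilities standing in for
    # tabicl.predict_proba(...)[:, 1]; the values are made up for illustration.
    proba_class_1 = np.array([0.02, 0.35, 0.50, 0.65, 0.98])


    def binary_entropy(p, eps=1e-12):
        """Entropy (in bits) of a Bernoulli distribution with parameter p."""
        p = np.clip(p, eps, 1 - eps)  # avoid log2(0)
        return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


    # Probability assigned to the predicted class: 0.5 is undecided, 1.0 is certain.
    confidence = np.maximum(proba_class_1, 1 - proba_class_1)

    # Entropy is largest (1 bit) at p = 0.5 and shrinks towards 0 near p = 0 or 1.
    uncertainty = binary_entropy(proba_class_1)

    for p, c, u in zip(proba_class_1, confidence, uncertainty):
        print(f"p={p:.2f}  confidence={c:.2f}  entropy={u:.3f} bits")

Applied to the mesh predictions ``Z`` from the plot above, either quantity could
be drawn as an additional contour map to highlight the low-confidence regions.
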
We also observe even less confident predictions when we follow the decision
boundary further away from the training data of this particular task. This is a
desirable property: the model is able to express more uncertainty in regions of
the feature space that are far from the training data.

.. GENERATED FROM PYTHON SOURCE LINES 103-111

Evaluate model performance
--------------------------

For probabilistic binary classifiers, ROC AUC summarizes the ranking quality
of the predicted probabilities independently of a fixed classification
threshold. A ROC AUC of 1.0 is perfect and 0.5 corresponds to random guessing.
Since the classification task is noisy, we expect a value between those two
extremes.

.. GENERATED FROM PYTHON SOURCE LINES 112-116

.. code-block:: Python


    roc_auc = roc_auc_score(y_test, y_proba[:, 1])
    print(f"Test ROC AUC: {roc_auc:.3f}")



.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Test ROC AUC: 0.957


.. GENERATED FROM PYTHON SOURCE LINES 117-119

In addition, we can examine the calibration of the model's probabilistic
predictions by plotting the calibration curve:

.. GENERATED FROM PYTHON SOURCE LINES 120-128

.. code-block:: Python


    fig, ax = plt.subplots(figsize=(3.8, 3.2), constrained_layout=True)
    _ = CalibrationDisplay.from_predictions(
        y_test, y_proba[:, 1], strategy="quantile", n_bins=7, ax=ax,
    )
    ax.set_title("Calibration curve")
    plt.show()



.. image-sg:: /tutorials/images/sphx_glr_classification_2D_proba_002.png
   :alt: Calibration curve
   :srcset: /tutorials/images/sphx_glr_classification_2D_proba_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 129-132

We expect TabICL to produce reasonably well-calibrated probabilistic
predictions by default. This is what we observe here: the calibration curve is
close to the diagonal line.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 22.098 seconds)


.. _sphx_glr_download_tutorials_classification_2D_proba.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: classification_2D_proba.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: classification_2D_proba.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: classification_2D_proba.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_