LESS is More: A Comprehensive Tutorial ======================================= **LESS (Learning with Subset Stacking)** is a scalable and versatile ensemble learning framework. It constructs an ensemble of local models trained on subsets of data and combines their predictions using a global meta-estimator. In this tutorial, we will explore the two main variants of LESS: 1. **LESS-A (Averaging):** Trains multiple iterations of local/global models and averages their predictions. 2. **LESS-B (Boosting):** Trains models sequentially, where each stage learns to correct the residuals of the previous stage. We will also dive deep into the critical parameters that control the behavior of LESS, such as ``n_subsets``, ``n_estimators``, ``min_neighbors``, and the choice of estimators. .. code-block:: python import numpy as np import pandas as pd from sklearn.model_selection import train_test_split from sklearn.preprocessing import StandardScaler from sklearn.datasets import fetch_openml from sklearn.metrics import mean_squared_error from sklearn.tree import DecisionTreeRegressor from less import LESSARegressor, LESSBRegressor Data Preparation ---------------- We will use the **Abalone** dataset for this tutorial. It contains physical measurements of abalones, and the goal is to predict the age (number of rings). .. code-block:: python abalone = fetch_openml(name="abalone", version=1, as_frame=True) X = pd.get_dummies(abalone.data, drop_first=True, dtype=np.float32) y = abalone.target.astype(np.float32) X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=42 ) scaler = StandardScaler() X_train = scaler.fit_transform(X_train) X_test = scaler.transform(X_test) print(f"Training set size: {X_train.shape}") print(f"Test set size: {X_test.shape}") Output: .. code-block:: text Training set size: (3341, 9) Test set size: (836, 9) 1. LESS-A (Averaging) --------------------- **LESS-A** (``LESSARegressor``) is the averaging variant. It performs multiple independent iterations. In each iteration, it selects subsets of data, trains local models, and optionally trains a global model to combine them. The final prediction is the average of predictions from all iterations. Key Parameter: ``n_estimators`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For LESS-A, ``n_estimators`` controls the number of averaging iterations. * **Default:** 100 * **Effect:** Generally, more estimators lead to more stable predictions (lower variance) but increase training time linearly. .. code-block:: python # LESS-A with default parameters (n_estimators=100) less_a = LESSARegressor(n_estimators=100, random_state=42) less_a.fit(X_train, y_train) y_pred_a = less_a.predict(X_test) print(f'LESS-A Test MSE: {mean_squared_error(y_test, y_pred_a):.4f}') Output: .. code-block:: text LESS-A Test MSE: 4.3957 2. LESS-B (Boosting) -------------------- **LESS-B** (``LESSBRegressor``) applies a boosting strategy. Instead of independent iterations, it trains models sequentially. Each stage learns to correct the residuals (errors) of the previous stage. Key Parameters: ``n_estimators`` and ``learning_rate`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * ``n_estimators``: The number of boosting stages (default: 100). * ``learning_rate``: Shrinks the contribution of each estimator (default: 0.1). There is a trade-off between ``learning_rate`` and ``n_estimators``. .. code-block:: python less_b = LESSBRegressor( n_estimators=50, # 50 boosting stages learning_rate=0.1, # Shrinkage parameter random_state=42 ) less_b.fit(X_train, y_train) y_pred_b = less_b.predict(X_test) print(f'LESS-B Test MSE: {mean_squared_error(y_test, y_pred_b):.4f}') Output: .. code-block:: text LESS-B Test MSE: 4.3576 3. Critical Parameters Deep Dive -------------------------------- To get the most out of LESS, it's essential to understand its key parameters. Number of Subsets (``n_subsets``) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is perhaps the most critical parameter. It determines how many local subsets are created in each iteration (or stage). * **Default:** 20 * **Effect:** A higher ``n_subsets`` means more local models, which can capture more local details but increases computational cost. It also affects the number of neighbors used for training each local model (see ``min_neighbors`` below). .. code-block:: python # Experimenting with n_subsets for n in [5, 20, 50]: model = LESSARegressor(n_subsets=n, random_state=42) model.fit(X_train, y_train) mse = mean_squared_error(y_test, model.predict(X_test)) print(f'n_subsets={n}: MSE={mse:.4f}') Output: .. code-block:: text n_subsets=5: MSE=4.4578 n_subsets=20: MSE=4.3957 n_subsets=50: MSE=4.3848 Minimum Neighbors (``min_neighbors``) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This parameter ensures that each local model has enough data to train on. * **Default:** 10 * **Internal Logic:** LESS automatically calculates the number of neighbors (``n_neighbors``) for each subset based on the dataset size (``n_samples``) and ``n_subsets``. .. math:: \text{suggested_neighbors} = \max(\text{min_neighbors}, \lfloor \frac{\text{n_samples}}{\text{n_subsets}} \rfloor) \text{n_neighbors} = \min(\text{suggested_neighbors}, \text{n_samples}) This logic ensures that even if you have many subsets, each local model will see at least ``min_neighbors`` samples (overlapping if necessary). If ``n_subsets`` is small, the local models will see more data (``n_samples / n_subsets``). .. code-block:: python # Experimenting with min_neighbors # This allows min_neighbors to actually control the subset size. for n in [50, 500, 1000]: model = LESSARegressor(n_subsets=10, min_neighbors=n, random_state=42) model.fit(X_train, y_train) mse = mean_squared_error(y_test, model.predict(X_test)) print(f'min_neighbors={n}: MSE={mse:.4f}') Output: .. code-block:: text min_neighbors=50: MSE=4.4130 min_neighbors=500: MSE=4.4186 min_neighbors=1000: MSE=4.4224 Local Estimator (``local_estimator``) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This defines the model used for each local subset. * **Default:** ``'linear'`` (uses ``LinearRegression``) * **Options:** * ``'linear'``: Standard Linear Regression. * ``'tree'``: A ``DecisionTreeRegressor`` with specific parameters (max_leaf_nodes=31, etc.). * **Custom:** You can pass any callable that returns a scikit-learn compatible regressor (e.g., ``lambda: SVR()``). .. code-block:: python # Using 'tree' as local estimator less_tree_local = LESSARegressor(local_estimator='tree', random_state=42) less_tree_local.fit(X_train, y_train) print(f'Local=Tree MSE: {mean_squared_error(y_test, less_tree_local.predict(X_test)):.4f}') # Using a custom local estimator less_custom_local = LESSARegressor(local_estimator=lambda: DecisionTreeRegressor(max_depth=5), random_state=42) less_custom_local.fit(X_train, y_train) print(f'Local=CustomTree MSE: {mean_squared_error(y_test, less_custom_local.predict(X_test)):.4f}') Output: .. code-block:: text Local=Tree MSE: 5.0080 Local=CustomTree MSE: 4.9919 Global Estimator (``global_estimator``) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The global estimator combines the predictions of the local models. * **Default:** ``'xgboost'`` (uses ``XGBRFRegressor``) * **Options:** * ``'xgboost'``: Random Forest regressor from XGBoost. * ``None``: Removes the global estimator. The final prediction becomes a weighted average of local predictions. * **Custom:** Any callable returning a regressor (e.g., ``lambda: RandomForestRegressor()``). .. code-block:: python # Using default (XGBoost) less_default = LESSARegressor(random_state=42) less_default.fit(X_train, y_train) print(f'Global=XGBoost MSE: {mean_squared_error(y_test, less_default.predict(X_test)):.4f}') # Removing global estimator (Weighted Average) less_no_global = LESSARegressor(global_estimator=None, random_state=42) less_no_global.fit(X_train, y_train) print(f'No Global MSE: {mean_squared_error(y_test, less_no_global.predict(X_test)):.4f}') Output: .. code-block:: text Global=XGBoost MSE: 4.3957 No Global MSE: 6.1603 Validation Split (``val_size``) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can split the dataset into training and validation sets within LESS. * **Purpose:** The training set is used to train the **local estimators**, while the validation set is used to train the **global estimator**. This can help prevent overfitting, especially when the global estimator is powerful. * **Usage:** Set ``val_size`` to a float between 0 and 1 (e.g., ``0.2`` for 20% validation data). .. code-block:: python less_val = LESSARegressor(val_size=0.2, random_state=42) less_val.fit(X_train, y_train) y_pred_val = less_val.predict(X_test) print(f'Test error (val_size=0.2): {mean_squared_error(y_test, y_pred_val):.4f}') Output: .. code-block:: text Test error (val_size=0.2): 4.3987 Clustering Method (``cluster_method``) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This parameter controls how the centers of the subsets are selected. * **Default:** ``'tree'`` (Random sampling). It selects ``n_subsets`` centers randomly from the data. * **Options:** * ``'tree'``: Random sampling. * ``'kmeans'``: Uses K-Means clustering. **Crucially, the number of clusters is set equal to** ``n_subsets``. The cluster centers found by K-Means become the centers of the subsets. .. code-block:: python # Using K-Means for clustering # Here, n_subsets=20 means K-Means will find 20 cluster centers less_kmeans = LESSARegressor(cluster_method='kmeans', n_subsets=20, random_state=42) less_kmeans.fit(X_train, y_train) print(f'Cluster=KMeans MSE: {mean_squared_error(y_test, less_kmeans.predict(X_test)):.4f}') Output: .. code-block:: text Cluster=KMeans MSE: 4.3866 Random State (``random_state``) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Controls the randomness of the algorithm (subset selection, local estimator initialization, global estimator initialization). Setting this ensures reproducibility of your results.