Models Reference

philanthropy.models

Donor propensity, lapse prediction, and share-of-wallet capacity models.

PropensityScorer

Bases: ClassifierMixin, BaseEstimator

Predicts a propensity-to-give score. This reference implementation is an uninformative baseline: every prospect receives a constant 0.5 positive-class probability.

Source code in philanthropy/models/propensity.py
class PropensityScorer(ClassifierMixin, BaseEstimator):
    """
    Predicts a propensity-to-give score.

    This reference implementation is an uninformative baseline: every
    prospect receives a constant 0.5 positive-class probability.  The
    ``estimator`` parameter is accepted for API compatibility but is not
    consulted here.
    """

    def __init__(self, estimator=None, threshold: float = 0.5):
        self.estimator = estimator
        self.threshold = threshold

    def fit(self, X, y):
        X, y = validate_data(self, X, y, reset=True)
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = validate_data(self, X, reset=False)
        proba = self.predict_proba(X)[:, 1]
        # Probability >= threshold selects the positive class label.
        idx = (proba >= self.threshold).astype(int)
        return self.classes_[idx]

    def predict_proba(self, X):
        check_is_fitted(self)
        X = validate_data(self, X, reset=False)
        # Uninformative baseline: constant 0.5 for the positive class.
        n = X.shape[0]
        prob_pos = np.full(n, 0.5)
        return np.column_stack([1 - prob_pos, prob_pos])
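The two-column layout built by `np.column_stack` above is the standard sklearn `predict_proba` shape; a minimal standalone sketch of that stacking (illustrative sample count, no library code assumed):

```python
import numpy as np

# Constant baseline probabilities for four prospects, stacked into the
# standard sklearn (n_samples, 2) predict_proba layout.
prob_pos = np.full(4, 0.5)
proba = np.column_stack([1 - prob_pos, prob_pos])

print(proba.shape)        # (4, 2)
print(proba.sum(axis=1))  # each row sums to 1.0
```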

DonorPropensityModel

Bases: ClassifierMixin, BaseEstimator

Predict whether a hospital prospect is a major-gift donor.

DonorPropensityModel wraps a :class:sklearn.ensemble.RandomForestClassifier and is designed specifically for hospital advancement and major-gift fundraising teams. Given a feature matrix describing donors (e.g. recency, frequency, monetary value, event attendance, giving capacity estimates), the model outputs:

  • Binary predictions (predict) — 0 for standard donors, 1 for major-gift prospects above the team's threshold.
  • Probability estimates (predict_proba) — class probabilities in the standard sklearn two-column format.
  • Affinity scores (predict_affinity_score) — the positive-class probability mapped to a 0–100 scale (rounded to two decimals), enabling gift officers to quickly rank prospects in wealth-screening reports or CRM dashboards (e.g. Salesforce NPSP, Raiser's Edge NXT, Veeva CRM).

The model is pipeline-safe and passes sklearn.utils.estimator_checks.check_estimator.
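The probability-to-affinity mapping described above can be sketched with plain NumPy (the probabilities below are illustrative, not model output):

```python
import numpy as np

# Hypothetical predict_proba output: columns are [P(class=0), P(class=1)].
proba = np.array([[0.90, 0.10],
                  [0.25, 0.75],
                  [0.60, 0.40]])

# Positive-class probability rescaled to the 0-100 affinity scale.
scores = np.round(proba[:, 1] * 100, 2)
print(scores)  # [10. 75. 40.]
```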

Parameters:

Name Type Description Default
n_estimators int

Number of trees in the underlying :class:RandomForestClassifier. Increase for more stable probability estimates at the cost of inference speed.

100
max_depth int or None

Maximum depth of each decision tree. None allows trees to grow until leaves are pure, which may overfit on small prospect pools; set to 5–10 for regularisation.

None
min_samples_split int or float

Minimum number of samples (or fraction) required to split an internal node. Larger values act as a regulariser, improving generalisation on sparse hospital datasets.

2
min_samples_leaf int or float

Minimum number of samples required to be at a leaf node.

1
min_weight_fraction_leaf float

Minimum weighted fraction of the sum of weights required to be at a leaf node. When class_weight is set, this interacts strongly with weight values and can be used to prevent minority-class leaves.

0.0
class_weight (dict, 'balanced', 'balanced_subsample' or None)

Weight scheme for the two classes. Pass "balanced" to let the model automatically compensate for class imbalance (recommended when major-donor examples are <5 % of your prospect pool), or supply an explicit dict such as {0: 1, 1: 10} for finer control.

None
random_state int or None

Seed for the internal random-number generator. Pass an integer to make model training fully reproducible — important for audit trails in gift-officer accountability dashboards.

None
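For intuition on what class_weight="balanced" does with a 5 % major-donor pool, the weights sklearn derives can be computed directly; a sketch using sklearn.utils.class_weight.compute_class_weight (the 95/5 split is an illustrative assumption):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# 5% positive class, mirroring a typical major-donor prospect pool.
y = np.array([0] * 95 + [1] * 5)

# "balanced" weight = n_samples / (n_classes * class_count)
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(weights)  # minority class is up-weighted to 10.0
```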

Attributes:

Name Type Description
estimator_ RandomForestClassifier

The fitted backend estimator. Inspect via model.estimator_.feature_importances_ to surface the top propensity drivers for stewardship reporting.

classes_ ndarray of shape (n_classes,)

The unique class labels seen during :meth:fit. Typically array([0, 1]).

n_features_in_ int

Number of features seen during :meth:fit.

Examples:

Basic usage with synthetic data:

>>> from philanthropy.datasets import generate_synthetic_donor_data
>>> from philanthropy.models import DonorPropensityModel
>>> df = generate_synthetic_donor_data(n_samples=500, random_state=0)
>>> feature_cols = [
...     "total_gift_amount", "years_active", "event_attendance_count"
... ]
>>> X = df[feature_cols].to_numpy()
>>> y = df["is_major_donor"].to_numpy()
>>> model = DonorPropensityModel(random_state=42)
>>> model.fit(X, y)
DonorPropensityModel(random_state=42)
>>> scores = model.predict_affinity_score(X)
>>> bool(scores.min() >= 0 and scores.max() <= 100)
True

Pipeline integration:

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import StandardScaler
>>> pipe = Pipeline([
...     ("scaler", StandardScaler()),
...     ("model", DonorPropensityModel(n_estimators=200, random_state=0)),
... ])
>>> pipe.fit(X, y)
Pipeline(...)

Notes

Why RandomForest? Random forests are a natural fit for philanthropic data science because:

  1. They handle the diverse mix of numerical and ordinal features common in CRM exports (recency in days, monetary amounts spanning four orders of magnitude, event counts) without feature scaling.
  2. Their ensemble nature provides smooth, fine-grained probability estimates suitable for affinity ranking.
  3. Feature importances are easily explained to non-technical gift officers and development committees.
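Point 3 above in practice: after fitting, importances can be read straight off the forest. A standalone sketch with synthetic data and a plain RandomForestClassifier (not the library's wrapper):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)  # label driven entirely by feature 0

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
importances = rf.feature_importances_

# Feature 0 dominates because it alone determines the label.
print(importances.argmax())  # 0
```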

Affinity Score Interpretation (0–100 scale):

====== =============================================================
Range  Recommended action
====== =============================================================
80–100 Premium prospect: assign major gift officer immediately.
60–79  Strong prospect: include in next biannual solicitation cycle.
40–59  Moderate prospect: steward via annual fund or planned giving.
0–39   Low propensity: retain in broad annual-appeal pool.
====== =============================================================
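The tier boundaries map directly onto `np.digitize`; a sketch of turning affinity scores into the recommended-action buckets (the scores and shortened tier labels are illustrative):

```python
import numpy as np

scores = np.array([12.5, 45.0, 66.7, 91.2])
tiers = ["Low propensity", "Moderate prospect",
         "Strong prospect", "Premium prospect"]

# Bin edges follow the table: [0-39], [40-59], [60-79], [80-100].
idx = np.digitize(scores, bins=[40, 60, 80])
assigned = [tiers[i] for i in idx]
print(assigned)
```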

See Also

philanthropy.datasets.generate_synthetic_donor_data : Generate a synthetic prospect pool to prototype this model. philanthropy.metrics.donor_retention_rate : Measure year-over-year donor retention alongside propensity scoring.

Source code in philanthropy/models/_propensity.py
class DonorPropensityModel(ClassifierMixin, BaseEstimator):
    """Predict whether a hospital prospect is a major-gift donor.

    ``DonorPropensityModel`` wraps a :class:`sklearn.ensemble.RandomForestClassifier`
    and is designed specifically for hospital advancement and major-gift
    fundraising teams.  Given a feature matrix describing donors (e.g. recency,
    frequency, monetary value, event attendance, giving capacity estimates), the
    model outputs:

    * **Binary predictions** (``predict``) — 0 for standard donors, 1 for
      major-gift prospects above the team's threshold.
    * **Probability estimates** (``predict_proba``) — class probabilities in
      the standard sklearn two-column format.
    * **Affinity scores** (``predict_affinity_score``) — the positive-class
      probability mapped to a 0–100 scale (rounded to two decimals), enabling
      gift officers to quickly rank prospects in wealth-screening reports or
      CRM dashboards (e.g. Salesforce NPSP, Raiser's Edge NXT, Veeva CRM).

    The model is pipeline-safe and passes
    ``sklearn.utils.estimator_checks.check_estimator``.

    Parameters
    ----------
    n_estimators : int, default=100
        Number of trees in the underlying :class:`RandomForestClassifier`.
        Increase for more stable probability estimates at the cost of
        inference speed.
    max_depth : int or None, default=None
        Maximum depth of each decision tree.  ``None`` allows trees to grow
        until leaves are pure, which may overfit on small prospect pools;
        set to 5–10 for regularisation.
    min_samples_split : int or float, default=2
        Minimum number of samples (or fraction) required to split an internal
        node.  Larger values act as a regulariser, improving generalisation
        on sparse hospital datasets.
    min_samples_leaf : int or float, default=1
        Minimum number of samples required to be at a leaf node.
    min_weight_fraction_leaf : float, default=0.0
        Minimum weighted fraction of the sum of weights required to be at a
        leaf node.  When ``class_weight`` is set, this interacts strongly with
        weight values and can be used to prevent minority-class leaves.
    class_weight : dict, "balanced", "balanced_subsample" or None, default=None
        Weight scheme for the two classes.  Pass ``"balanced"`` to let the
        model automatically compensate for class imbalance (recommended when
        major-donor examples are <5 % of your prospect pool), or supply an
        explicit dict such as ``{0: 1, 1: 10}`` for finer control.
    random_state : int or None, default=None
        Seed for the internal random-number generator.  Pass an integer to
        make model training fully reproducible — important for audit trails
        in gift-officer accountability dashboards.

    Attributes
    ----------
    estimator_ : RandomForestClassifier
        The fitted backend estimator.  Inspect via
        ``model.estimator_.feature_importances_`` to surface the top
        propensity drivers for stewardship reporting.
    classes_ : ndarray of shape (n_classes,)
        The unique class labels seen during :meth:`fit`.  Typically
        ``array([0, 1])``.
    n_features_in_ : int
        Number of features seen during :meth:`fit`.

    Examples
    --------
    **Basic usage with synthetic data:**

    >>> from philanthropy.datasets import generate_synthetic_donor_data
    >>> from philanthropy.models import DonorPropensityModel
    >>> df = generate_synthetic_donor_data(n_samples=500, random_state=0)
    >>> feature_cols = [
    ...     "total_gift_amount", "years_active", "event_attendance_count"
    ... ]
    >>> X = df[feature_cols].to_numpy()
    >>> y = df["is_major_donor"].to_numpy()
    >>> model = DonorPropensityModel(random_state=42)
    >>> model.fit(X, y)
    DonorPropensityModel(random_state=42)
    >>> scores = model.predict_affinity_score(X)
    >>> bool(scores.min() >= 0 and scores.max() <= 100)
    True

    **Pipeline integration:**

    >>> from sklearn.pipeline import Pipeline
    >>> from sklearn.preprocessing import StandardScaler
    >>> pipe = Pipeline([
    ...     ("scaler", StandardScaler()),
    ...     ("model", DonorPropensityModel(n_estimators=200, random_state=0)),
    ... ])
    >>> pipe.fit(X, y)
    Pipeline(...)

    Notes
    -----
    **Why RandomForest?**
    Random forests are a natural fit for philanthropic data science because:

    1. They handle the diverse mix of numerical and ordinal features common
       in CRM exports (recency in days, monetary amounts spanning four orders
       of magnitude, event counts) without feature scaling.
    2. Their ensemble nature provides smooth, fine-grained probability
       estimates suitable for affinity ranking.
    3. Feature importances are easily explained to non-technical gift
       officers and development committees.

    **Affinity Score Interpretation (0–100 scale):**

    ====== =================================
    Range  Recommended action
    ====== =================================
    80–100 Premium prospect: assign major gift officer immediately.
    60–79  Strong prospect: include in next biannual solicitation cycle.
    40–59  Moderate prospect: steward via annual fund or planned giving.
    0–39   Low propensity: retain in broad annual-appeal pool.
    ====== =================================

    See Also
    --------
    philanthropy.datasets.generate_synthetic_donor_data :
        Generate a synthetic prospect pool to prototype this model.
    philanthropy.metrics.donor_retention_rate :
        Measure year-over-year donor retention alongside propensity scoring.
    """

    def __sklearn_tags__(self) -> Tags:
        """Declare sklearn-compatible metadata tags for this estimator.

        Overrides the default :class:`ClassifierMixin` tags to indicate that
        ``DonorPropensityModel`` supports multi-class targets (inherited from
        the backend :class:`RandomForestClassifier`).

        Returns
        -------
        tags : Tags
            Populated sklearn Tags object.
        """
        tags = super().__sklearn_tags__()
        tags.classifier_tags.multi_class = True
        return tags

    def __init__(
        self,
        n_estimators: int = 100,
        max_depth: Optional[int] = None,
        min_samples_split: int = 2,
        min_samples_leaf: int = 1,
        min_weight_fraction_leaf: float = 0.0,
        class_weight=None,
        random_state: Optional[int] = None,
    ) -> None:
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.min_samples_leaf = min_samples_leaf
        self.min_weight_fraction_leaf = min_weight_fraction_leaf
        self.class_weight = class_weight
        self.random_state = random_state

    # ------------------------------------------------------------------
    # Public API
    # ------------------------------------------------------------------

    def fit(self, X, y):
        """Fit the DonorPropensityModel to labelled donor data.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Feature matrix.  Accepts NumPy arrays or Pandas DataFrames.
            Common features include RFM metrics, event attendance counts,
            and wealth-screening capacity estimates.
        y : array-like of shape (n_samples,)
            Binary target vector.  ``1`` indicates a major-gift prospect;
            ``0`` indicates a standard annual-fund donor.

        Returns
        -------
        self : DonorPropensityModel
            Fitted estimator (enables method chaining).

        Raises
        ------
        ValueError
            If ``X`` and ``y`` have incompatible shapes, or if ``y``
            contains values outside ``{0, 1}``.
        """
        X, y = validate_data(self, X, y, reset=True)

        self.classes_ = unique_labels(y)
        self.n_features_in_ = X.shape[1]

        self.estimator_ = RandomForestClassifier(
            n_estimators=self.n_estimators,
            max_depth=self.max_depth,
            min_samples_split=self.min_samples_split,
            min_samples_leaf=self.min_samples_leaf,
            min_weight_fraction_leaf=self.min_weight_fraction_leaf,
            class_weight=self.class_weight,
            random_state=self.random_state,
        )
        self.estimator_.fit(X, y)

        return self

    def predict(self, X) -> np.ndarray:
        """Predict binary major-donor labels for each prospect.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Feature matrix.  Must have the same number of columns as
            the data passed to :meth:`fit`.

        Returns
        -------
        y_pred : ndarray of shape (n_samples,)
            Predicted class labels (``0`` or ``1``).

        Raises
        ------
        sklearn.exceptions.NotFittedError
            If :meth:`fit` has not been called yet.
        """
        check_is_fitted(self)
        X = validate_data(self, X, reset=False)
        return self.estimator_.predict(X)

    def predict_proba(self, X) -> np.ndarray:
        """Return class-probability estimates for each prospect.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Feature matrix.

        Returns
        -------
        proba : ndarray of shape (n_samples, 2)
            Columns are ``[P(class=0), P(class=1)]``.  Each row sums to
            1.0.  The second column is the major-donor positive probability
            used internally by :meth:`predict_affinity_score`.

        Raises
        ------
        sklearn.exceptions.NotFittedError
            If :meth:`fit` has not been called yet.
        """
        check_is_fitted(self)
        X = validate_data(self, X, reset=False)
        return self.estimator_.predict_proba(X)

    def decision_function(self, X: np.ndarray) -> np.ndarray:
        """
        Raw P(major_donor) scores. Used by sklearn scoring and calibration.

        Returns
        -------
        ndarray
            For the binary case, an array of shape ``(n_samples,)``:
            positive-class probability centred at 0 so its sign matches the
            ``predict`` threshold.  For multiclass input the uncentred
            probability matrix of shape ``(n_samples, n_classes)`` is
            returned unchanged.
        """
        check_is_fitted(self)
        X = validate_data(self, X, reset=False)
        proba = self.estimator_.predict_proba(X)
        if proba.shape[1] == 2:
            return proba[:, 1] - 0.5
        elif proba.shape[1] == 1:
            # Single class case: if classes_ is [1], prob of class 1 is all 1.0.
            # If classes_ is [0], prob of class 1 is all 0.0.
            if self.classes_[0] == 1:
                return np.ones(proba.shape[0]) - 0.5
            return np.zeros(proba.shape[0]) - 0.5
        return proba  # Multiclass

    def predict_affinity_score(self, X: np.ndarray) -> np.ndarray:
        """Map major-donor probability to a 0–100 affinity score.

        This method is the primary interface for gift officers and CRM
        integrations.  The raw ``predict_proba`` positive-class probability is
        linearly rescaled from [0.0, 1.0] to [0, 100] and rounded to two
        decimal places, making scores directly comparable across fiscal years
        and prospect cohorts.

        Affinity scores are monotonically equivalent to model probabilities,
        so any rank-ordering derived from probabilities is preserved.  Scores
        do **not** represent calibrated probabilities and should not be
        interpreted as the literal odds of a major gift.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Feature matrix.  Accepts NumPy arrays or Pandas DataFrames.

        Returns
        -------
        affinity_scores : ndarray of shape (n_samples,)
            Float values in the closed interval [0.0, 100.0].  Higher
            scores indicate stronger major-gift propensity.

        Raises
        ------
        sklearn.exceptions.NotFittedError
            If :meth:`fit` has not been called yet.

        Examples
        --------
        >>> import numpy as np
        >>> from philanthropy.datasets import generate_synthetic_donor_data
        >>> from philanthropy.models import DonorPropensityModel
        >>> df = generate_synthetic_donor_data(500, random_state=7)
        >>> X = df[["total_gift_amount", "years_active",
        ...          "event_attendance_count"]].to_numpy()
        >>> y = df["is_major_donor"].to_numpy()
        >>> model = DonorPropensityModel(random_state=0).fit(X, y)
        >>> scores = model.predict_affinity_score(X)
        >>> scores.shape
        (500,)
        >>> bool((scores >= 0).all() and (scores <= 100).all())
        True
        """
        df = self.decision_function(X)
        if df.ndim == 1:
            return np.round((df + 0.5) * 100, 2)
        # Multiclass case: affinity score for "major gift" (usually class 1)
        # We assume class 1 is at index 1 if it exists
        if df.shape[1] > 1:
            return np.round(df[:, 1] * 100, 2)
        return np.round(df.ravel() * 100, 2)

__sklearn_tags__()

Declare sklearn-compatible metadata tags for this estimator.

Overrides the default :class:ClassifierMixin tags to indicate that DonorPropensityModel supports multi-class targets (inherited from the backend :class:RandomForestClassifier).

Returns:

Name Type Description
tags Tags

Populated sklearn Tags object.

Source code in philanthropy/models/_propensity.py
def __sklearn_tags__(self) -> Tags:
    """Declare sklearn-compatible metadata tags for this estimator.

    Overrides the default :class:`ClassifierMixin` tags to indicate that
    ``DonorPropensityModel`` supports multi-class targets (inherited from
    the backend :class:`RandomForestClassifier`).

    Returns
    -------
    tags : Tags
        Populated sklearn Tags object.
    """
    tags = super().__sklearn_tags__()
    tags.classifier_tags.multi_class = True
    return tags

fit(X, y)

Fit the DonorPropensityModel to labelled donor data.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Feature matrix. Accepts NumPy arrays or Pandas DataFrames. Common features include RFM metrics, event attendance counts, and wealth-screening capacity estimates.

required
y array-like of shape (n_samples,)

Binary target vector. 1 indicates a major-gift prospect; 0 indicates a standard annual-fund donor.

required

Returns:

Name Type Description
self DonorPropensityModel

Fitted estimator (enables method chaining).

Raises:

Type Description
ValueError

If X and y have incompatible shapes, or if y contains values outside {0, 1}.

Source code in philanthropy/models/_propensity.py
def fit(self, X, y):
    """Fit the DonorPropensityModel to labelled donor data.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Feature matrix.  Accepts NumPy arrays or Pandas DataFrames.
        Common features include RFM metrics, event attendance counts,
        and wealth-screening capacity estimates.
    y : array-like of shape (n_samples,)
        Binary target vector.  ``1`` indicates a major-gift prospect;
        ``0`` indicates a standard annual-fund donor.

    Returns
    -------
    self : DonorPropensityModel
        Fitted estimator (enables method chaining).

    Raises
    ------
    ValueError
        If ``X`` and ``y`` have incompatible shapes, or if ``y``
        contains values outside ``{0, 1}``.
    """
    X, y = validate_data(self, X, y, reset=True)

    self.classes_ = unique_labels(y)
    self.n_features_in_ = X.shape[1]

    self.estimator_ = RandomForestClassifier(
        n_estimators=self.n_estimators,
        max_depth=self.max_depth,
        min_samples_split=self.min_samples_split,
        min_samples_leaf=self.min_samples_leaf,
        min_weight_fraction_leaf=self.min_weight_fraction_leaf,
        class_weight=self.class_weight,
        random_state=self.random_state,
    )
    self.estimator_.fit(X, y)

    return self

predict(X)

Predict binary major-donor labels for each prospect.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Feature matrix. Must have the same number of columns as the data passed to :meth:fit.

required

Returns:

Name Type Description
y_pred ndarray of shape (n_samples,)

Predicted class labels (0 or 1).

Raises:

Type Description
NotFittedError

If :meth:fit has not been called yet.

Source code in philanthropy/models/_propensity.py
def predict(self, X) -> np.ndarray:
    """Predict binary major-donor labels for each prospect.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Feature matrix.  Must have the same number of columns as
        the data passed to :meth:`fit`.

    Returns
    -------
    y_pred : ndarray of shape (n_samples,)
        Predicted class labels (``0`` or ``1``).

    Raises
    ------
    sklearn.exceptions.NotFittedError
        If :meth:`fit` has not been called yet.
    """
    check_is_fitted(self)
    X = validate_data(self, X, reset=False)
    return self.estimator_.predict(X)

predict_proba(X)

Return class-probability estimates for each prospect.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Feature matrix.

required

Returns:

Name Type Description
proba ndarray of shape (n_samples, 2)

Columns are [P(class=0), P(class=1)]. Each row sums to 1.0. The second column is the major-donor positive probability used internally by :meth:predict_affinity_score.

Raises:

Type Description
NotFittedError

If :meth:fit has not been called yet.

Source code in philanthropy/models/_propensity.py
def predict_proba(self, X) -> np.ndarray:
    """Return class-probability estimates for each prospect.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Feature matrix.

    Returns
    -------
    proba : ndarray of shape (n_samples, 2)
        Columns are ``[P(class=0), P(class=1)]``.  Each row sums to
        1.0.  The second column is the major-donor positive probability
        used internally by :meth:`predict_affinity_score`.

    Raises
    ------
    sklearn.exceptions.NotFittedError
        If :meth:`fit` has not been called yet.
    """
    check_is_fitted(self)
    X = validate_data(self, X, reset=False)
    return self.estimator_.predict_proba(X)

decision_function(X)

Raw P(major_donor) scores. Used by sklearn scoring and calibration.

Returns:

Type Description
ndarray

For the binary case, an array of shape (n_samples,): positive-class probability centred at 0 so its sign matches the predict threshold. For multiclass input, the uncentred probability matrix of shape (n_samples, n_classes) is returned unchanged.

Source code in philanthropy/models/_propensity.py
def decision_function(self, X: np.ndarray) -> np.ndarray:
    """
    Raw P(major_donor) scores. Used by sklearn scoring and calibration.

    Returns
    -------
    ndarray
        For the binary case, an array of shape ``(n_samples,)``:
        positive-class probability centred at 0 so its sign matches the
        ``predict`` threshold.  For multiclass input the uncentred
        probability matrix of shape ``(n_samples, n_classes)`` is
        returned unchanged.
    """
    check_is_fitted(self)
    X = validate_data(self, X, reset=False)
    proba = self.estimator_.predict_proba(X)
    if proba.shape[1] == 2:
        return proba[:, 1] - 0.5
    elif proba.shape[1] == 1:
        # Single class case: if classes_ is [1], prob of class 1 is all 1.0.
        # If classes_ is [0], prob of class 1 is all 0.0.
        if self.classes_[0] == 1:
            return np.ones(proba.shape[0]) - 0.5
        return np.zeros(proba.shape[0]) - 0.5
    return proba  # Multiclass
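The binary branch above in isolation: centring the positive-class probability at zero makes the score's sign agree with `predict` (the probabilities below are illustrative, not model output):

```python
import numpy as np

# Hypothetical [P(0), P(1)] rows from predict_proba.
proba = np.array([[0.8, 0.2],
                  [0.3, 0.7]])

scores = proba[:, 1] - 0.5          # centred: > 0 means "major donor"
preds = (scores > 0).astype(int)
print(preds)  # [0 1]
```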

predict_affinity_score(X)

Map major-donor probability to a 0–100 affinity score.

This method is the primary interface for gift officers and CRM integrations. The raw predict_proba positive-class probability is linearly rescaled from [0.0, 1.0] to [0, 100] and rounded to two decimal places, making scores directly comparable across fiscal years and prospect cohorts.

Affinity scores are monotonically equivalent to model probabilities, so any rank-ordering derived from probabilities is preserved. Scores do not represent calibrated probabilities and should not be interpreted as the literal odds of a major gift.

Parameters:

Name Type Description Default
X array-like of shape (n_samples, n_features)

Feature matrix. Accepts NumPy arrays or Pandas DataFrames.

required

Returns:

Name Type Description
affinity_scores ndarray of shape (n_samples,)

Float values in the closed interval [0.0, 100.0]. Higher scores indicate stronger major-gift propensity.

Raises:

Type Description
NotFittedError

If :meth:fit has not been called yet.

Examples:

>>> import numpy as np
>>> from philanthropy.datasets import generate_synthetic_donor_data
>>> from philanthropy.models import DonorPropensityModel
>>> df = generate_synthetic_donor_data(500, random_state=7)
>>> X = df[["total_gift_amount", "years_active",
...          "event_attendance_count"]].to_numpy()
>>> y = df["is_major_donor"].to_numpy()
>>> model = DonorPropensityModel(random_state=0).fit(X, y)
>>> scores = model.predict_affinity_score(X)
>>> scores.shape
(500,)
>>> bool((scores >= 0).all() and (scores <= 100).all())
True
Source code in philanthropy/models/_propensity.py
def predict_affinity_score(self, X: np.ndarray) -> np.ndarray:
    """Map major-donor probability to a 0–100 affinity score.

    This method is the primary interface for gift officers and CRM
    integrations.  The raw ``predict_proba`` positive-class probability is
    linearly rescaled from [0.0, 1.0] to [0, 100] and rounded to two
    decimal places, making scores directly comparable across fiscal years
    and prospect cohorts.

    Affinity scores are monotonically equivalent to model probabilities,
    so any rank-ordering derived from probabilities is preserved.  Scores
    do **not** represent calibrated probabilities and should not be
    interpreted as the literal odds of a major gift.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Feature matrix.  Accepts NumPy arrays or Pandas DataFrames.

    Returns
    -------
    affinity_scores : ndarray of shape (n_samples,)
        Float values in the closed interval [0.0, 100.0].  Higher
        scores indicate stronger major-gift propensity.

    Raises
    ------
    sklearn.exceptions.NotFittedError
        If :meth:`fit` has not been called yet.

    Examples
    --------
    >>> import numpy as np
    >>> from philanthropy.datasets import generate_synthetic_donor_data
    >>> from philanthropy.models import DonorPropensityModel
    >>> df = generate_synthetic_donor_data(500, random_state=7)
    >>> X = df[["total_gift_amount", "years_active",
    ...          "event_attendance_count"]].to_numpy()
    >>> y = df["is_major_donor"].to_numpy()
    >>> model = DonorPropensityModel(random_state=0).fit(X, y)
    >>> scores = model.predict_affinity_score(X)
    >>> scores.shape
    (500,)
    >>> bool((scores >= 0).all() and (scores <= 100).all())
    True
    """
    df = self.decision_function(X)
    if df.ndim == 1:
        return np.round((df + 0.5) * 100, 2)
    # Multiclass case: affinity score for "major gift" (usually class 1)
    # We assume class 1 is at index 1 if it exists
    if df.shape[1] > 1:
        return np.round(df[:, 1] * 100, 2)
    return np.round(df.ravel() * 100, 2)

MajorGiftClassifier

Bases: ClassifierMixin, BaseEstimator

Classifies whether a donor is likely to make a major gift using calibrated probabilities.

This uses HistGradientBoostingClassifier, which handles missing data natively, and wraps it in a CalibratedClassifierCV so the output probabilities are well calibrated.

Source code in philanthropy/models/_propensity.py
class MajorGiftClassifier(ClassifierMixin, BaseEstimator):
    """
    Classifies whether a donor is likely to make a major gift using calibrated probabilities.

    This uses HistGradientBoostingClassifier, which handles missing data natively, and
    wraps it in a CalibratedClassifierCV so the output probabilities are well calibrated.
    """
    def __sklearn_tags__(self) -> Tags:
        tags = super().__sklearn_tags__()
        tags.input_tags.allow_nan = True
        tags.classifier_tags.multi_class = True
        return tags

    def __init__(self, max_iter=100, learning_rate=0.1, random_state=None):
        self.max_iter = max_iter
        self.learning_rate = learning_rate
        self.random_state = random_state

    def fit(self, X, y):
        X, y = validate_data(self, X, y, ensure_all_finite="allow-nan", reset=True)
        self.classes_ = unique_labels(y)
        self.n_features_in_ = X.shape[1]

        base_estimator = HistGradientBoostingClassifier(
            max_iter=self.max_iter,
            learning_rate=self.learning_rate,
            random_state=self.random_state
        )
        self.estimator_ = CalibratedClassifierCV(base_estimator)
        self.estimator_.fit(X, y)
        self.n_iter_ = 1
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = validate_data(self, X, ensure_all_finite="allow-nan", reset=False)
        return self.estimator_.predict(X)

    def predict_proba(self, X):
        check_is_fitted(self)
        X = validate_data(self, X, ensure_all_finite="allow-nan", reset=False)
        return self.estimator_.predict_proba(X)

    def predict_affinity_score(self, X):
        proba_positive = self.predict_proba(X)[:, 1]
        return np.round(proba_positive * 100.0, 2)

ShareOfWalletRegressor

Bases: RegressorMixin, BaseEstimator

Predict a donor's total philanthropic capacity (share-of-wallet).

ShareOfWalletRegressor is a scikit-learn–compatible regressor that wraps :class:~sklearn.ensemble.HistGradientBoostingRegressor to estimate a prospect's total philanthropic capacity — i.e., the maximum lifetime gift they could make given their wealth profile, giving history, and engagement signals.

By using HistGradientBoostingRegressor internally, the model handles missing CRM and wealth-screening values natively without requiring an upstream imputation step, reducing pipeline complexity and eliminating one source of potential leakage.

The companion method predict_capacity_ratio exposes the untapped-capacity ratio (predicted capacity ÷ historical cumulative giving), the primary metric gift officers use to prioritise discovery calls and major-gift portfolios.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| learning_rate | float | Step size shrinkage applied to each tree. Smaller values require more max_iter trees to converge but typically generalise better. | 0.1 |
| max_iter | int | Number of boosting iterations (trees). Increase to 300–500 for production models trained on large prospect pools. | 100 |
| max_depth | int or None | Maximum depth of each individual decision tree. | None |
| l2_regularization | float | L2 regularisation term on leaf weights. Increase (e.g., to 1.0) to combat overfitting when the feature-to-sample ratio is high — a common scenario in small-shop advancement analytics. | 0.0 |
| min_samples_leaf | int | Minimum number of samples per leaf. Larger values prevent overfitting on sparse major-donor training sets. | 20 |
| random_state | int or None | Seed for the internal random-number generator. Set to an integer for reproducible model artefacts suitable for audit trails. | None |
| capacity_floor | float | Minimum predicted capacity (in dollars). Predictions are clipped to this floor via np.maximum to prevent negative capacity estimates that are semantically meaningless. | 1.0 |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| estimator_ | HistGradientBoostingRegressor | The fitted backend estimator. |
| n_features_in_ | int | Number of features seen during fit. |

Examples:

Predict raw capacity and untapped-capacity ratio:

>>> import numpy as np
>>> from philanthropy.models import ShareOfWalletRegressor
>>> rng = np.random.default_rng(42)
>>> X = rng.uniform(0, 1e6, (200, 6))
>>> y = rng.uniform(5e4, 5e6, 200)
>>> historical = rng.uniform(1e3, 5e5, 200)
>>> model = ShareOfWalletRegressor(random_state=42).fit(X, y)
>>> model.predict(X[:3]).shape
(3,)
>>> ratios = model.predict_capacity_ratio(X[:3], historical_giving=historical[:3])
>>> bool((ratios >= 0).all())
True

Pipeline usage:

>>> from sklearn.pipeline import Pipeline
>>> # No imputation step is needed: ShareOfWalletRegressor handles NaN natively.
>>> pipe = Pipeline([("model", ShareOfWalletRegressor(random_state=0))])
>>> _ = pipe.fit(X, y)
Notes

Why HistGradientBoosting? Wealth-screening datasets commonly contain 30–70% missing values. HistGradientBoostingRegressor implements a native missing-value splitting strategy that treats NaN as an informative category rather than an erroneous artefact, avoiding the information loss of mean/median imputation.

Capacity Ratio Interpretation:

| Ratio | Recommended action |
| --- | --- |
| ≥ 10× | Dramatically under-asked; schedule discovery call. |
| 5–9× | Significant untapped potential; major-gift candidate. |
| 2–4× | Moderate upside; consider upgrade ask. |
| < 2× | Near capacity; focus on retention and stewardship. |
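
The interpretation bands above map directly onto a vectorised triage rule. A sketch using np.select, with the thresholds from the table and illustrative action labels:

```python
import numpy as np

ratios = np.array([12.0, 6.5, 3.0, 1.2])   # hypothetical capacity ratios

conditions = [ratios >= 10, ratios >= 5, ratios >= 2]
actions = ["discovery call", "major-gift candidate", "upgrade ask"]

# First matching condition wins; anything below 2x falls through to stewardship
recommended = np.select(conditions, actions, default="stewardship")
print(recommended.tolist())
# ['discovery call', 'major-gift candidate', 'upgrade ask', 'stewardship']
```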

See Also

philanthropy.models.DonorPropensityModel : Binary propensity model — use alongside this regressor for a two-stage (propensity × capacity) portfolio ranking.

philanthropy.preprocessing.WealthScreeningImputer : Optional upstream imputer for non-NaN-native downstream models.
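
The two-stage ranking mentioned above can be sketched with hypothetical model outputs: multiply each prospect's propensity by their predicted capacity to get an expected gift value, then sort. All values here are illustrative:

```python
import numpy as np

# Hypothetical outputs for six prospects
propensity = np.array([0.90, 0.20, 0.70, 0.40, 0.85, 0.10])   # P(major gift)
capacity = np.array([5e4, 2e6, 8e5, 1e5, 3e5, 3.5e6])         # predicted $

# Two-stage score: expected gift value = propensity × capacity
expected_value = propensity * capacity
ranking = np.argsort(expected_value)[::-1]   # best prospects first
print(ranking[:3].tolist())  # [2, 1, 5]
```

Note how the top prospect is neither the highest-propensity nor the highest-capacity donor: the product balances the two signals.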

Source code in philanthropy/models/_wallet.py
class ShareOfWalletRegressor(RegressorMixin, BaseEstimator):
    """Predict a donor's total philanthropic capacity (share-of-wallet).

    ``ShareOfWalletRegressor`` is a scikit-learn–compatible regressor that
    wraps :class:`~sklearn.ensemble.HistGradientBoostingRegressor` to estimate
    a prospect's **total philanthropic capacity** — i.e., the maximum lifetime
    gift they *could* make given their wealth profile, giving history, and
    engagement signals.

    By using ``HistGradientBoostingRegressor`` internally, the model handles
    missing CRM and wealth-screening values *natively* without requiring an
    upstream imputation step, reducing pipeline complexity and eliminating one
    source of potential leakage.

    The companion method :meth:`predict_capacity_ratio` exposes the
    **untapped-capacity ratio** (predicted capacity ÷ historical cumulative
    giving), the primary metric gift officers use to prioritise discovery
    calls and major-gift portfolios.

    Parameters
    ----------
    learning_rate : float, default=0.1
        Step size shrinkage applied to each tree.  Smaller values require
        more ``max_iter`` trees to converge but typically generalise better.
    max_iter : int, default=100
        Number of boosting iterations (trees).  Increase to 300–500 for
        production models trained on large prospect pools.
    max_depth : int or None, default=None
        Maximum depth of each individual decision tree.
    l2_regularization : float, default=0.0
        L2 regularisation term on leaf weights.  Increase (e.g., to 1.0)
        to combat overfitting when the feature-to-sample ratio is high —
        a common scenario in small-shop advancement analytics.
    min_samples_leaf : int, default=20
        Minimum number of samples per leaf.  Larger values prevent
        overfitting on sparse major-donor training sets.
    random_state : int or None, default=None
        Seed for the internal random-number generator.  Set to an integer
        for reproducible model artefacts suitable for audit trails.
    capacity_floor : float, default=1.0
        Minimum predicted capacity (in dollars).  Predictions are clipped
        to this floor via ``np.maximum`` to prevent negative capacity
        estimates that are semantically meaningless.

    Attributes
    ----------
    estimator_ : HistGradientBoostingRegressor
        The fitted backend estimator.
    n_features_in_ : int
        Number of features seen during :meth:`fit`.

    Examples
    --------
    **Predict raw capacity and untapped-capacity ratio:**

    >>> import numpy as np
    >>> from philanthropy.models import ShareOfWalletRegressor
    >>> rng = np.random.default_rng(42)
    >>> X = rng.uniform(0, 1e6, (200, 6))
    >>> y = rng.uniform(5e4, 5e6, 200)
    >>> historical = rng.uniform(1e3, 5e5, 200)
    >>> model = ShareOfWalletRegressor(random_state=42).fit(X, y)
    >>> model.predict(X[:3]).shape
    (3,)
    >>> ratios = model.predict_capacity_ratio(X[:3], historical_giving=historical[:3])
    >>> bool((ratios >= 0).all())
    True

    **Pipeline usage:**

    >>> from sklearn.pipeline import Pipeline
    >>> # No imputation step is needed: ShareOfWalletRegressor handles NaN natively.
    >>> pipe = Pipeline([("model", ShareOfWalletRegressor(random_state=0))])
    >>> _ = pipe.fit(X, y)

    Notes
    -----
    **Why HistGradientBoosting?**
    Wealth-screening datasets commonly contain 30–70% missing values.
    ``HistGradientBoostingRegressor`` implements a native missing-value
    splitting strategy that treats ``NaN`` as an informative category rather
    than an erroneous artefact, avoiding the information loss of mean/median
    imputation.

    **Capacity Ratio Interpretation:**

    ====== =====================================================
    Ratio  Recommended action
    ====== =====================================================
    ≥ 10×  Dramatically under-asked; schedule discovery call.
    5–9×   Significant untapped potential; major-gift candidate.
    2–4×   Moderate upside; consider upgrade ask.
    < 2×   Near capacity; focus on retention and stewardship.
    ====== =====================================================

    See Also
    --------
    philanthropy.models.DonorPropensityModel :
        Binary propensity model — use alongside this regressor for a
        two-stage (propensity × capacity) portfolio ranking.
    philanthropy.preprocessing.WealthScreeningImputer :
        Optional upstream imputer for non-NaN-native downstream models.
    """

    def __init__(
        self,
        learning_rate: float = 0.1,
        max_iter: int = 100,
        max_depth: Optional[int] = None,
        l2_regularization: float = 0.0,
        min_samples_leaf: int = 20,
        random_state: Optional[int] = None,
        capacity_floor: float = 1.0,
    ) -> None:
        # scikit-learn rule: __init__ stores parameters and does NO logic.
        self.learning_rate = learning_rate
        self.max_iter = max_iter
        self.max_depth = max_depth
        self.l2_regularization = l2_regularization
        self.min_samples_leaf = min_samples_leaf
        self.random_state = random_state
        self.capacity_floor = capacity_floor

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        tags.input_tags.allow_nan = True
        tags.regressor_tags.poor_score = True
        return tags

    @property
    def n_iter_(self):
        """Number of iterations run by the backend estimator."""
        check_is_fitted(self, ["estimator_"])
        return self.estimator_.n_iter_

    # ------------------------------------------------------------------
    # Public API
    # ------------------------------------------------------------------

    def fit(self, X, y) -> "ShareOfWalletRegressor":
        """Fit the share-of-wallet capacity model to labelled prospect data."""
        X, y = validate_data(self, X, y, ensure_all_finite="allow-nan", reset=True)
        self.n_features_in_ = X.shape[1]

        self.estimator_ = HistGradientBoostingRegressor(
            learning_rate=self.learning_rate,
            max_iter=self.max_iter,
            max_depth=self.max_depth,
            l2_regularization=self.l2_regularization,
            min_samples_leaf=self.min_samples_leaf,
            random_state=self.random_state,
        )
        self.estimator_.fit(X, y)
        return self

    def predict(self, X) -> np.ndarray:
        """Predict philanthropic capacity for each prospect."""
        check_is_fitted(self, ["estimator_"])
        X = validate_data(self, X, ensure_all_finite="allow-nan", reset=False)
        raw = self.estimator_.predict(X)
        return np.maximum(raw, self.capacity_floor)

    def predict_capacity_ratio(
        self,
        X,
        historical_giving: np.ndarray,
    ) -> np.ndarray:
        """Return the predicted capacity-to-historical-giving ratio.

        This ratio is the primary metric for gift officers prioritising
        discovery calls.  A ratio of 5.0 means the model estimates the donor
        could give five times more than they have historically — a strong
        signal of untapped major-gift potential.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Feature matrix passed to :meth:`predict`.  May contain ``NaN``.
        historical_giving : array-like of shape (n_samples,)
            Each donor's **cumulative historical giving** in dollars.  Values
            of zero or negative are replaced with ``1.0`` (the
            ``capacity_floor`` fallback) to avoid division-by-zero errors
            and to ensure semantically valid ratios for new donors with no
            prior giving history.

        Returns
        -------
        capacity_ratio : ndarray of shape (n_samples,)
            Element-wise ratio ``predicted_capacity / max(historical_giving, 1.0)``.
            Values ≥ 1.0 indicate untapped capacity; values < 1.0 indicate
            that the predicted capacity is below current cumulative giving
            (which may signal an over-generous prior record or model noise).

        Raises
        ------
        sklearn.exceptions.NotFittedError
            If :meth:`fit` has not been called yet.
        ValueError
            If ``historical_giving`` length does not match the number of
            rows in ``X``.

        Examples
        --------
        >>> import numpy as np
        >>> from philanthropy.models import ShareOfWalletRegressor
        >>> rng = np.random.default_rng(7)
        >>> X = rng.uniform(0, 1e6, (50, 4))
        >>> y = rng.uniform(1e4, 1e6, 50)
        >>> hist = rng.uniform(500, 1e5, 50)
        >>> model = ShareOfWalletRegressor(random_state=7).fit(X, y)
        >>> ratios = model.predict_capacity_ratio(X, historical_giving=hist)
        >>> ratios.shape
        (50,)
        >>> bool((ratios > 0).all())
        True
        """
        check_is_fitted(self, ["estimator_"])
        predicted_capacity = self.predict(X)

        historical_giving = np.asarray(historical_giving, dtype=float)
        if predicted_capacity.shape[0] != historical_giving.shape[0]:
            raise ValueError(
                f"`historical_giving` must have the same length as the number "
                f"of rows in ``X`` ({predicted_capacity.shape[0]}), "
                f"got {historical_giving.shape[0]}."
            )

        # Clip denominator to prevent division by zero for new/zero-giving donors
        safe_giving = np.maximum(historical_giving, 1.0)
        return predicted_capacity / safe_giving
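
The denominator guard at the end of predict_capacity_ratio is worth seeing in isolation: flooring historical giving at $1 keeps ratios finite for brand-new donors with no giving history. The figures below are illustrative:

```python
import numpy as np

predicted_capacity = np.array([250_000.0, 50_000.0, 10_000.0])
historical_giving = np.array([25_000.0, 0.0, 40_000.0])  # middle donor is new

# Same guard the method applies: floor the denominator at 1.0 before dividing
safe_giving = np.maximum(historical_giving, 1.0)
ratio = predicted_capacity / safe_giving
print(ratio.tolist())  # [10.0, 50000.0, 0.25]
```

A side effect worth knowing: donors with zero history get enormous ratios (here 50,000×), so portfolios built on the raw ratio usually filter or cap new donors separately.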

n_iter_ property

Number of iterations run by the backend estimator.

fit(X, y)

Fit the share-of-wallet capacity model to labelled prospect data.

Source code in philanthropy/models/_wallet.py
def fit(self, X, y) -> "ShareOfWalletRegressor":
    """Fit the share-of-wallet capacity model to labelled prospect data."""
    X, y = validate_data(self, X, y, ensure_all_finite="allow-nan", reset=True)
    self.n_features_in_ = X.shape[1]

    self.estimator_ = HistGradientBoostingRegressor(
        learning_rate=self.learning_rate,
        max_iter=self.max_iter,
        max_depth=self.max_depth,
        l2_regularization=self.l2_regularization,
        min_samples_leaf=self.min_samples_leaf,
        random_state=self.random_state,
    )
    self.estimator_.fit(X, y)
    return self

predict(X)

Predict philanthropic capacity for each prospect.

Source code in philanthropy/models/_wallet.py
def predict(self, X) -> np.ndarray:
    """Predict philanthropic capacity for each prospect."""
    check_is_fitted(self, ["estimator_"])
    X = validate_data(self, X, ensure_all_finite="allow-nan", reset=False)
    raw = self.estimator_.predict(X)
    return np.maximum(raw, self.capacity_floor)

predict_capacity_ratio(X, historical_giving)

Return the predicted capacity-to-historical-giving ratio.

This ratio is the primary metric for gift officers prioritising discovery calls. A ratio of 5.0 means the model estimates the donor could give five times more than they have historically — a strong signal of untapped major-gift potential.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | array-like of shape (n_samples, n_features) | Feature matrix passed to predict. May contain NaN. | required |
| historical_giving | array-like of shape (n_samples,) | Each donor's cumulative historical giving in dollars. Values of zero or negative are replaced with 1.0 (the capacity_floor fallback) to avoid division-by-zero errors and to ensure semantically valid ratios for new donors with no prior giving history. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| capacity_ratio | ndarray of shape (n_samples,) | Element-wise ratio predicted_capacity / max(historical_giving, 1.0). Values ≥ 1.0 indicate untapped capacity; values < 1.0 indicate that the predicted capacity is below current cumulative giving (which may signal an over-generous prior record or model noise). |

Raises:

| Type | Description |
| --- | --- |
| NotFittedError | If fit has not been called yet. |
| ValueError | If historical_giving length does not match the number of rows in X. |

Examples:

>>> import numpy as np
>>> from philanthropy.models import ShareOfWalletRegressor
>>> rng = np.random.default_rng(7)
>>> X = rng.uniform(0, 1e6, (50, 4))
>>> y = rng.uniform(1e4, 1e6, 50)
>>> hist = rng.uniform(500, 1e5, 50)
>>> model = ShareOfWalletRegressor(random_state=7).fit(X, y)
>>> ratios = model.predict_capacity_ratio(X, historical_giving=hist)
>>> ratios.shape
(50,)
>>> bool((ratios > 0).all())
True
Source code in philanthropy/models/_wallet.py
def predict_capacity_ratio(
    self,
    X,
    historical_giving: np.ndarray,
) -> np.ndarray:
    """Return the predicted capacity-to-historical-giving ratio.

    This ratio is the primary metric for gift officers prioritising
    discovery calls.  A ratio of 5.0 means the model estimates the donor
    could give five times more than they have historically — a strong
    signal of untapped major-gift potential.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Feature matrix passed to :meth:`predict`.  May contain ``NaN``.
    historical_giving : array-like of shape (n_samples,)
        Each donor's **cumulative historical giving** in dollars.  Values
        of zero or negative are replaced with ``1.0`` (the
        ``capacity_floor`` fallback) to avoid division-by-zero errors
        and to ensure semantically valid ratios for new donors with no
        prior giving history.

    Returns
    -------
    capacity_ratio : ndarray of shape (n_samples,)
        Element-wise ratio ``predicted_capacity / max(historical_giving, 1.0)``.
        Values ≥ 1.0 indicate untapped capacity; values < 1.0 indicate
        that the predicted capacity is below current cumulative giving
        (which may signal an over-generous prior record or model noise).

    Raises
    ------
    sklearn.exceptions.NotFittedError
        If :meth:`fit` has not been called yet.
    ValueError
        If ``historical_giving`` length does not match the number of
        rows in ``X``.

    Examples
    --------
    >>> import numpy as np
    >>> from philanthropy.models import ShareOfWalletRegressor
    >>> rng = np.random.default_rng(7)
    >>> X = rng.uniform(0, 1e6, (50, 4))
    >>> y = rng.uniform(1e4, 1e6, 50)
    >>> hist = rng.uniform(500, 1e5, 50)
    >>> model = ShareOfWalletRegressor(random_state=7).fit(X, y)
    >>> ratios = model.predict_capacity_ratio(X, historical_giving=hist)
    >>> ratios.shape
    (50,)
    >>> bool((ratios > 0).all())
    True
    """
    check_is_fitted(self, ["estimator_"])
    predicted_capacity = self.predict(X)

    historical_giving = np.asarray(historical_giving, dtype=float)
    if predicted_capacity.shape[0] != historical_giving.shape[0]:
        raise ValueError(
            f"`historical_giving` must have the same length as the number "
            f"of rows in ``X`` ({predicted_capacity.shape[0]}), "
            f"got {historical_giving.shape[0]}."
        )

    # Clip denominator to prevent division by zero for new/zero-giving donors
    safe_giving = np.maximum(historical_giving, 1.0)
    return predicted_capacity / safe_giving

MovesManagementClassifier

Bases: ClassifierMixin, BaseEstimator

Predicts the next best moves management stage for a donor.

Source code in philanthropy/models/_moves.py
class MovesManagementClassifier(ClassifierMixin, BaseEstimator):
    """
    Predicts the next best moves management stage for a donor.
    """

    def __init__(
        self,
        learning_rate: float = 0.1,
        max_iter: int = 200,
        class_weight: str | dict | None = "balanced",
        random_state: int | None = None,
    ):
        self.learning_rate = learning_rate
        self.max_iter = max_iter
        self.class_weight = class_weight
        self.random_state = random_state

    def fit(self, X, y):
        X, y = validate_data(self, X, y, reset=True)
        if hasattr(X, "columns"):
            self.feature_names_in_ = np.array(X.columns.tolist(), dtype=object)
        self.n_features_in_ = X.shape[1]

        self.label_encoder_ = LabelEncoder()
        y_encoded = self.label_encoder_.fit_transform(y)
        self.classes_ = self.label_encoder_.classes_

        self.estimator_ = HistGradientBoostingClassifier(
            learning_rate=self.learning_rate,
            max_iter=self.max_iter,
            class_weight=self.class_weight,
            random_state=self.random_state,
        )
        self.estimator_.fit(X, y_encoded)
        return self

    def predict(self, X):
        check_is_fitted(self)
        X = validate_data(self, X, reset=False)
        y_pred = self.estimator_.predict(X)
        return self.label_encoder_.inverse_transform(y_pred)

    def predict_proba(self, X):
        check_is_fitted(self)
        X = validate_data(self, X, reset=False)
        return self.estimator_.predict_proba(X)

    def predict_action_priority(self, X) -> dict:
        """Return per-donor stage predictions plus a portfolio-level summary.

        Returns a dict with keys ``stage`` (predicted stage labels),
        ``confidence`` (probability of the predicted stage), and
        ``portfolio_summary`` (mapping of stage to donor count).
        """
        check_is_fitted(self)
        X = validate_data(self, X, reset=False)

        probas = self.estimator_.predict_proba(X)
        pred_idx = np.argmax(probas, axis=1)
        confidences = np.max(probas, axis=1)

        stages = self.label_encoder_.inverse_transform(pred_idx)

        unique_stages, counts = np.unique(stages, return_counts=True)
        portfolio_summary = dict(zip(unique_stages, counts))

        return {
            "stage": stages,
            "confidence": confidences,
            "portfolio_summary": portfolio_summary,
        }
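
predict_action_priority is a thin aggregation over the class probabilities. The core argmax-plus-summary logic can be sketched with a hypothetical probability matrix and stage labels (both illustrative):

```python
import numpy as np

stages = np.array(["cultivation", "solicitation", "stewardship"])
probas = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
])

pred_idx = np.argmax(probas, axis=1)    # most likely stage per donor
confidence = np.max(probas, axis=1)     # probability of that stage
predicted = stages[pred_idx]

# Portfolio-level summary: how many donors land in each recommended stage
unique, counts = np.unique(predicted, return_counts=True)
summary = {str(s): int(c) for s, c in zip(unique, counts)}
print(summary)  # {'cultivation': 2, 'solicitation': 1, 'stewardship': 1}
```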

LapsePredictor

Bases: ClassifierMixin, BaseEstimator

Predicts whether a donor will lapse within a configurable window. Uses RandomForestClassifier backend.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| n_estimators | int | Number of trees in the RandomForestClassifier. | 100 |
| lapse_window_years | int | The time window (in years) over which lapse is defined. Stored for documentation only; it is not used when fitting the estimator. | 2 |
| max_depth | int or None | Maximum depth of trees. None means nodes are expanded until all leaves are pure. | None |
| class_weight | dict, "balanced", "balanced_subsample" or None | Class weights for imbalanced lapse prediction. | None |
| random_state | int or None | Random seed for reproducibility. | None |
Source code in philanthropy/models/_lapse.py
class LapsePredictor(ClassifierMixin, BaseEstimator):
    """
    Predicts whether a donor will lapse within a configurable window.
    Uses RandomForestClassifier backend.

    Parameters
    ----------
    n_estimators : int, default=100
        Number of trees in the RandomForestClassifier.
    lapse_window_years : int, default=2
        The time window (in years) over which lapse is defined. Stored for
        documentation only; it is not used when fitting the estimator.
    max_depth : int or None, default=None
        Maximum depth of trees. None means nodes expand until pure.
    class_weight : dict, "balanced", "balanced_subsample" or None, default=None
        Class weights for imbalanced lapse prediction.
    random_state : int or None, default=None
        Random seed for reproducibility.
    """

    def __sklearn_tags__(self) -> Tags:
        tags = super().__sklearn_tags__()
        tags.input_tags.allow_nan = True
        tags.classifier_tags.poor_score = True
        return tags

    def __init__(
        self,
        n_estimators: int = 100,
        lapse_window_years: int = 2,
        max_depth: int | None = None,
        class_weight=None,
        random_state: int | None = None,
    ):
        self.n_estimators = n_estimators
        self.lapse_window_years = lapse_window_years
        self.max_depth = max_depth
        self.class_weight = class_weight
        self.random_state = random_state

    def fit(self, X, y) -> "LapsePredictor":
        """Fit the LapsePredictor.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)
            Feature matrix.
        y : array-like of shape (n_samples,)
            Binary target: 1 = lapse, 0 = no lapse.

        Returns
        -------
        self : LapsePredictor
        """
        X, y = check_X_y(X, y, ensure_all_finite="allow-nan")
        self.classes_ = np.unique(y)
        self.n_features_in_ = X.shape[1]

        self.estimator_ = RandomForestClassifier(
            n_estimators=self.n_estimators,
            max_depth=self.max_depth,
            class_weight=self.class_weight,
            random_state=self.random_state,
        )
        self.estimator_.fit(X, y)
        return self

    def predict(self, X) -> np.ndarray:
        """Predict binary lapse labels."""
        check_is_fitted(self)
        X = check_array(X, ensure_all_finite="allow-nan")
        return self.estimator_.predict(X)

    def predict_proba(self, X) -> np.ndarray:
        """Return class probabilities of shape (n_samples, 2)."""
        check_is_fitted(self)
        X = check_array(X, ensure_all_finite="allow-nan")
        return self.estimator_.predict_proba(X)

    def predict_lapse_score(self, X) -> np.ndarray:
        """Return P(lapse) × 100 rounded to 2 decimal places (0–100 scale)."""
        check_is_fitted(self)
        X = check_array(X, ensure_all_finite="allow-nan")
        # Column 1 is P(class=1), i.e. P(lapse) when classes_ is [0, 1]
        proba_lapse = self.predict_proba(X)[:, 1]
        return np.round(proba_lapse * 100.0, 2)
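
The class_weight option matters here because lapse datasets are usually skewed toward retained donors. The reweighting sklearn applies under class_weight="balanced" can be sketched directly; the label counts below are illustrative:

```python
import numpy as np

# Hypothetical labels: 90 retained donors (0), 10 lapsed donors (1)
y = np.array([0] * 90 + [1] * 10)

# sklearn's "balanced" heuristic: n_samples / (n_classes * count_per_class)
classes, counts = np.unique(y, return_counts=True)
weights = len(y) / (len(classes) * counts)
print({int(c): round(float(w), 3) for c, w in zip(classes, weights)})
# {0: 0.556, 1: 5.0}
```

Each lapsed donor counts roughly nine times as much as a retained one, so the forest does not simply learn to predict "no lapse" everywhere.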

fit(X, y)

Fit the LapsePredictor.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | array-like of shape (n_samples, n_features) | Feature matrix. | required |
| y | array-like of shape (n_samples,) | Binary target: 1 = lapse, 0 = no lapse. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| self | LapsePredictor | The fitted estimator. |
Source code in philanthropy/models/_lapse.py
def fit(self, X, y) -> "LapsePredictor":
    """Fit the LapsePredictor.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)
        Feature matrix.
    y : array-like of shape (n_samples,)
        Binary target: 1 = lapse, 0 = no lapse.

    Returns
    -------
    self : LapsePredictor
    """
    X, y = check_X_y(X, y, ensure_all_finite="allow-nan")
    self.classes_ = np.unique(y)
    self.n_features_in_ = X.shape[1]

    self.estimator_ = RandomForestClassifier(
        n_estimators=self.n_estimators,
        max_depth=self.max_depth,
        class_weight=self.class_weight,
        random_state=self.random_state,
    )
    self.estimator_.fit(X, y)
    return self

predict(X)

Predict binary lapse labels.

Source code in philanthropy/models/_lapse.py
def predict(self, X) -> np.ndarray:
    """Predict binary lapse labels."""
    check_is_fitted(self)
    X = check_array(X, ensure_all_finite="allow-nan")
    return self.estimator_.predict(X)

predict_proba(X)

Return class probabilities of shape (n_samples, 2).

Source code in philanthropy/models/_lapse.py
def predict_proba(self, X) -> np.ndarray:
    """Return class probabilities of shape (n_samples, 2)."""
    check_is_fitted(self)
    X = check_array(X, ensure_all_finite="allow-nan")
    return self.estimator_.predict_proba(X)

predict_lapse_score(X)

Return P(lapse) × 100 rounded to 2 decimal places (0–100 scale).

Source code in philanthropy/models/_lapse.py
def predict_lapse_score(self, X) -> np.ndarray:
    """Return P(lapse) × 100 rounded to 2 decimal places (0–100 scale)."""
    check_is_fitted(self)
    X = check_array(X, ensure_all_finite="allow-nan")
    # Column 1 is P(class=1), i.e. P(lapse) when classes_ is [0, 1]
    proba_lapse = self.predict_proba(X)[:, 1]
    return np.round(proba_lapse * 100.0, 2)

PlannedGivingIntentScorer

Bases: ClassifierMixin, BaseEstimator

Predicts bequest/planned giving intent. Wraps GradientBoostingClassifier with CalibratedClassifierCV.

Exposes .predict_bequest_intent_score(X) returning a 0-100 float array.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| n_estimators | int | The number of boosting stages to perform. | 100 |
| random_state | int, RandomState instance or None | Controls the randomness of the estimator. | None |
Source code in philanthropy/models/_planned_giving.py
class PlannedGivingIntentScorer(ClassifierMixin, BaseEstimator):
    """
    Predicts bequest/planned giving intent. Wraps GradientBoostingClassifier
    with CalibratedClassifierCV.

    Exposes `.predict_bequest_intent_score(X)` returning a 0-100 float array.

    Parameters
    ----------
    n_estimators : int, default=100
        The number of boosting stages to perform.
    random_state : int, RandomState instance or None, default=None
        Controls the randomness of the estimator.
    """

    def __init__(
        self,
        n_estimators: int = 100,
        random_state: int | None = None,
    ):
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y) -> "PlannedGivingIntentScorer":
        X, y = validate_data(self, X, y, reset=True)

        self.classes_ = np.unique(y)
        self.n_features_in_ = X.shape[1]

        base_estimator = GradientBoostingClassifier(
            n_estimators=self.n_estimators,
            random_state=self.random_state
        )
        self.estimator_ = CalibratedClassifierCV(
            estimator=base_estimator,
            method="sigmoid",
            cv=2,
        )
        self.estimator_.fit(X, y)
        return self

    def predict(self, X) -> np.ndarray:
        check_is_fitted(self)
        X = validate_data(self, X, reset=False)
        return self.estimator_.predict(X)

    def predict_proba(self, X) -> np.ndarray:
        check_is_fitted(self)
        X = validate_data(self, X, reset=False)
        return self.estimator_.predict_proba(X)

    def predict_bequest_intent_score(self, X) -> np.ndarray:
        """
        Return the 0-100 float score of bequest intent.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)

        Returns
        -------
        scores : ndarray of shape (n_samples,)
        """
        return self.predict_intent_score(X)

    def predict_intent_score(self, X) -> np.ndarray:
        """
        Return P(planned giving intent) × 100, rounded to 2 decimal places.

        Parameters
        ----------
        X : array-like of shape (n_samples, n_features)

        Returns
        -------
        scores : ndarray of shape (n_samples,)
            Values in range [0.0, 100.0].
        """
        proba = self.predict_proba(X)
        if proba.shape[1] < 2:
            scores = np.zeros(proba.shape[0], dtype=float)
        else:
            scores = np.round(proba[:, 1] * 100.0, 2)
        return scores

    def __sklearn_tags__(self):
        tags = super().__sklearn_tags__()
        return tags

predict_bequest_intent_score(X)

Return the 0-100 float score of bequest intent.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | array-like of shape (n_samples, n_features) | Feature matrix to score. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| scores | ndarray of shape (n_samples,) | Bequest-intent scores in the range [0.0, 100.0]. |
Source code in philanthropy/models/_planned_giving.py
def predict_bequest_intent_score(self, X) -> np.ndarray:
    """
    Return the 0-100 float score of bequest intent.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)

    Returns
    -------
    scores : ndarray of shape (n_samples,)
    """
    return self.predict_intent_score(X)

predict_intent_score(X)

Return P(planned giving intent) × 100, rounded to 2 decimal places.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | array-like of shape (n_samples, n_features) | Feature matrix to score. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| scores | ndarray of shape (n_samples,) | Values in range [0.0, 100.0]. |

Source code in philanthropy/models/_planned_giving.py
def predict_intent_score(self, X) -> np.ndarray:
    """
    Return P(planned giving intent) × 100, rounded to 2 decimal places.

    Parameters
    ----------
    X : array-like of shape (n_samples, n_features)

    Returns
    -------
    scores : ndarray of shape (n_samples,)
        Values in range [0.0, 100.0].
    """
    proba = self.predict_proba(X)
    if proba.shape[1] < 2:
        scores = np.zeros(proba.shape[0], dtype=float)
    else:
        scores = np.round(proba[:, 1] * 100.0, 2)
    return scores
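
The method="sigmoid" calibration used in fit is Platt scaling: a logistic function of an affine transform of the raw classifier output, with two parameters learned by CalibratedClassifierCV from held-out folds. A sketch of the mapping with hypothetical parameters a and b (in practice they are fitted, not chosen):

```python
import numpy as np

def platt(raw_scores, a=-1.5, b=0.2):
    # Platt scaling: sigmoid of an affine transform of the raw score
    return 1.0 / (1.0 + np.exp(a * raw_scores + b))

raw = np.array([-2.0, 0.0, 1.0, 3.0])       # hypothetical decision margins
proba = platt(raw)
intent_score = np.round(proba * 100.0, 2)   # same 0-100 mapping as the class
print(intent_score.tolist())  # [3.92, 45.02, 78.58, 98.66]
```

Calibration matters here because planned-giving programmes act on absolute probabilities (e.g. "contact everyone above 60"), not just on rank order.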