Datasets Reference
philanthropy.datasets
Synthetic data generators for donor analytics development and testing.
generate_synthetic_donor_data(n_samples=1000, random_state=None)
Generate a realistic synthetic donor DataFrame for modelling and testing.
The returned dataset simulates a hospital's major-gifts prospect pool. Features are correlated in a domain-meaningful way:
- Donors with more `years_active` and higher `event_attendance_count` have a monotonically increasing probability of being labelled as a major donor (`is_major_donor = 1`).
- `total_gift_amount` is log-normally distributed and positively correlated with `is_major_donor`.
- `last_gift_date` is sampled uniformly across the past five calendar years, with major donors skewed toward more recent activity.
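The gift-size and recency correlations described above can be illustrated with a minimal sketch. It is not the library's implementation. The helper `sketch_gift_features`, the fixed `REFERENCE_DATE`, the 5 × 365-day window, the log-normal parameters, and the "square a uniform draw" recency skew are all assumptions chosen only to reproduce the qualitative behaviour.

```python
import numpy as np
import pandas as pd

# Assumed reference date and shape parameters; the documented generator's
# actual values are not stated in this reference.
REFERENCE_DATE = pd.Timestamp("2024-01-01")
WINDOW_DAYS = 5 * 365  # approximates "the past five calendar years"


def sketch_gift_features(is_major_donor: np.ndarray, seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: derive gift size and recency from a 0/1 label."""
    rng = np.random.default_rng(seed)
    label = np.asarray(is_major_donor, dtype=np.int64)
    n = label.shape[0]

    # Log-normal amounts; major donors get a larger log-mean, which induces
    # a positive correlation between total_gift_amount and the label.
    total_gift_amount = rng.lognormal(mean=6.0 + 2.0 * label, sigma=1.0, size=n)

    # Non-major donors: days before the reference date drawn uniformly.
    uniform_days = rng.uniform(0.0, WINDOW_DAYS, size=n)
    # Major donors: squaring a U(0, 1) draw concentrates mass near 0,
    # which skews their last gift toward the reference date.
    recent_days = WINDOW_DAYS * rng.uniform(0.0, 1.0, size=n) ** 2
    days_ago = np.where(label == 1, recent_days, uniform_days).astype(np.int64)

    last_gift_date = REFERENCE_DATE - pd.to_timedelta(days_ago, unit="D")
    return pd.DataFrame(
        {
            "total_gift_amount": total_gift_amount,
            "last_gift_date": last_gift_date,
            "is_major_donor": label,
        }
    )
```

Shifting the log-mean by the label is one simple way to get a positive correlation while keeping amounts strictly positive and right-skewed, which is typical of gift data.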
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_samples` | `int` | Number of synthetic donor records to generate. | `1000` |
| `random_state` | `int` or `None` | Seed for the NumPy random-number generator. Pass an integer to obtain a reproducible dataset; | `None` |
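A minimal sketch of the seeding behaviour that `random_state` describes, assuming the seed is forwarded to NumPy's `np.random.default_rng`. The reference only says "the NumPy random-number generator", so the choice of the `Generator` API and the helper `draw_years_active` are hypothetical.

```python
from typing import Optional

import numpy as np


def draw_years_active(n_samples: int = 1000, random_state: Optional[int] = None) -> np.ndarray:
    """Hypothetical helper mirroring the n_samples / random_state parameters."""
    # Assumption: the seed is passed straight to NumPy's Generator API.
    rng = np.random.default_rng(random_state)
    return rng.integers(0, 30, size=n_samples)


first = draw_years_active(5, random_state=42)
second = draw_years_active(5, random_state=42)
print(np.array_equal(first, second))  # True: the same integer seed reproduces the same draws
```

Against the real function, the analogous check would be `generate_synthetic_donor_data(random_state=0).equals(generate_synthetic_donor_data(random_state=0))`, which the parameter description implies should be `True`.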
Returns:
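| Name | Type | Description |
|---|---|---|
| `df` | `pd.DataFrame` of shape `(n_samples, 5)` | A DataFrame containing the columns `years_active`, `event_attendance_count`, `total_gift_amount`, `last_gift_date`, and `is_major_donor`. |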
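Because the generator is meant for development and testing, a downstream test suite might check the return contract directly. The sketch below is a hypothetical helper, `validate_donor_frame`; it checks only what this page states (shape, the five column names from the feature description, a binary label) and does not assume a column order or dtypes beyond that.

```python
import pandas as pd

# Column names taken from the feature description; order is not assumed.
EXPECTED_COLUMNS = {
    "years_active",
    "event_attendance_count",
    "total_gift_amount",
    "last_gift_date",
    "is_major_donor",
}


def validate_donor_frame(df: pd.DataFrame, n_samples: int) -> None:
    """Hypothetical test helper: raise ValueError if df breaks the documented contract."""
    if df.shape != (n_samples, 5):
        raise ValueError(f"expected shape {(n_samples, 5)}, got {df.shape}")
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # The label must be binary, matching the doctest in the Examples section.
    if not df["is_major_donor"].isin([0, 1]).all():
        raise ValueError("is_major_donor must contain only 0 and 1")
```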
Examples:
>>> from philanthropy.datasets import generate_synthetic_donor_data
>>> df = generate_synthetic_donor_data(n_samples=500, random_state=42)
>>> df.shape
(500, 5)
>>> df.dtypes["is_major_donor"]
dtype('int64')
>>> bool(df["is_major_donor"].isin([0, 1]).all())
True
Notes
The underlying propensity model is a logistic function of a linear
score `z` constructed from `years_active`, `event_attendance_count`,
and a small amount of Gaussian noise. This makes the label
statistically learnable: neither trivially predictable nor random.
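A minimal sketch of such a propensity model, p = 1 / (1 + exp(-z)) with z linear in the two features plus Gaussian noise, followed by a Bernoulli draw for the label. The intercept, weights, noise scale, feature distributions, and the helper name `major_donor_probability` are assumptions; the reference does not document them.

```python
from typing import Optional

import numpy as np

# Hypothetical weights: the reference only says z is linear in these two
# features plus small Gaussian noise, not what the coefficients are.
INTERCEPT, W_YEARS, W_EVENTS = -4.0, 0.15, 0.4


def major_donor_probability(
    years_active: np.ndarray,
    event_attendance_count: np.ndarray,
    rng: Optional[np.random.Generator] = None,
    noise_scale: float = 0.5,
) -> np.ndarray:
    """Logistic propensity p = 1 / (1 + exp(-z)) for a linear score z."""
    z = (
        INTERCEPT
        + W_YEARS * np.asarray(years_active, dtype=float)
        + W_EVENTS * np.asarray(event_attendance_count, dtype=float)
    )
    if rng is not None:
        # Small Gaussian noise keeps the label from being a deterministic
        # function of the two features.
        z = z + rng.normal(0.0, noise_scale, size=z.shape)
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
years = rng.integers(0, 30, size=1000)
events = rng.poisson(3.0, size=1000)
p = major_donor_probability(years, events, rng=rng)
# Bernoulli draw: each donor is labelled 1 with probability p.
is_major_donor = (rng.random(p.shape[0]) < p).astype(np.int64)
```

With positive weights the noise-free probability rises monotonically in both features, which matches the monotonic relationship stated in the feature description; the noise and the Bernoulli draw are what keep the label from being trivially predictable.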
The function never raises an error for valid inputs. Passing
`n_samples=0` returns an empty DataFrame with the correct column
schema.
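One way a zero-row frame can still carry a typed schema is to build each column from an empty, explicitly typed array. In this sketch, the helper `empty_donor_frame` is hypothetical. Only the `int64` dtype of `is_major_donor` is confirmed by the Examples section; the other dtypes are assumptions.

```python
import numpy as np
import pandas as pd


def empty_donor_frame() -> pd.DataFrame:
    """Hypothetical helper: a zero-row frame that still carries a typed schema."""
    # Only is_major_donor's int64 dtype is confirmed by the Examples section;
    # the other dtypes are assumptions.
    return pd.DataFrame(
        {
            "years_active": np.empty(0, dtype=np.int64),
            "event_attendance_count": np.empty(0, dtype=np.int64),
            "total_gift_amount": np.empty(0, dtype=np.float64),
            "last_gift_date": np.empty(0, dtype="datetime64[ns]"),
            "is_major_donor": np.empty(0, dtype=np.int64),
        }
    )


df = empty_donor_frame()
print(df.shape)  # (0, 5)
```

In a vectorised generator this case may need no special handling at all, since NumPy draws such as `rng.integers(0, 30, size=0)` already return empty arrays of the right dtype.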
Source code in philanthropy/datasets/_generator.py