Design Principles
PhilanthroPy was built from the ground up to solve structural problems frequently found in non-profit data science.
- Leakage-Safe by Design: Temporal data leakage across fiscal years can produce a model that looks excellent in backtests but fails in production. PhilanthroPy's transformers and `TemporalDonorSplitter` anchor cross-validation splits chronologically to the organization's fiscal calendar to simulate realistic "walk-forward" predictions.
- Idempotent Transformers: Fill statistics, encounter summaries, and imputation snapshots are frozen at `fit()` time. Calling `transform()` multiple times, including on streaming data, yields identical, non-leaking transformations.
- Scikit-Learn Native: All estimators are tested against scikit-learn's `check_estimator` standard. They support scikit-learn features such as `set_output(transform="pandas")`, cloning, and cross-validation pipelines out of the box.
- NaN-Transparent: Real-world CRM data is full of empty and irregular fields; third-party database imports may be missing up to 60% of their values. PhilanthroPy transformers operate with `allow_nan = True`, preventing silent data loss and extracting signal from missingness itself.
- PII-Aware: Components such as `CRMCleaner` and `GratefulPatientFeaturizer` decouple clinical intensity and demographic logic from explicit Protected Health Information (PHI) such as Medical Record Numbers (MRNs) and Social Security numbers, reducing compliance risk in analytical pipelines.
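
The leakage-safe, idempotent, and NaN-transparent principles above can be illustrated with a minimal sketch built only on pandas, NumPy, and scikit-learn's base classes. This is not PhilanthroPy's API: `fiscal_year`, `walk_forward_splits`, `FrozenMedianImputer`, the July fiscal-year start, and the toy `gifts` table are all hypothetical names and assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


def fiscal_year(dates: pd.Series, fy_start_month: int = 7) -> pd.Series:
    """Label each date with the fiscal year in which it ends (assumed convention)."""
    dates = pd.to_datetime(dates)
    if fy_start_month == 1:
        return dates.dt.year  # fiscal year equals calendar year
    # Dates on or after the start month belong to the next labelled fiscal year.
    return dates.dt.year + (dates.dt.month >= fy_start_month).astype(int)


def walk_forward_splits(fy: pd.Series, min_train_years: int = 1):
    """Yield (train_idx, test_idx) positions; training rows always precede the test year."""
    fy_values = fy.to_numpy()
    years = np.sort(np.unique(fy_values))
    for test_year in years[min_train_years:]:
        train_idx = np.flatnonzero(fy_values < test_year)
        test_idx = np.flatnonzero(fy_values == test_year)
        yield train_idx, test_idx


class FrozenMedianImputer(BaseEstimator, TransformerMixin):
    """Hypothetical imputer: medians are learned in fit() and never updated afterwards."""

    def fit(self, X, y=None):
        # NaNs are skipped when computing medians, so missing values are tolerated.
        self.medians_ = pd.DataFrame(X).median(numeric_only=True)
        return self

    def transform(self, X):
        X = pd.DataFrame(X).copy()
        for col, median in self.medians_.items():
            # Keep the missingness signal as its own feature before filling.
            X[f"{col}_missing"] = X[col].isna().astype(int)
            X[col] = X[col].fillna(median)
        return X


# Toy gift history with missing amounts (fabricated purely for the sketch).
gifts = pd.DataFrame({
    "gift_date": pd.to_datetime([
        "2021-08-15", "2022-03-01", "2022-09-10",
        "2023-05-20", "2023-07-04", "2024-01-30",
    ]),
    "amount": [100.0, np.nan, 250.0, 50.0, np.nan, 500.0],
})
gifts["fy"] = fiscal_year(gifts["gift_date"])

splits = list(walk_forward_splits(gifts["fy"]))
for train_idx, test_idx in splits:
    train, test = gifts.iloc[train_idx], gifts.iloc[test_idx]
    imputer = FrozenMedianImputer().fit(train[["amount"]])
    imputed = imputer.transform(test[["amount"]])
    # Repeated calls give the same result because fitted state is never mutated.
    assert imputed.equals(imputer.transform(test[["amount"]]))
```

Each fold fits only on fiscal years strictly earlier than the test year, so the fill value applied to the test rows never depends on future gifts. The extra `amount_missing` column is one simple way to keep missingness available as a feature.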