Module auton_survival.phenotyping

Utilities to phenotype individuals based on similar survival characteristics.

Functions

def random()

random() -> x in the interval [0, 1).

Classes

class Phenotyper (random_seed=0)

Base class for all phenotyping methods.

class IntersectionalPhenotyper (cat_vars=None, num_vars=None, num_vars_quantiles=(0, 0.5, 1.0), random_seed=0)

A phenotyper that forms phenotypes by taking the exhaustive Cartesian product of a prespecified set of categorical and numerical variables.

Parameters

cat_vars : list of python str(s), default=None
List of column names of categorical variables to phenotype on.
num_vars : list of python str(s), default=None
List of column names of continuous variables to phenotype on.
num_vars_quantiles : tuple of floats, default=(0, 0.5, 1.0)
A tuple of quantiles as floats (inclusive of 0 and 1) used to discretize continuous variables into equal-sized bins.
features : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns as covariates.
phenotypes : list
List of lists containing all possible combinations of specified categorical and numerical variable values.
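
As a rough illustration of the idea (not the library's implementation), the sketch below builds intersectional groups in plain Python from one categorical and one numerical variable. The toy records, the `sex` and `age` names, and the label format are assumptions made for the example; the quantiles (0, 0.5, 1.0) correspond to a single cut at the median.

```python
from itertools import product
from statistics import median

# Hypothetical toy data: one categorical and one numerical covariate.
samples = [
    {"sex": "F", "age": 34},
    {"sex": "M", "age": 71},
    {"sex": "F", "age": 66},
    {"sex": "M", "age": 45},
]

# Quantiles (0, 0.5, 1.0) split the numerical variable at its median.
cut = median(s["age"] for s in samples)
age_bins = [f"age<={cut}", f"age>{cut}"]

# Distinct levels of the categorical variable.
sex_levels = sorted({s["sex"] for s in samples})

# Exhaustive Cartesian product of category levels and numerical bins.
phenotypes = [" & ".join(combo) for combo in product(sex_levels, age_bins)]

def assign(sample):
    """Map one sample to the label of its intersectional group."""
    age_bin = age_bins[0] if sample["age"] <= cut else age_bins[1]
    return f"{sample['sex']} & {age_bin}"

labels = [assign(s) for s in samples]
```

With two category levels and two numerical bins, `phenotypes` holds four groups, and every entry of `labels` is one of them.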

Methods

def fit(self, features)

Fit the phenotyper by finding all possible intersectional groups on a passed set of features.

Parameters

features : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

Trained instance of intersectional phenotyper.

def predict(self, features)

Phenotype out of sample test data.

Parameters

features : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array:
A numpy array containing a list of strings that define subgroups from all possible combinations of specified categorical and numerical variables.

def fit_predict(self, features)

Fit and perform phenotyping on a given dataset.

Parameters

features : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array:
A numpy array containing a list of strings that define subgroups from all possible combinations of specified categorical and numerical variables.

class ClusteringPhenotyper (clustering_method='kmeans', dim_red_method=None, random_seed=0, **kwargs)

Phenotyper that performs dimensionality reduction followed by clustering. Learned clusters are considered phenotypes and used to group samples based on similarity in the covariate space.

Parameters

features : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns as covariates.
clustering_method : str, default='kmeans'

The clustering method applied for phenotyping. Options include:

  • kmeans: K-Means Clustering
  • dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
  • gmm: Gaussian Mixture
  • hierarchical: Agglomerative Clustering
dim_red_method : str, default=None

The dimensionality reduction method applied. Options include:

  • pca : Principal Component Analysis
  • kpca : Kernel Principal Component Analysis
  • nnmf : Non-Negative Matrix Factorization
  • None : dimensionality reduction is not applied.
random_seed : int, default=0
Controls the randomness and reproducibility of called functions.
kwargs : dict

Additional arguments for dimensionality reduction and clustering. Pass key and value pairs accepted by the following scikit-learn classes:

  • pca : sklearn.decomposition.PCA
  • nnmf : sklearn.decomposition.NMF
  • kpca : sklearn.decomposition.KernelPCA
  • kmeans : sklearn.cluster.KMeans
  • dbscan : sklearn.cluster.DBSCAN
  • gmm : sklearn.mixture.GaussianMixture
  • hierarchical : sklearn.cluster.AgglomerativeClustering
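
The description above does not say how a single set of keyword arguments is divided between the dimensionality reduction step and the clustering step. One plausible approach, shown purely as an assumption, is to keep for each estimator only the keys its constructor accepts. `ToyReducer` and `ToyClusterer` are hypothetical stand-ins for scikit-learn classes such as `PCA` and `KMeans`, and `select_kwargs` is a hypothetical helper that is not part of auton_survival.

```python
import inspect

class ToyReducer:
    """Hypothetical stand-in for a dimensionality reduction estimator."""
    def __init__(self, n_components=2):
        self.n_components = n_components

class ToyClusterer:
    """Hypothetical stand-in for a clustering estimator."""
    def __init__(self, n_clusters=3, random_state=0):
        self.n_clusters = n_clusters
        self.random_state = random_state

def select_kwargs(cls, kwargs):
    """Keep only the keyword arguments named in cls.__init__ (hypothetical helper)."""
    accepted = inspect.signature(cls.__init__).parameters
    return {k: v for k, v in kwargs.items() if k in accepted}

# One shared dictionary, as a caller might pass through **kwargs.
kwargs = {"n_components": 5, "n_clusters": 4}

reducer = ToyReducer(**select_kwargs(ToyReducer, kwargs))
clusterer = ToyClusterer(**select_kwargs(ToyClusterer, kwargs))
```

Each constructor receives only the arguments it understands, so a key meant for one step never reaches the other.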

Methods

def fit(self, features)

Perform dimensionality reduction and train an instance of the clustering algorithm.

Parameters

features : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

Trained instance of clustering phenotyper.

def predict_proba(self, features)

Perform dimensionality reduction and clustering, and estimate the probability of each sample's association with the learned clusters, or subgroups.

Parameters

features : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array:
A numpy array of the probability estimates of sample association to learned subgroups.

def predict(self, features)

Perform dimensionality reduction and clustering, and extract the phenogroups that maximize the probability estimates of sample association with the learned clusters, or subgroups.

Parameters

features : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array:
A numpy array of phenogroup labels.
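
The relationship between `predict_proba` and `predict` described here amounts to a row-wise argmax. A minimal sketch in plain Python, using a made-up probability matrix rather than real model output; the `argmax` helper is defined only for this example:

```python
# Hypothetical output of predict_proba: one row per sample, one column per cluster.
proba = [
    [0.10, 0.70, 0.20],
    [0.60, 0.30, 0.10],
    [0.25, 0.25, 0.50],
]

def argmax(row):
    """Index of the largest entry; ties resolve to the first maximum."""
    return max(range(len(row)), key=row.__getitem__)

# Each sample is labeled with the cluster it is most strongly associated with.
labels = [argmax(row) for row in proba]
```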

def fit_predict(self, features)

Fit and perform phenotyping on a given dataset.

Parameters

features : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array
A numpy array of the probability estimates of sample association to learned clusters.

class SurvivalVirtualTwinsPhenotyper (cf_method='dcph', phenotyping_method='rfr', cf_hyperparams=None, random_seed=0, **phenotyper_hyperparams)

Phenotyper that estimates the potential outcomes under treatment and control using a counterfactual Deep Cox Proportional Hazards model, followed by regressing the difference of the estimated counterfactual Restricted Mean Survival Times using a Random Forest regressor.
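
For orientation, the Restricted Mean Survival Time (RMST) at a horizon is the area under the survival curve from time 0 to that horizon, and the regression target described above is the difference between the treated and control RMST. The sketch below computes this for a single individual from two made-up step-shaped survival curves. The `rmst` helper and the numbers are illustrative assumptions, and the library's own computation may differ.

```python
def rmst(times, survival, horizon):
    """Area under a step survival curve from 0 to horizon.

    S(t) is taken to be 1 before times[0], and survival[i] from
    times[i] up to the next time point.
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, survival):
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Last segment, truncated at the horizon.
    area += prev_s * (horizon - prev_t)
    return area

times = [1.0, 2.0, 3.0]

# Hypothetical counterfactual survival curves for one individual.
surv_treated = [0.8, 0.5, 0.2]
surv_control = [0.6, 0.3, 0.1]

rmst_treated = rmst(times, surv_treated, horizon=3.0)
rmst_control = rmst(times, surv_control, horizon=3.0)

# A positive difference suggests longer expected survival under treatment.
rmst_difference = rmst_treated - rmst_control
```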

Methods

def fit(self, features, outcomes, interventions, horizons, metric)

Fit a counterfactual model and regress the difference of the estimated counterfactual Restricted Mean Survival Time using a Random Forest regressor.

Parameters

features : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns as covariates.
outcomes : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns 'time' and 'event'.
interventions : np.array
Boolean numpy array of treatment indicators. True means individual was assigned a specific treatment.
horizons : int or float or list
Event-horizons at which to evaluate model performance.
metric : str, default='ibs'
Metric used to evaluate model performance and tune hyperparameters. Options include:

  • 'auc' : Dynamic area under the ROC curve
  • 'brs' : Brier Score
  • 'ibs' : Integrated Brier Score
  • 'ctd' : Concordance Index
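
To make one of these metrics concrete, the Brier score at a horizon compares each predicted survival probability with whether the individual actually survived past that horizon. The sketch below ignores censoring, which a real implementation has to correct for (for example with inverse probability of censoring weights). The function name and the data are made up for illustration and are not part of auton_survival.

```python
def brier_score_uncensored(event_times, predicted_survival, horizon):
    """Mean squared error between 1{T > horizon} and the predicted S(horizon).

    Assumes every event time is observed (no censoring).
    """
    errors = [
        ((1.0 if t > horizon else 0.0) - s) ** 2
        for t, s in zip(event_times, predicted_survival)
    ]
    return sum(errors) / len(errors)

event_times = [2.0, 5.0, 8.0, 12.0]        # observed event times
predicted_survival = [0.2, 0.4, 0.7, 0.9]  # predicted S(6.0) per individual

# Lower is better; 0 would mean perfect predictions at this horizon.
score = brier_score_uncensored(event_times, predicted_survival, horizon=6.0)
```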

Returns

Trained instance of Survival Virtual Twins Phenotyper.

def predict_proba(self, features)

Estimate the probability that the Restricted Mean Survival Time under treatment is greater than that under control.

Parameters

features : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array
A numpy array of the phenogroup probabilities in the format [control_group, treated_group].

def predict(self, features)

Extract phenogroups that maximize probability estimates.

Parameters

features : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns as covariates.

Returns

np.array
A numpy array of the phenogroup labels.

def fit_predict(self, features, outcomes, interventions, horizon)

Fit and perform phenotyping on a given dataset.

Parameters

features : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns as covariates.
outcomes : pd.DataFrame
A pandas dataframe with rows corresponding to individual samples and columns 'time' and 'event'.
interventions : np.array
Boolean numpy array of treatment indicators. True means individual was assigned a specific treatment.
horizon : float
The event horizon at which to compute the counterfactual RMST for regression.

Returns

np.array
A numpy array of the phenogroup labels.