gklearn.utils.model_selection_precomputed

compute_gram_matrices(dataset, y, estimator, param_list_precomputed, output_dir, ds_name, n_jobs=1, str_fw='', verbose=True)

model_selection_for_precomputed_kernel(datafile, estimator, param_grid_precomputed, param_grid, model_type, NUM_TRIALS=30, datafile_y=None, extra_params=None, ds_name='ds-unknown', output_dir='outputs/', n_jobs=1, read_gm_from_file=False, verbose=True)

Perform model selection, fitting, and testing for precomputed kernels using nested cross-validation (CV). Progress information is printed during the run, followed by the final results.

Parameters

datafile (string): Path of the dataset file.
estimator (function): Kernel function to evaluate. This function must return a Gram matrix.
param_grid_precomputed (dictionary): Dictionary with names (string) of the parameters used to calculate Gram matrices as keys and lists of parameter settings to try as values. This enables searching over any sequence of parameter settings. Params with length 1 will be omitted.
param_grid (dictionary): Dictionary with names (string) of the parameters used as penalties as keys and lists of parameter settings to try as values. This enables searching over any sequence of parameter settings. Params with length 1 will be omitted.
model_type (string): Type of the problem, either 'regression' or 'classification'.
NUM_TRIALS (integer): Number of random trials of the outer CV loop. The default is 30.
datafile_y (string): Path of the file storing y data. This parameter is optional, depending on the given dataset file.
extra_params (dict): Extra parameters for loading the dataset. See function gklearn.utils.graphfiles.loadDataset for details.
ds_name (string): Name of the dataset.
n_jobs (int): Number of jobs for parallelization.
read_gm_from_file (boolean): Whether Gram matrices are loaded from a file.
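
Both grid parameters map parameter names to lists of candidate values, and the search covers every combination of those values. The sketch below illustrates that expansion using only the standard library. The helper name expand_grid is hypothetical and is not part of gklearn; how the library enumerates combinations internally is not documented here.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every combination of the listed settings as a list of dicts.

    Hypothetical helper for illustration only; not a gklearn function.
    """
    names = sorted(param_grid)  # fixed key order gives a reproducible result
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# A small grid shaped like param_grid_precomputed.
grid = {
    'depth': [1, 2, 3],
    'k_func': ['MinMax', 'tanimoto'],
    'compute_method': ['trie'],
}
combos = expand_grid(grid)
print(len(combos))  # 3 * 2 * 1 combinations
print(combos[0])
```

Each resulting dictionary corresponds to one candidate kernel configuration, so the number of Gram matrices to compute grows with the product of the list lengths.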

Examples

>>> import numpy as np
>>> from gklearn.utils.model_selection_precomputed import model_selection_for_precomputed_kernel
>>> from gklearn.kernels.untilHPathKernel import untilhpathkernel
>>>
>>> datafile = '../datasets/MUTAG/MUTAG_A.txt'
>>> estimator = untilhpathkernel
>>> param_grid_precomputed = {'depth': np.linspace(1, 10, 10),
...                           'k_func': ['MinMax', 'tanimoto'],
...                           'compute_method': ['trie']}
>>> # 'C' for classification problems and 'alpha' for regression problems.
>>> param_grid = [{'C': np.logspace(-10, 10, num=41, base=10)},
...               {'alpha': np.logspace(-10, 10, num=41, base=10)}]
>>>
>>> model_selection_for_precomputed_kernel(datafile, estimator,
...     param_grid_precomputed, param_grid[0], 'classification', ds_name='MUTAG')
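
To make the nested CV idea concrete, the following is a minimal, self-contained sketch of a few trials on a precomputed Gram matrix. It is not gklearn's implementation. Several things are assumptions made for illustration: the fold counts (5 outer, 5 inner), the toy data, the linear kernel, RMSE as the score, and a plain kernel ridge regressor standing in for the regression model, with 'alpha' as its penalty. All function names are hypothetical.

```python
import numpy as np


def kernel_ridge_predict(K_train, y_train, K_test_train, alpha):
    """Fit kernel ridge regression on a precomputed Gram matrix, then predict."""
    n = K_train.shape[0]
    coef = np.linalg.solve(K_train + alpha * np.eye(n), y_train)
    return K_test_train @ coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def k_fold_indices(indices, n_folds, rng):
    """Shuffle the indices and split them into n_folds (train, test) pairs."""
    folds = np.array_split(rng.permutation(indices), n_folds)
    return [(np.concatenate(folds[:i] + folds[i + 1:]), folds[i])
            for i in range(n_folds)]


def nested_cv_trial(gram, y, alphas, rng, n_outer=5, n_inner=5):
    """One trial: the inner loop selects alpha, the outer loop scores it."""
    outer_scores = []
    for train_idx, test_idx in k_fold_indices(np.arange(len(y)), n_outer, rng):
        mean_val_errors = []
        for alpha in alphas:
            errors = []
            for fit_idx, val_idx in k_fold_indices(train_idx, n_inner, rng):
                # Slice rows and columns so only the fitting samples are seen.
                pred = kernel_ridge_predict(
                    gram[np.ix_(fit_idx, fit_idx)], y[fit_idx],
                    gram[np.ix_(val_idx, fit_idx)], alpha)
                errors.append(rmse(y[val_idx], pred))
            mean_val_errors.append(np.mean(errors))
        best_alpha = alphas[int(np.argmin(mean_val_errors))]
        # Refit on the whole outer training split with the selected alpha.
        pred = kernel_ridge_predict(
            gram[np.ix_(train_idx, train_idx)], y[train_idx],
            gram[np.ix_(test_idx, train_idx)], best_alpha)
        outer_scores.append(rmse(y[test_idx], pred))
    return float(np.mean(outer_scores))


# Toy data (assumed): a noise-free linear target and a linear kernel.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5])
gram = X @ X.T
alphas = [1e-3, 1e-1, 10.0]

# Repeat the trial with fresh random splits, as NUM_TRIALS does in spirit.
scores = [nested_cv_trial(gram, y, alphas, rng) for _ in range(3)]
print(np.mean(scores), np.std(scores))
```

The point of the design is that the penalty is chosen using only the inner folds, so the outer test split never influences model selection. Averaging over several random trials reduces the dependence of the reported score on any one split.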

parallel_trial_do(param_list_pre_revised, param_list, y, model_type, trial)

printResultsInTable(param_list, param_list_pre_revised, average_val_scores, std_val_scores, average_perf_scores, std_perf_scores, average_train_scores, std_train_scores, gram_matrix_time, model_type, verbose)

read_gram_matrices_from_file(output_dir, ds_name)

trial_do(param_list_pre_revised, param_list, gram_matrices, y, model_type, trial)