gklearn.utils.model_selection_precomputed

compute_gram_matrices(dataset, y, estimator, param_list_precomputed, output_dir, ds_name, n_jobs=1, str_fw='', verbose=True)
model_selection_for_precomputed_kernel(datafile, estimator, param_grid_precomputed, param_grid, model_type, NUM_TRIALS=30, datafile_y=None, extra_params=None, ds_name='ds-unknown', output_dir='outputs/', n_jobs=1, read_gm_from_file=False, verbose=True)

Perform model selection, fitting, and testing for precomputed kernels using nested cross-validation (CV). Necessary data is printed during the process, followed by the final results.

Parameters

datafile : string

Path of dataset file.

estimator : function

Kernel function used for estimation. This function must return a Gram matrix.

param_grid_precomputed : dictionary

Dictionary with names (string) of parameters used to calculate Gram matrices as keys and lists of parameter settings to try as values. This enables searching over any sequence of parameter settings. Parameters whose list has length 1 will be omitted.

param_grid : dictionary

Dictionary with names (string) of parameters used as penalties as keys and lists of parameter settings to try as values. This enables searching over any sequence of parameter settings. Parameters whose list has length 1 will be omitted.

model_type : string

Type of the problem; can be 'regression' or 'classification'.

NUM_TRIALS : integer

Number of random trials of the outer CV loop. The default is 30.

datafile_y : string

Path of file storing y data. This parameter is optional depending on the given dataset file.

extra_params : dict

Extra parameters for loading the dataset. See the function gklearn.utils.graphfiles.loadDataset for details.

ds_name : string

Name of the dataset.

n_jobs : int

Number of jobs for parallelization.

read_gm_from_file : boolean

Whether gram matrices are loaded from a file.
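
As a rough illustration of what "searching over any sequence of parameter settings" means for param_grid_precomputed and param_grid, the sketch below expands a dictionary of lists into one dictionary per combination. The helper name expand_grid is hypothetical and not part of gklearn; how the library enumerates settings internally is not described on this page.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the values in `grid`.

    Hypothetical helper for illustration only; not a gklearn function.
    """
    names = sorted(grid)  # fixed key order so the output is deterministic
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Small grid in the same shape as param_grid_precomputed.
param_grid_precomputed = {
    'depth': [1, 2, 3],
    'k_func': ['MinMax', 'tanimoto'],
    'compute_method': ['trie'],
}

settings = expand_grid(param_grid_precomputed)
print(len(settings))  # 3 * 2 * 1 = 6 combinations
print(settings[0])
```

Each resulting dictionary corresponds to one candidate setting, so the number of Gram matrices to compute grows with the product of the list lengths.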

Examples

>>> import numpy as np
>>> from gklearn.utils.model_selection_precomputed import model_selection_for_precomputed_kernel
>>> from gklearn.kernels.untilHPathKernel import untilhpathkernel
>>>
>>> datafile = '../datasets/MUTAG/MUTAG_A.txt'
>>> estimator = untilhpathkernel
>>> param_grid_precomputed = {'depth': np.linspace(1, 10, 10), 'k_func':
...                 ['MinMax', 'tanimoto'], 'compute_method': ['trie']}
>>> # 'C' for classification problems and 'alpha' for regression problems.
>>> param_grid = [{'C': np.logspace(-10, 10, num=41, base=10)}, {'alpha':
...                 np.logspace(-10, 10, num=41, base=10)}]
>>>
>>> model_selection_for_precomputed_kernel(datafile, estimator,
...                 param_grid_precomputed, param_grid[0], 'classification', ds_name='MUTAG')
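
To make the idea of nested CV on a precomputed Gram matrix more concrete, here is a minimal NumPy-only sketch for the regression case. It is not gklearn's implementation: the function names, the fold counts, the RMSE score, and the use of kernel ridge regression with an 'alpha' penalty are all assumptions chosen for illustration, and the toy linear kernel stands in for a graph kernel.

```python
import numpy as np


def kernel_ridge_predict(K_train, y_train, K_test, alpha):
    """Fit kernel ridge on a precomputed Gram matrix and predict test targets."""
    coef = np.linalg.solve(K_train + alpha * np.eye(len(y_train)), y_train)
    return K_test @ coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def nested_cv(gram, y, alphas, n_outer=5, n_inner=4, seed=0):
    """Inner loop picks alpha; outer loop estimates test error (assumed scheme)."""
    rng = np.random.default_rng(seed)
    outer_folds = np.array_split(rng.permutation(len(y)), n_outer)
    test_scores = []
    for i, test_idx in enumerate(outer_folds):
        train_idx = np.concatenate(
            [f for j, f in enumerate(outer_folds) if j != i])
        inner_folds = np.array_split(train_idx, n_inner)
        mean_val_errors = []
        for alpha in alphas:
            errors = []
            for k, val_idx in enumerate(inner_folds):
                fit_idx = np.concatenate(
                    [f for j, f in enumerate(inner_folds) if j != k])
                # Slice rows/columns of the Gram matrix instead of recomputing it.
                pred = kernel_ridge_predict(
                    gram[np.ix_(fit_idx, fit_idx)], y[fit_idx],
                    gram[np.ix_(val_idx, fit_idx)], alpha)
                errors.append(rmse(y[val_idx], pred))
            mean_val_errors.append(np.mean(errors))
        best_alpha = alphas[int(np.argmin(mean_val_errors))]
        # Refit on the whole outer training set with the selected alpha.
        pred = kernel_ridge_predict(
            gram[np.ix_(train_idx, train_idx)], y[train_idx],
            gram[np.ix_(test_idx, train_idx)], best_alpha)
        test_scores.append(rmse(y[test_idx], pred))
    return float(np.mean(test_scores)), float(np.std(test_scores))


# Toy data: a linear kernel on random vectors stands in for a graph kernel.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5])
gram = X @ X.T
alphas = np.logspace(-3, 3, num=7)

mean_rmse, std_rmse = nested_cv(gram, y, alphas)
print(mean_rmse, std_rmse)
```

The point of the precomputed setting is visible in the slicing: the Gram matrix is computed once per kernel parameter setting, and every train/validation/test split only indexes into it.
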
parallel_trial_do(param_list_pre_revised, param_list, y, model_type, trial)
printResultsInTable(param_list, param_list_pre_revised, average_val_scores, std_val_scores, average_perf_scores, std_perf_scores, average_train_scores, std_train_scores, gram_matrix_time, model_type, verbose)
read_gram_matrices_from_file(output_dir, ds_name)
trial_do(param_list_pre_revised, param_list, gram_matrices, y, model_type, trial)