gklearn.utils.model_selection_precomputed

compute_gram_matrices(dataset, y, estimator, param_list_precomputed, results_dir, ds_name, n_jobs=1, str_fw='', verbose=True)
model_selection_for_precomputed_kernel(datafile, estimator, param_grid_precomputed, param_grid, model_type, NUM_TRIALS=30, datafile_y=None, extra_params=None, ds_name='ds-unknown', n_jobs=1, read_gm_from_file=False, verbose=True)

Perform model selection, fitting, and testing for precomputed kernels using nested cross-validation (CV). Necessary data are printed during the process, followed by the final results.

datafile : string
Path of dataset file.
estimator : function
Kernel function used for estimation. This function needs to return a Gram matrix.
param_grid_precomputed : dictionary
Dictionary with names (string) of parameters used to calculate Gram matrices as keys and lists of parameter settings to try as values. This enables searching over any sequence of parameter settings. Parameters whose list has length 1 will be omitted.
param_grid : dictionary
Dictionary with names (string) of parameters used as penalties as keys and lists of parameter settings to try as values. This enables searching over any sequence of parameter settings. Parameters whose list has length 1 will be omitted.
model_type : string
Type of the problem, can be ‘regression’ or ‘classification’.
NUM_TRIALS : integer
Number of random trials of the outer CV loop. The default is 30.
datafile_y : string
Path of file storing y data. This parameter is optional depending on the given dataset file.
extra_params : dict
Extra parameters for loading the dataset. See function gklearn.utils.graphfiles.loadDataset for details.
ds_name : string
Name of the dataset.
n_jobs : int
Number of jobs for parallelization.
read_gm_from_file : boolean
Whether gram matrices are loaded from a file.
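To illustrate how a parameter grid such as param_grid_precomputed expands into individual settings, here is a minimal sketch using only the standard library. The helper expand_grid is hypothetical and not part of gklearn; the library's own expansion may differ, including in how it treats the length-1 parameters mentioned above.

```python
from itertools import product


def expand_grid(grid):
    """Expand a dict of parameter lists into a list of parameter dicts.

    Hypothetical helper for illustration only; not a gklearn function.
    """
    names = sorted(grid)  # fixed key order for reproducible output
    combos = product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


# Made-up grid with the same parameter names as the kernel grid.
grid = {
    'depth': [1, 2, 3],
    'k_func': ['MinMax', 'tanimoto'],
    'compute_method': ['trie'],
}
settings = expand_grid(grid)
print(len(settings))  # 3 * 2 * 1 = 6 combinations
print(settings[0])
```

Each resulting dictionary corresponds to one candidate setting, so the number of Gram matrices to compute grows with the product of the list lengths.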
>>> import numpy as np
>>> from gklearn.utils.model_selection_precomputed import model_selection_for_precomputed_kernel
>>> from gklearn.kernels.untilHPathKernel import untilhpathkernel
>>>
>>> datafile = '../datasets/MUTAG/MUTAG_A.txt'
>>> estimator = untilhpathkernel
>>> param_grid_precomputed = {'depth': np.linspace(1, 10, 10), 'k_func':
...     ['MinMax', 'tanimoto'], 'compute_method': ['trie']}
>>> # 'C' for classification problems and 'alpha' for regression problems.
>>> param_grid = [{'C': np.logspace(-10, 10, num=41, base=10)}, {'alpha':
...     np.logspace(-10, 10, num=41, base=10)}]
>>>
>>> model_selection_for_precomputed_kernel(datafile, estimator,
...     param_grid_precomputed, param_grid[0], 'classification', ds_name='MUTAG')
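With a precomputed kernel, each CV split selects rows and columns of one Gram matrix instead of recomputing the kernel: fitting uses the train-by-train block and prediction uses the test-by-train block. The following is a minimal standard-library sketch of that indexing over several random trials. The helpers sub_matrix and split_indices, the toy matrix, and the 20% test fraction are all assumptions for illustration and do not reflect the splits gklearn actually uses.

```python
import random


def sub_matrix(gram, rows, cols):
    """Select the given rows and columns of a Gram matrix (list of lists)."""
    return [[gram[i][j] for j in cols] for i in rows]


def split_indices(n, test_fraction, seed):
    """Shuffle range(n) and split it into train and test index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(n * test_fraction))
    return indices[n_test:], indices[:n_test]


# Toy symmetric Gram matrix for 5 graphs (made-up values).
gram = [[1.0 if i == j else 1.0 / (1 + abs(i - j)) for j in range(5)]
        for i in range(5)]

for trial in range(3):  # stands in for NUM_TRIALS
    train, test = split_indices(len(gram), test_fraction=0.2, seed=trial)
    k_train = sub_matrix(gram, train, train)  # used to fit: train x train
    k_test = sub_matrix(gram, test, train)    # used to predict: test x train
    print(trial, len(k_train), len(k_test), len(k_test[0]))
```

An inner CV loop for hyperparameter selection would apply the same slicing again inside the training indices.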
parallel_trial_do(param_list_pre_revised, param_list, y, model_type, trial)
printResultsInTable(param_list, param_list_pre_revised, average_val_scores, std_val_scores, average_perf_scores, std_perf_scores, average_train_scores, std_train_scores, gram_matrix_time, model_type, verbose)
read_gram_matrices_from_file(results_dir, ds_name)
trial_do(param_list_pre_revised, param_list, gram_matrices, y, model_type, trial)