% VL_SVMTRAIN  Train a Support Vector Machine
%   [W, B] = VL_SVMTRAIN(X, Y, LAMBDA) trains a linear Support Vector
%   Machine (SVM) from the data vectors X and the labels Y. X is a D
%   by N matrix, with one column per example and D feature dimensions
%   (SINGLE or DOUBLE). Y is a DOUBLE vector with N elements with a
%   binary (-1 or +1) label for each training point. To a first order
%   approximation, the function computes a weight vector W and offset
%   B such that the score W'*X(:,i)+B has the same sign as Y(i)
%   for all i.
%
%   VL_SVMTRAIN(DATASET, LABELS, LAMBDA) takes as input a DATASET
%   structure, which allows more sophisticated input formats to be
%   supported (see VL_SVMDATASET()).
%
%   [W, B, INFO] = VL_SVMTRAIN(...) additionally returns a structure
%   INFO with the following fields:
%
%   iteration::
%     Number of iterations performed.
%
%   epoch::
%     Number of iterations divided by the number of training data
%     points.
%
%   elapsedTime::
%     Time elapsed since the start of training.
%
%   objective::
%     SVM objective value.
%
%   regularizer::
%     Regularizer value.
%
%   loss::
%     Loss value.
%
%   scoreVariation:: [SGD only]
%     Mean square root of the difference between the last two
%     values of the SVM scores for each point.
%
%   dualObjective:: [SDCA only]
%     Dual objective value.
%
%   dualLoss:: [SDCA only]
%     Dual loss value.
%
%   dualityGap:: [SDCA only]
%     Difference between the objective and the dual objective.
%
%   [W, B, INFO, SCORES] = VL_SVMTRAIN(X, Y, LAMBDA) returns a row
%   vector of the SVM score for each training point. This can be used
%   in combination with the options SOLVER, MODEL, and BIAS to
%   evaluate an existing SVM on new data points. Furthermore, INFO
%   will contain the corresponding SVM loss, regularizer, and
%   objective function value. If this information is not of interest,
%   it is possible to pass a null vector Y instead of the actual
%   labels as well as a null regularizer.
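%
%   As a minimal sketch of reading these outputs (the variables x and
%   y and the value 0.1 for LAMBDA are placeholders chosen for
%   illustration, not recommendations):
%
%     [w, b, info, scores] = vl_svmtrain(x, y, 0.1) ;
%
%     % Print the final objective, loss, and regularizer values.
%     fprintf('objective %g, loss %g, regularizer %g\n', ...
%             info.objective, info.loss, info.regularizer) ;
%
%     % Fraction of training points whose score sign disagrees with
%     % the label (a score of exactly zero counts as an error here).
%     trainError = mean(sign(scores(:)) ~= y(:)) ;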
%
%   VL_SVMTRAIN() accepts the following options:
%
%   Verbose::
%     Specify one or multiple times to increase the verbosity level.
%     Given only once, produces messages at the beginning and end of
%     the learning. Verbosity of at least 2 prints information at
%     every diagnostic step.
%
%   Epsilon:: 1e-3
%     Tolerance for the stopping criterion.
%
%   MaxNumIterations:: 10/LAMBDA
%     Maximum number of iterations.
%
%   BiasMultiplier:: 1
%     Value of the constant B0 used as bias term (see below).
%
%   BiasLearningRate:: 0.5
%     Learning rate for the bias (SGD solver only).
%
%   DiagnosticFunction:: []
%     Diagnostic function callback. The callback takes the INFO
%     structure as its only argument. To trace energies and plot
%     graphs, the callback can update a global variable or,
%     preferably, be defined as a nested function and update a local
%     variable in the parent function.
%
%   DiagnosticFrequency:: Number of data points
%     Number of iterations between diagnostic steps. This step checks
%     for convergence and is done rarely, typically after each epoch
%     (pass over the data). It also calls the DiagnosticFunction,
%     if one is specified.
%
%   Loss:: HINGE
%     Loss function. One of HINGE, HINGE2, L1, L2, LOGISTIC.
%
%   Solver:: SDCA
%     One of SGD (stochastic gradient descent [1]), SDCA (stochastic
%     dual coordinate ascent [2,3]), or NONE (no training). The
%     last option can be used in combination with the options MODEL
%     and BIAS to evaluate an existing SVM.
%
%   Model:: null vector
%     Specifies the initial value for the weight vector W (SGD only).
%
%   Bias:: 0
%     Specifies the initial value of the bias term (SGD only).
%
%   Weights:: []
%     Specifies a weight vector to assign a different non-negative
%     weight to each data point. An application is to rebalance
%     unbalanced datasets.
%
%   FORMULATION
%
%   VL_SVMTRAIN() minimizes an objective function of the form:
%
%     LAMBDA/2 |W|^2 + 1/N SUM_i LOSS(W' X(:,i), Y(i))
%
%   where LOSS(W' X(:,i), Y(i)) is the loss (hinge by default) for
%   the i-th data point.
%   The bias is incorporated by extending each data
%   point X with a feature of constant value B0, such that the
%   objective becomes
%
%     LAMBDA/2 (|W|^2 + WB^2) + 1/N SUM_i LOSS(W' X(:,i) + WB B0, Y(i))
%
%   Note that this causes the learned bias B = WB B0 to shrink
%   towards the origin.
%
%   Example::
%     Learn a linear SVM from data X and labels Y using 0.1
%     as regularization coefficient:
%
%       [w, b] = vl_svmtrain(x, y, 0.1) ;
%
%     The SVM can be evaluated on new data XTEST with:
%
%       scores = w'*xtest + b ;
%
%     Alternatively, VL_SVMTRAIN() can be used for evaluation too:
%
%       [~,~,~, scores] = vl_svmtrain(xtest, y, 0, 'model', w, 'bias', b, 'solver', 'none') ;
%
%     The latter form is particularly useful when X is a DATASET
%     structure.
%
%   See also: SVM fundamentals,
%   VL_SVMDATASET(), VL_HELP().

% AUTHORIGHTS