Documentation - C API
ikmeans.h File Reference

Integer K-Means clustering. More...

#include "generic.h"
#include "random.h"

Data Structures

struct  VlIKMFilt
 IKM quantizer. More...

Typedefs

typedef vl_int32 vl_ikm_acc

Enumerations

enum  VlIKMAlgorithms { VL_IKM_LLOYD, VL_IKM_ELKAN }
 IKM algorithms. More...

Functions

Create and destroy
VlIKMFiltvl_ikm_new (int method)
 Create a new IKM quantizer.
void vl_ikm_delete (VlIKMFilt *f)
 Delete IKM quantizer.
Process data
void vl_ikm_init (VlIKMFilt *f, vl_ikm_acc const *centers, int M, int K)
 Initialize quantizer with centers.
void vl_ikm_init_rand (VlIKMFilt *f, int M, int K)
 Initialize quantizer with random centers.
void vl_ikm_init_rand_data (VlIKMFilt *f, vl_uint8 const *data, int M, int N, int K)
 Initialize with centers from random data.
int vl_ikm_train (VlIKMFilt *f, vl_uint8 const *data, int N)
 Train clusters.
void vl_ikm_push (VlIKMFilt *f, vl_uint *asgn, vl_uint8 const *data, int N)
 Project data to clusters.
vl_uint vl_ikm_push_one (vl_ikm_acc const *centers, vl_uint8 const *data, int M, int K)
 Project one datum to clusters.
Retrieve data and parameters
int vl_ikm_get_ndims (VlIKMFilt const *f)
 Get data dimensionality.
int vl_ikm_get_K (VlIKMFilt const *f)
 Get the number of centers K.
int vl_ikm_get_verbosity (VlIKMFilt const *f)
 Get verbosity level.
int vl_ikm_get_max_niters (VlIKMFilt const *f)
 Get maximum number of iterations.
vl_ikm_acc const * vl_ikm_get_centers (VlIKMFilt const *f)
 Get maximum number of iterations.
Set parameters
void vl_ikm_set_verbosity (VlIKMFilt *f, int verb)
 Set verbosity level.
void vl_ikm_set_max_niters (VlIKMFilt *f, int max_niters)
 Set maximum number of iterations.

Detailed Description

Integer K-means (IKM) is an implementation of K-means clustering (or Vector Quantization, VQ) for integer data. This is particularly useful for clustering large collections of visual descriptors.

Use the function vl_ikm_new() to create a IKM quantizer. Initialize the IKM quantizer with K clusters by vl_ikm_init() or similar function. Use vl_ikm_train() to train the quantizer. Use vl_ikm_push() or vl_ikm_push_one() to quantize new data.

Given data \(x_1,\dots,x_N\in R^d\) and a number of clusters \(K\), the goal is to find assignments \(a_i\in\{1,\dots,K\},\) and centers \(c_1,\dots,c_K\in R^d\) so that the expected distortion

\[ E(\{a_{i}, c_j\}) = \frac{1}{N} \sum_{i=1}^N d(x_i, c_{a_i}) \]

is minimized. Here \(d(x_i, c_{a_i})\) is the distortion, i.e. the cost we pay for representing \( x_i \) by \( c_{a_i} \). IKM uses the squared distortion \(d(x,y)=\|x-y\|^2_2\).

Algorithms

Initialization

Most K-means algorithms are iterative and needs an initialization in the form of an initial choice of the centers \(c_1,\dots,c_K\). We include the following options:

Lloyd

The Lloyd (also known as Lloyd-Max and LBG) algorithm iteratively:

  • Fixes the centers, optimizing the assignments (minimizing by exhaustive search the association of each data point to the centers);
  • Fixes the assignments and optimizes the centers (by descending the distortion error function). For the squared distortion, this step is in closed form.

This algorithm is not particularly efficient because all data points need to be compared to all centers, for a complexity \(O(dNKT)\), where T is the total number of iterations.

Elkan

The Elkan algorithm is an optimized variant of Lloyd. By making use of the triangle inequality, many comparisons of data points and centers are avoided, especially at later iterations. Usually 4-5 times less comparisons than Lloyd are preformed, providing a dramatic speedup in the execution time.

Author:
Brian Fulkerson
Andrea Vedaldi

Typedef Documentation

IKM accumulator data type


Enumeration Type Documentation

Enumerator:
VL_IKM_LLOYD 

Lloyd algorithm

VL_IKM_ELKAN 

Elkan algorithm


Function Documentation

void vl_ikm_delete ( VlIKMFilt f)
Parameters:
fIKM quantizer.
vl_ikm_acc const * vl_ikm_get_centers ( VlIKMFilt const *  f)
inline
Parameters:
fIKM filter.
Returns:
maximum number of iterations.
int vl_ikm_get_K ( VlIKMFilt const *  f)
inline
Parameters:
fIKM filter.
Returns:
number of centers K.
int vl_ikm_get_max_niters ( VlIKMFilt const *  f)
inline
Parameters:
fIKM filter.
Returns:
maximum number of iterations.
int vl_ikm_get_ndims ( VlIKMFilt const *  f)
inline
Parameters:
fIKM filter.
Returns:
data dimensionality.
int vl_ikm_get_verbosity ( VlIKMFilt const *  f)
inline
Parameters:
fIKM filter.
Returns:
verbosity level.
void vl_ikm_init ( VlIKMFilt f,
vl_ikm_acc const *  centers,
int  M,
int  K 
)
Parameters:
fIKM quantizer.
centerscenters.
Mdata dimensionality.
Knumber of clusters.
void vl_ikm_init_rand ( VlIKMFilt f,
int  M,
int  K 
)
Parameters:
fIKM quantizer.
Mdata dimensionality.
Knumber of clusters.
void vl_ikm_init_rand_data ( VlIKMFilt f,
vl_uint8 const *  data,
int  M,
int  N,
int  K 
)
Parameters:
fIKM quantizer.
datadata.
Mdata dimensionality.
Nnumber of data.
Knumber of clusters.
VlIKMFilt* vl_ikm_new ( int  method)
Parameters:
methodClustering algorithm.

The function allocates initializes a new IKM quantizer to operate based algorithm method.

method has values in the enumerations VlIKMAlgorithms.

Returns:
new IKM quantizer.
void vl_ikm_push ( VlIKMFilt f,
vl_uint asgn,
vl_uint8 const *  data,
int  N 
)
Parameters:
fIKM quantizer.
asgnAssignments (out).
datadata.
Nnumber of data (N >= 1).

The function projects the data data on the integer K-means clusters specified by the IKM quantizer f. Notice that the quantizer must be initialized.

vl_uint vl_ikm_push_one ( vl_ikm_acc const *  centers,
vl_uint8 const *  data,
int  M,
int  K 
)
Parameters:
centerscenters.
datadatum to project.
Knumber of centers.
Mdimensionality of the datum.

The function projects the specified datum data on the clusters specified by the centers centers.

Returns:
the cluster index.
void vl_ikm_set_max_niters ( VlIKMFilt f,
int  max_niters 
)
inline
Parameters:
fIKM filter.
max_nitersmaximum number of iterations.
void vl_ikm_set_verbosity ( VlIKMFilt f,
int  verb 
)
inline
Parameters:
fIKM filter.
verbverbosity level.
int vl_ikm_train ( VlIKMFilt f,
vl_uint8 const *  data,
int  N 
)
Parameters:
fIKM quantizer.
datadata.
Nnumber of data (N >= 1).
Returns:
-1 if an overflow may have occurred.