VLFeat includes a basic implementation of k-means clustering and hierarchical k-means clustering. They are designed to be lightweight in order to work on large datasets. In particular, they assume that the data are vectors of unsigned chars (one byte). While this is limiting for some application, it works well for clustering image descriptors, where very high precision is usually unnecessary. For more details, see the Integer k-means API reference.
Usage
Integer k-means (IKM) is run by the command
vl_ikmeans. In order to demonstrate the usage of this
command, we sample 1000 random points in the
[0,255]^2
integer square and use
vl_ikmeans to get k=3
clusters:
K = 3 ;
data = uint8(rand(2,1000) * 255) ;
[C,A] = vl_ikmeans(data,K) ;
The program returns both the cluster centers C and the
data-to-cluster assignments
A. By means of the cluster
centers
C we can project more data on the same clusters
datat = uint8(rand(2,10000) * 255) ;
AT = vl_ikmeanspush(datat,C) ;
In order to visualize the results, we associate to each cluster a color and we plot the points:
cl = get(gca,'ColorOrder') ;
ncl = size(cl,1) ;
for k=1:K
sel = find(A == k) ;
selt = find(AT == k) ;
plot(data(1,sel), data(2,sel), '.',...
'Color',cl(mod(k,ncl)+1,:)) ;
plot(datat(1,selt),datat(2,selt),'+',...
'Color',cl(mod(k,ncl)+1,:)) ;
end
Elkan
VLFeat supports two different implementations of k-means. While
they produce identical output, the Elkan method requires fewer
distance computations. The method parameters controls
which method is used. Consider the case when
K=100 and our
data is now 128 dimensional (e.g. SIFT descriptors):
K=100;
data = uint8(rand(128,10000) * 255);
tic;
[C,A] = vl_ikmeans(data,K,'method', 'lloyd') ; % default
t_lloyd = toc
tic;
[C,A] = vl_ikmeans(data,K,'method', 'elkan') ;
t_elkan = toc
t_lloyd =
10.2884
t_elkan =
5.1405