Vision Lab Features Library (VLFeat)

Author:: Karel Lenc; Andrea Vedaldi; Michal Perdoch

scalespace.h implements a Gaussian scale space, a data structure representing an image at multiple resolutions [27] [10] [11] . Scale spaces are used, for example, to detect scale invariant features [12] such as SIFT, Hessian-Affine, Harris-Affine, Harris-Laplace, etc.

Overview

A scale space is representation of an image at multiple resolution levels. An image is a function \(\ell(x,y)\) of two coordinates \(x\), \(y\); the scale space \(\ell(x,y,\sigma)\) adds a third coordinate \(\sigma\) indexing the scale. scalespace.h implements in particular the Gaussian scale space, where the image \(\ell(x,y,\sigma)\) is obtained by smoothing \(\ell(x,y)\) by a Gaussian kernel of isotropic standard deviation \(\sigma\).

In practice, the spatial coordiantes \((x,y)\) and the scale coordinate \(\sigma\) are sampled. The density of the spatial sampling is adjusted as a function of the scale: intuitively, coarser resolution images can be sampled more coarsely. Thus, the scale space has the structure of a pyramid: a collection of images of progressively coarser resolution and smaller size (in pixels).

The pyramid is organised in a number of octaves, indexed by a parameter o, and further sublevels for each octave, indexed by a parameter s. These are related to the scale \(\sigma\) by

\[ \sigma(s,o) = \sigma_o 2^{\displaystyle o + \frac{s}{\mathtt{octaveResolution}}} \]

where octaveResolution is the resolution of the octave subsampling \(\sigma_0\) is the base smoothing.

Given an input image image, the following example uses the VlScaleSpace object to compute the image level, which is the version of image at scales (o,s):

float* level ;
VlScaleSpace ss = vl_scalespace_new(imageWidth, imageHeight) ;
vl_scalespace_put_image(ss, image) ;
level = vl_scalespace_get_level(ss, o, s) ;

Note that level has not necessarily the same dimensions or sampling density of image. The geometry of level can be obtained as

VlScaleSpaceOctaveGeometry ogeom = vl_scalespace_get_octave_geometry(ss, o) ;
ogeom.width \\ width of level (in number of pixels)
ogeom.height \\ height of level (in number of pixels)
ogeom.step \\ spatial sampling step

This means that \(\ell(x,y,\sigma) = \mathtt{level}[i,j]\), where

\[ x = \mathtt{ogeom.step} \times i, \quad 0 \leq i < \mathtt{ogeom.width}, \]

and similarly for \(y\). The other impotant information in order to use scale space is to know the range of the paramerters o and s. This can be obtained as

VlScaleSpaceGeometry geom = vl_scalespace_get_geomerty(ss) ;

So for example o varies in the range from geom.firstOctave to geom.lastOctave and s in the range geom.octaveFirstSubdivision to geom.octaveLastSubdivision.

Finer control

VlScaleSpace offers more control on how the pyramid is constructed. Two usage cases are the following:

In feature detection it is often convenient to sample a few redudant sublevels per octave, effectively representing the same scale with two different spatial samplings. This simplifies detecting extrema in scales at boundary scale levels.

Often there is a convenience in oversampling (at more than the Nyquist frequency) the finer scales. While oversampling does not add information to the image, it may simplify significantly the implementation of filers such as derivatives.

All this is possible by using a more advanced interface to VlScaleSpace. For example, in oerder to start sampling at octave -1 and have two redundant subdivisions per octave, one can use:

VlScaleSpaceGeometry geom = vl_scalespace_get_default_geometry(imageWidth, imageHeight) ;
geom.firstOctave = -1 ;
geom.octaveFirstSubdivision = -1 ;
geom.octaveLastSubdivision = geom.octaveResolution ;
VlScaleSpacae ss = vl_scalespace_new_with_geometry (geom) ;

Algorithm and limitations

Formally, the Gaussian scale space of an image \(\ell(x,y)\) is defined as

\[ \ell(x,y,\sigma) = [g_{\sigma} * \ell](x,y,\sigma) \]

where \(g_\sigma\) denotes a 2D Gaussian kernel of isotropic standard deviation \(\sigma\):

\[ g_{\sigma}(x,y) = \frac{1}{2\pi\sigma^2} \exp\left( - \frac{x^2 + y^2}{2\sigma^2} \right). \]

An important detail is that the algorithm is not fed with the dieal infinite resolution image \(\ell(x,y)\), but with a pre-smoothed and sampled version measured by the CCD. This is modelled by assuming that the input is in fact the sampled image \(\ell(x,y,\sigma_n)\), where \(\sigma_n\) is a nominal smoothing, usually taken to be 0.5 (half a pixel standard deviation). This also means that \(\sigma = \sigma_n = 0.5\) is the finest scale that can actually be computed.

Given \(\ell(x,y,\sigma_n)\), any of a vast number digitial filtering techniques can be used to compute the succesive scales. Presently, VlScaleSpace uses a basic FIR implementation of the Gaussian filters.

The FIR implementation is obtained by sampling the Gaussian function and re-normalizing it to have unit norm. As a rule of thumb, such a filter works sufficiently well for, say, feature detection if the standard deviation \(\sigma\) is at least 1.6 times the spatial sampling step. The same filter works better still the input image is oversampled.

The limitations on the FIR filters have impliciations on the pyramid itself, as the latter is constructed by incremental smoothing: each successive level is obtained from the previous one by adding the needed amount of smoothing. In this manner, the size of the FIR filters remains small; however, if the scale sublevels are too packed, the incremental smoothing is so small that the FIR implementation may break.

VLFeat.org

Table of Contents

Overview

Finer control

Algorithm and limitations