Table of Contents
scalespace.h implements a Gaussian scale space, a data structure representing an image at multiple resolutions [27] [10] [11] . Scale spaces are used, for example, to detect scale invariant features [12] such as SIFT, Hessian-Affine, Harris-Affine, Harris-Laplace, etc.
Overview
A scale space is representation of an image at multiple resolution levels. An image is a function \(\ell(x,y)\) of two coordinates \(x\), \(y\); the scale space \(\ell(x,y,\sigma)\) adds a third coordinate \(\sigma\) indexing the scale. scalespace.h implements in particular the Gaussian scale space, where the image \(\ell(x,y,\sigma)\) is obtained by smoothing \(\ell(x,y)\) by a Gaussian kernel of isotropic standard deviation \(\sigma\).
In practice, the spatial coordiantes \((x,y)\) and the scale coordinate \(\sigma\) are sampled. The density of the spatial sampling is adjusted as a function of the scale: intuitively, coarser resolution images can be sampled more coarsely. Thus, the scale space has the structure of a pyramid: a collection of images of progressively coarser resolution and smaller size (in pixels).
The pyramid is organised in a number of octaves, indexed by a parameter o
, and further sublevels for each octave, indexed by a parameter s
. These are related to the scale \(\sigma\) by
\[ \sigma(s,o) = \sigma_o 2^{\displaystyle o + \frac{s}{\mathtt{octaveResolution}}} \]
where octaveResolution
is the resolution of the octave subsampling \(\sigma_0\) is the base smoothing.
Given an input image image
, the following example uses the VlScaleSpace object to compute the image level
, which is the version of image
at scales (o,s)
:
Note that level
has not necessarily the same dimensions or sampling density of image
. The geometry of level
can be obtained as
This means that \(\ell(x,y,\sigma) = \mathtt{level}[i,j]\), where
\[ x = \mathtt{ogeom.step} \times i, \quad 0 \leq i < \mathtt{ogeom.width}, \]
and similarly for \(y\). The other impotant information in order to use scale space is to know the range of the paramerters o
and s
. This can be obtained as
So for example o
varies in the range from geom.firstOctave
to geom.lastOctave
and s
in the range geom.octaveFirstSubdivision
to geom.octaveLastSubdivision
.
Finer control
VlScaleSpace offers more control on how the pyramid is constructed. Two usage cases are the following:
- In feature detection it is often convenient to sample a few redudant sublevels per octave, effectively representing the same scale with two different spatial samplings. This simplifies detecting extrema in scales at boundary scale levels.
- Often there is a convenience in oversampling (at more than the Nyquist frequency) the finer scales. While oversampling does not add information to the image, it may simplify significantly the implementation of filers such as derivatives.
All this is possible by using a more advanced interface to VlScaleSpace. For example, in oerder to start sampling at octave -1 and have two redundant subdivisions per octave, one can use:
Algorithm and limitations
Formally, the Gaussian scale space of an image \(\ell(x,y)\) is defined as
\[ \ell(x,y,\sigma) = [g_{\sigma} * \ell](x,y,\sigma) \]
where \(g_\sigma\) denotes a 2D Gaussian kernel of isotropic standard deviation \(\sigma\):
\[ g_{\sigma}(x,y) = \frac{1}{2\pi\sigma^2} \exp\left( - \frac{x^2 + y^2}{2\sigma^2} \right). \]
An important detail is that the algorithm is not fed with the dieal infinite resolution image \(\ell(x,y)\), but with a pre-smoothed and sampled version measured by the CCD. This is modelled by assuming that the input is in fact the sampled image \(\ell(x,y,\sigma_n)\), where \(\sigma_n\) is a nominal smoothing, usually taken to be 0.5 (half a pixel standard deviation). This also means that \(\sigma = \sigma_n = 0.5\) is the finest scale that can actually be computed.
Given \(\ell(x,y,\sigma_n)\), any of a vast number digitial filtering techniques can be used to compute the succesive scales. Presently, VlScaleSpace uses a basic FIR implementation of the Gaussian filters.
The FIR implementation is obtained by sampling the Gaussian function and re-normalizing it to have unit norm. As a rule of thumb, such a filter works sufficiently well for, say, feature detection if the standard deviation \(\sigma\) is at least 1.6 times the spatial sampling step. The same filter works better still the input image is oversampled.
The limitations on the FIR filters have impliciations on the pyramid itself, as the latter is constructed by incremental smoothing: each successive level is obtained from the previous one by adding the needed amount of smoothing. In this manner, the size of the FIR filters remains small; however, if the scale sublevels are too packed, the incremental smoothing is so small that the FIR implementation may break.