hog.h implements the Histogram of Oriented Gradients (HOG) features in the variants of Dalal-Triggs dalal05histograms and of UOCTTI felzenszwalb09object . Applications include object detection and deformable object detection.
Overview
HOG is a standard image feature used, among other things, in object detection and deformable object detection. It decomposes the image into square cells of a given size (typically eight pixels), computes a histogram of oriented gradients in each cell (similar to Scale Invariant Feature Transform (SIFT)), and then renormalizes the cells by looking into adjacent blocks.
VLFeat implements two HOG variants: the original one of Dalal-Triggs dalal05histograms and the one proposed in Felzenszwalb et al. felzenszwalb09object .
In order to use HOG, start by creating a new HOG object, set the desired parameters, pass a (color or grayscale) image, and read off the results.
The HOG output is an array of features whose width and height are returned by vl_hog_get_width and vl_hog_get_height, with each feature (histogram) having the dimension returned by vl_hog_get_dimension. The array is stored in row major order, with the slowest varying dimension being the one indexing the histogram elements.
The number of entries in the histogram, as well as their meaning, depends on the HOG variant and is detailed later. However, it is usually unnecessary to know such details. hog.h provides support for creating an iconic representation of a HOG feature array.
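The storage order described above can be made concrete with a small indexing helper. This is a minimal sketch, assuming the column index varies fastest, then the row index, then the histogram element; the names hog_index and hog_read_cell are hypothetical and not part of hog.h.

```c
#include <stddef.h>

/* Offset of histogram element k of the cell at column x, row y,
   assuming x varies fastest, then y, and the histogram index k
   varies slowest (one hogWidth x hogHeight plane per element). */
static size_t hog_index(size_t x, size_t y, size_t k,
                        size_t hogWidth, size_t hogHeight)
{
  return x + hogWidth * (y + hogHeight * k);
}

/* Copy the histogram of one cell into the contiguous buffer `out`,
   which must hold hogDimension floats. */
static void hog_read_cell(const float *hogArray,
                          size_t hogWidth, size_t hogHeight,
                          size_t hogDimension,
                          size_t x, size_t y, float *out)
{
  size_t k;
  for (k = 0; k < hogDimension; ++k) {
    out[k] = hogArray[hog_index(x, y, k, hogWidth, hogHeight)];
  }
}
```

Because the histogram index is the slowest varying one, the elements of a single cell are not contiguous in memory, which is why a gather loop of this kind is needed to read one descriptor.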
It is often convenient to mirror HOG features from left to right. This can be obtained by mirroring an array of HOG cells, but the content of each cell must also be rearranged. This can be done by using the permutation obtained from vl_hog_get_permutation.
Furthermore, hog.h supports computing HOG features not from images but from vector fields ::vl_
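The two steps of mirroring (reversing the cell columns and rearranging each cell) might look as follows. This is only a sketch: hog_flip is a hypothetical helper, the array layout is assumed to have the histogram index as the slowest varying dimension, and the convention that element k of a mirrored cell equals element perm[k] of the original cell is an assumption about how the permutation table is meant to be applied.

```c
#include <stddef.h>

/* Mirror a HOG array left to right.
   perm holds hogDimension indices; element k of a mirrored cell is
   assumed to equal element perm[k] of the original cell.
   hog and flipped must not overlap. */
static void hog_flip(const float *hog, float *flipped,
                     size_t hogWidth, size_t hogHeight,
                     size_t hogDimension, const size_t *perm)
{
  size_t x, y, k;
  size_t plane = hogWidth * hogHeight;
  for (k = 0; k < hogDimension; ++k) {
    for (y = 0; y < hogHeight; ++y) {
      for (x = 0; x < hogWidth; ++x) {
        /* column x of the output reads column (hogWidth - 1 - x) */
        flipped[x + hogWidth * y + plane * k] =
          hog[(hogWidth - 1 - x) + hogWidth * y + plane * perm[k]];
      }
    }
  }
}
```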
Technical details
HOG divides the input image into square cells of size cellSize, fitting as many cells as possible and filling the image domain from the upper-left corner down to the lower-right one. For each row and column, the last cell is at least half contained in the image. More precisely, the number of cells along each dimension is the image size divided by cellSize, rounded to the nearest integer.
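The cell count implied by the half-cell rule above can be written with integer arithmetic. This is a sketch inferred from that rule; hog_num_cells is a hypothetical helper name, not a function of hog.h.

```c
#include <stddef.h>

/* Number of cells along one image dimension: a trailing cell is
   kept only if at least half of it lies inside the image, which
   amounts to rounding imageSize / cellSize to the nearest integer. */
static size_t hog_num_cells(size_t imageSize, size_t cellSize)
{
  return (imageSize + cellSize / 2) / cellSize;
}
```

For instance, with cellSize equal to 8, an image 20 pixels wide yields three cells (the third cell is exactly half inside the image), while an image 19 pixels wide yields two.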
Then the image gradient \( \nabla \ell(x,y) \) is computed by using central differences (for colour images, the channel with the largest gradient at that pixel is used). The gradient \( \nabla \ell(x,y) \) is assigned to one of 2*numOrientations orientations in the range \( [0,2\pi) \) (see Conventions for details). Contributions are then accumulated by using bilinear interpolation to the four neighbouring cells, as in Scale Invariant Feature Transform (SIFT). This results in a histogram \(h_d\) of dimension 2*numOrientations, called the histogram of directed orientations since it accounts for the direction as well as the orientation of the gradient. A second histogram \(h_u\) of undirected orientations, of half the size, is obtained by folding \( h_d \) in two.
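Folding the directed histogram into the undirected one can be sketched as below. The assumption is that folding means adding each bin to the bin of the opposite direction, which lies numOrientations positions further along; hog_fold is a hypothetical helper name.

```c
#include <stddef.h>

/* Fold a directed histogram hd (2 * numOrientations bins) into an
   undirected histogram hu (numOrientations bins) by summing each
   bin with the bin of the opposite direction. */
static void hog_fold(const float *hd, float *hu, size_t numOrientations)
{
  size_t k;
  for (k = 0; k < numOrientations; ++k) {
    hu[k] = hd[k] + hd[k + numOrientations];
  }
}
```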
Let a block of cells be a \( 2\times 2 \) sub-array of cells. Let the norm of a block be the \( l^2 \) norm of the stacking of the respective undirected histograms. Given a HOG cell, four normalisation factors are then obtained as the inverse of the norms of the four blocks that contain the cell.
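The four block norms of a cell can be sketched as follows. The code works with squared norms, so that the norm of a block is the square root of the sum of the squared norms of its four cells; each normalisation factor is then the inverse square root of the corresponding value (for example 1.0f / sqrtf(blockEnergy[b]), possibly with a small constant added to avoid division by zero). The helper names, the ordering of the four blocks, and the clamping of blocks at the array border are all assumptions; the text does not specify how border cells are handled.

```c
#include <stddef.h>

/* Squared l2 norm of one undirected histogram. */
static float hog_cell_energy(const float *hu, size_t numOrientations)
{
  size_t k;
  float sum = 0.0f;
  for (k = 0; k < numOrientations; ++k) sum += hu[k] * hu[k];
  return sum;
}

/* Clamp a possibly out-of-range index into [0, n - 1]. */
static size_t hog_clamp(long i, size_t n)
{
  if (i < 0) return 0;
  if ((size_t)i >= n) return n - 1;
  return (size_t)i;
}

/* Squared norms of the four 2x2 blocks containing cell (x, y).
   energy[x + hogWidth * y] is the squared norm of each cell.
   Block b has its top-left cell at (x - 1 + b % 2, y - 1 + b / 2);
   cells falling outside the array are replaced by the nearest
   valid cell (an assumption). */
static void hog_block_energies(const float *energy,
                               size_t hogWidth, size_t hogHeight,
                               size_t x, size_t y, float blockEnergy[4])
{
  int b, dx, dy;
  for (b = 0; b < 4; ++b) {
    long bx = (long)x - 1 + (b % 2);
    long by = (long)y - 1 + (b / 2);
    float sum = 0.0f;
    for (dy = 0; dy < 2; ++dy) {
      for (dx = 0; dx < 2; ++dx) {
        sum += energy[hog_clamp(bx + dx, hogWidth) +
                      hogWidth * hog_clamp(by + dy, hogHeight)];
      }
    }
    blockEnergy[b] = sum;
  }
}
```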
For the Dalal-Triggs variant, each undirected histogram \( h_u \) is copied four times, each copy is normalised using one of the four normalisation factors, and the four vectors are stacked, saturated at 0.2, and stored as the descriptor of the cell. This results in a 4*numOrientations dimensional cell descriptor. Blocks are visited from left to right and top to bottom when forming the final descriptor.
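The construction of a Dalal-Triggs cell descriptor can be sketched as below for a histogram of numOrientations bins, which gives the 4*numOrientations elements stated above. The helper name hog_dt_cell and the contiguous output layout (one group of bins per block) are assumptions made for illustration; the four factors are taken as already computed.

```c
#include <stddef.h>

/* Build one cell descriptor: four copies of the histogram, each
   scaled by one normalisation factor and saturated at 0.2.
   hist has numOrientations bins; out has 4 * numOrientations. */
static void hog_dt_cell(const float *hist, size_t numOrientations,
                        const float factors[4], float *out)
{
  size_t b, k;
  for (b = 0; b < 4; ++b) {
    for (k = 0; k < numOrientations; ++k) {
      float v = hist[k] * factors[b];
      out[b * numOrientations + k] = (v > 0.2f) ? 0.2f : v;
    }
  }
}
```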
For the UOCTTI descriptor, the same is done for both the undirected and the directed orientation histograms. This would yield 4*(2+1)*numOrientations elements, but the resulting vector is projected down to (2+1)*numOrientations elements by averaging corresponding histogram dimensions. This was shown to be an algebraic approximation of PCA for descriptors computed on natural images.
In addition, for the UOCTTI variant the \( l^1 \) norm of each of the four \( l^2 \) normalised undirected histograms is computed and stored as four additional dimensions, for a total of 4+3*numOrientations dimensions.
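The UOCTTI cell descriptor described above can be sketched as follows. The code follows only the textual description: it averages the four normalised and saturated copies of each bin and appends four \( l^1 \) norms. The helper names, the output ordering (directed bins, then undirected bins, then the four norms), and taking the \( l^1 \) norm after saturation are assumptions; an actual implementation may also apply different scaling constants.

```c
#include <stddef.h>

/* Saturate a normalised histogram value at 0.2. */
static float hog_saturate(float v)
{
  return (v > 0.2f) ? 0.2f : v;
}

/* hd: directed histogram, 2 * n bins; hu: undirected, n bins;
   out: 3 * n + 4 elements, where n = numOrientations. */
static void hog_uoctti_cell(const float *hd, const float *hu,
                            size_t n, const float factors[4],
                            float *out)
{
  size_t b, k;
  /* average of the four normalised copies of each directed bin */
  for (k = 0; k < 2 * n; ++k) {
    float sum = 0.0f;
    for (b = 0; b < 4; ++b) sum += hog_saturate(hd[k] * factors[b]);
    out[k] = 0.25f * sum;
  }
  /* same for each undirected bin */
  for (k = 0; k < n; ++k) {
    float sum = 0.0f;
    for (b = 0; b < 4; ++b) sum += hog_saturate(hu[k] * factors[b]);
    out[2 * n + k] = 0.25f * sum;
  }
  /* l1 norm of each normalised undirected histogram
     (the entries are non-negative, so no absolute value is needed) */
  for (b = 0; b < 4; ++b) {
    float sum = 0.0f;
    for (k = 0; k < n; ++k) sum += hog_saturate(hu[k] * factors[b]);
    out[3 * n + b] = sum;
  }
}
```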
Conventions
The orientation of a gradient is expressed as the angle it forms with the horizontal axis of the image. Angles are measured clockwise (as the vertical image axis points downwards), and the null angle corresponds to a horizontal vector pointing right. The quantized directed orientations are \( \mathrm{k} \pi / \mathrm{numOrientations} \), where k is an index that varies in the integer range \( \{0, \dots, 2\mathrm{numOrientations} - 1\} \).
Note that the orientations capture the orientation of the gradient; image edges would be oriented at 90 degrees from these.
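Assigning a gradient to the nearest quantized directed orientation can be sketched by comparing it against a table of unit vectors, one per orientation. The helper name and the use of a caller-supplied table are assumptions: dirX[k] and dirY[k] are expected to hold the cosine and sine of \( \mathrm{k} \pi / \mathrm{numOrientations} \), expressed in image coordinates (x to the right, y downwards), so that increasing angles turn clockwise as stated above.

```c
#include <stddef.h>

/* Index k of the directed orientation closest to the gradient
   (gx, gy), found as the table direction with the largest dot
   product. dirX and dirY hold 2 * numOrientations unit vectors. */
static size_t hog_quantize_orientation(float gx, float gy,
                                       const float *dirX,
                                       const float *dirY,
                                       size_t numOrientations)
{
  size_t k, best = 0;
  float bestScore = gx * dirX[0] + gy * dirY[0];
  for (k = 1; k < 2 * numOrientations; ++k) {
    float score = gx * dirX[k] + gy * dirY[k];
    if (score > bestScore) {
      bestScore = score;
      best = k;
    }
  }
  return best;
}
```

With numOrientations equal to 2, the table contains the four directions right, down, left and up, so a gradient pointing mostly downwards in the image is assigned index 1, that is, an angle of 90 degrees measured clockwise.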