[CS5670] Lecture 4: Local features & Harris corner detection


    | Today's topic is feature extraction: corners and blobs.

     

     

    | Key points

    • feature extraction
    • local features
    • Harris corner detection

     

    Motivation: Automatic panoramas

     

    The human visual system has a field of view of around 135 x 200 degrees, but a typical camera has a field of view of only 35 x 50 degrees. Panoramic image mosaicing works by taking lots of pictures from an ordinary camera, and stitching them together to form a composite image with a much larger field of view.

     

    reference: AutoStitch

     

     

    Process

     

    25 of 57 images aligned
    All 57 images aligned
    Final Result

     

    Why extract features?

    • Motivation: panorama stitching
      • We have two images; how do we combine them?

     

     

    • Step 1: extract features
    • Step 2: match features
    • Step 3: align images

     

     

     

    Application: Visual SLAM

     

    Simultaneous Localization and Mapping

     

     

    Image matching

     

     

    Harder case

     

     

    This pair is harder, likely because the two photos are taken from very different viewpoints.

     

    Harder still?

     

     

     

     

    Answer below (look for tiny colored squares…)

     

     

    NASA Mars Rover images with SIFT feature matches.

     

    Feature matching for object search

     

     

    Feature matching

     

     

    Invariant local features

     

    • Find features that are invariant to transformations
      • geometric invariance: translation, rotation, scale
      • photometric invariance: brightness, exposure, …

     

    Feature Descriptors

     

     

    Advantages of local features

     

    Locality

    • features are local, so robust to occlusion and clutter

    Quantity

    • hundreds or thousands in a single image

    Distinctiveness

    • can differentiate a large database of objects

    Efficiency

    • real time performance achievable

     

    More motivation…

     

     

    Feature points are used for:

    • Image alignment (e.g., mosaics)
    • 3D reconstruction
    • Motion tracking (e.g. for AR)
    • Object recognition
    • Image retrieval
    • Robot/car navigation
    • … other applications

     

    Approach

     

    1. Feature detection: find it
    2. Feature descriptor: represent it
    3. Feature matching: match it


    Feature tracking: track it across frames, when there is motion

     

    Local features: main components

     

     

    1) Detection: Identify the interest points

     

     

    2) Description: Extract a feature descriptor vector from the region surrounding each interest point

     

     

    3) Matching: Determine correspondence between descriptors in two views

     

     

    What makes a good feature?

     

    Want uniqueness

     

    Look for image regions that are unusual
    – they lead to unambiguous matches in other images


    How to define “unusual”?

     

    Local measures of uniqueness

     

    • Suppose we only consider a small window of pixels
      – What defines whether a feature is a good or bad candidate?

     

     

    • How does the window change when you shift it?
    • Shifting the window in any direction causes a big change

     

     

    Harris corner detection: the math

    Consider shifting the window $W$ by ($u,v$)

     

    • how do the pixels in $W$ change?
    • compare each pixel before and after by summing up the squared differences (SSD)
    • this defines an SSD “error” $E(u,v)$

     

     

    $$E(u,v) = \Sigma_{(x,y) \in W} [I(x+u, y+v) - I(x,y)]^2$$

     

    • For a good feature, we want this error to be large for shifts in every direction
    • Slow to compute exactly for each pixel and each offset $(u,v)$
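
    A minimal NumPy sketch of this brute-force computation (the toy image, window size, and shifts below are illustrative assumptions, not from the lecture):

    ```python
    import numpy as np

    def ssd_error(I, x0, y0, u, v, half=2):
        """Brute-force SSD error E(u, v) for the window W centered at (x0, y0).

        Compares each pixel in W against the pixel at the shifted location
        (x + u, y + v) and sums the squared differences, as in E(u, v) above.
        """
        E = 0.0
        for y in range(y0 - half, y0 + half + 1):
            for x in range(x0 - half, x0 + half + 1):
                d = I[y + v, x + u] - I[y, x]
                E += d * d
        return E

    # Tiny illustrative image: a bright square on a dark background.
    I = np.zeros((20, 20), dtype=np.float64)
    I[8:14, 8:14] = 1.0

    # At the square's corner the window content changes under any shift,
    # so E(u, v) is nonzero; in a flat region it would be zero.
    print(ssd_error(I, 8, 8, u=2, v=0), ssd_error(I, 8, 8, u=0, v=2))
    ```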

     

    Small motion assumption

     

    Taylor series expansion of $I$:

     

    $I(x+u, y+v) = I(x,y) + \frac{\partial I}{\partial x} u + \frac{\partial I}{\partial y} v$ + higher order terms

     

    If the motion $(u,v)$ is small, then a first-order approximation is good

     

    $$I(x+u, y+v) \approx I(x,y) +\frac{\partial I}{\partial x} u + \frac{\partial I}{\partial y} v$$

     

    $$ \approx I(x,y) + [I_x, I_y] \begin{bmatrix}u \\  v \end{bmatrix} $$

     

    Shorthand: $I_x = \frac{\partial I}{\partial x}$
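
    As a quick sanity check of the first-order approximation, we can compare the exact and approximated values on a smooth synthetic function standing in for an image (the function and the shift values are made up for illustration):

    ```python
    import numpy as np

    # I(x + u, y + v) ≈ I(x, y) + Ix*u + Iy*v should hold for small (u, v).
    I = lambda x, y: np.sin(0.3 * x) * np.cos(0.2 * y)    # smooth test "image"
    x, y, u, v = 2.0, 3.0, 0.3, -0.2
    Ix = 0.3 * np.cos(0.3 * x) * np.cos(0.2 * y)          # analytic dI/dx at (x, y)
    Iy = -0.2 * np.sin(0.3 * x) * np.sin(0.2 * y)         # analytic dI/dy at (x, y)
    print(I(x + u, y + v), I(x, y) + Ix * u + Iy * v)     # ~0.539 vs ~0.540
    ```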

     

    Plugging this into the formula on the previous slide…

     

    Corner detection: the math

     

    Consider shifting the window $W$ by $(u,v)$

    • defines an SSD “error” $E(u,v)$

     

    $$E(u,v) = \Sigma_{(x,y) \in W} [I(x+u, y+v) - I(x,y)]^2$$

     

    $$ \approx \Sigma_{(x,y) \in W} [I(x,y) + I_x u + I_y v - I(x,y)]^2$$

     

    $$\approx \Sigma_{(x,y) \in W} [I_xu + I_yv]^2$$

     

    $$\approx Au^2 + 2Buv + Cv^2$$

    where $A = \Sigma_{(x,y) \in W} I_x^2$, $B = \Sigma_{(x,y) \in W} I_x I_y$, and $C = \Sigma_{(x,y) \in W} I_y^2$.

     

     

    Thus, $E(u,v)$ is locally approximated as a quadratic error function.

     

    The second moment matrix

     

    The surface $E(u,v)$ is locally approximated by a quadratic form.

     

    $$E(u,v) \approx Au^2 + 2Buv + Cv^2$$

     

    $$\approx \begin{bmatrix} u & v \end{bmatrix} \begin{bmatrix} A & B \\ B & C \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} $$

     

    $$H = \begin{bmatrix} A & B \\ B & C \end{bmatrix}$$
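
    A minimal sketch of how $H$ (equivalently $A$, $B$, $C$) could be assembled for one window, assuming simple central-difference gradients from np.gradient; a real detector would typically use Sobel derivatives plus the Gaussian weighting discussed later:

    ```python
    import numpy as np

    def second_moment_matrix(I, x0, y0, half=2):
        """2x2 second moment matrix H for the window W centered at (x0, y0).

        A = sum Ix^2, B = sum Ix*Iy, C = sum Iy^2 over the window, giving
        H = [[A, B], [B, C]] as in the quadratic approximation of E(u, v).
        """
        Iy, Ix = np.gradient(I.astype(np.float64))   # axis 0 = y, axis 1 = x
        w = (slice(y0 - half, y0 + half + 1), slice(x0 - half, x0 + half + 1))
        A = np.sum(Ix[w] ** 2)
        B = np.sum(Ix[w] * Iy[w])
        C = np.sum(Iy[w] ** 2)
        return np.array([[A, B], [B, C]])
    ```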

     

     

     

     

     

     

     

    General case

     

    We can visualize $H$ as an ellipse with axis lengths determined by the eigenvalues of $H$ and orientation
    determined by the eigenvectors of $H$

     

    Ellipse equation:

    $\begin{bmatrix} u & v \end{bmatrix} H \begin{bmatrix} u \\ v \end{bmatrix} = const$

     

     

    Quick eigenvalue/eigenvector review

     

    The eigenvectors of a matrix $A$ are the vectors $x$ that satisfy:

     

    $$Ax = \lambda x$$

     

    The scalar $\lambda$ is the eigenvalue corresponding to x

    – The eigenvalues are found by solving:

     

    $$det(A-\lambda I) = 0$$

     

    – In our case, $A = H$ is a 2x2 matrix, so we have

    $$det \begin{bmatrix} h_{11}-\lambda & h_{12} \\ h_{21} & h_{22}-\lambda \end{bmatrix} = 0$$

    – The solution:

     

    $$\lambda_{\pm} = \frac{1}{2} [(h_{11}+h_{22}) \pm \sqrt{4h_{12} h_{21} + (h_{11} - h_{22})^2}]$$

     

    Once you know $\lambda$, you find $x$ by solving

     

    $$\begin{bmatrix} h_{11}-\lambda & h_{12} \\ h_{21} & h_{22}-\lambda \end{bmatrix} \begin{bmatrix} x\\ y \end{bmatrix} = 0$$
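
    As a quick check, the closed-form eigenvalues above can be compared against NumPy's eigensolver (the example matrix is arbitrary):

    ```python
    import numpy as np

    def eig2x2(H):
        """Closed-form eigenvalues of a 2x2 matrix, using the formula above."""
        (h11, h12), (h21, h22) = H
        root = np.sqrt(4 * h12 * h21 + (h11 - h22) ** 2)
        return 0.5 * ((h11 + h22) + root), 0.5 * ((h11 + h22) - root)

    H = np.array([[5.0, 2.0], [2.0, 3.0]])
    print(eig2x2(H))               # (lambda_max, lambda_min)
    print(np.linalg.eigvalsh(H))   # same values, in ascending order
    ```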

     

     

    Corner detection: the math

     

     

    $$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} \begin{bmatrix} A & B \\ B & C \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}$$

     

    • $Hx_{max} = \lambda_{max}x_{max}$
    • $Hx_{min} = \lambda_{min}x_{min}$

     

    Eigenvalues and eigenvectors of $H$

    • Define shift directions with the smallest and largest change in error
    • $x_{max}$ = direction of largest increase in $E$
    • $\lambda_{max}$ = amount of increase in direction $x_{max}$
    • $x_{min}$ = direction of smallest increase in $E$
    • $\lambda_{min}$ = amount of increase in direction $x_{min}$

     

    How are $\lambda_{max}$, $x_{max}$, $\lambda_{min}$, and $x_{min}$ relevant for feature detection?

    • What’s our feature scoring function?

    Want $E(u,v)$ to be large for small shifts in all directions.

    • the minimum of $E(u,v)$ should be large, over all unit vectors $[u v]$
    • this minimum is given by the smaller eigenvalue $\lambda_{min}$ of $H$
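
    A small numeric confirmation of this claim: sweeping over all unit shift vectors $[u\ v]$ and evaluating the quadratic form recovers $\lambda_{min}$ as the minimum (the matrix below is arbitrary):

    ```python
    import numpy as np

    H = np.array([[5.0, 2.0], [2.0, 3.0]])
    theta = np.linspace(0.0, 2.0 * np.pi, 3600)
    shifts = np.stack([np.cos(theta), np.sin(theta)])      # all unit vectors [u, v]
    E = np.einsum('it,ij,jt->t', shifts, H, shifts)        # E(u, v) for each unit shift
    print(E.min(), np.linalg.eigvalsh(H)[0])               # both equal lambda_min
    ```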

     

     

    Interpreting the eigenvalues

     

    Classification of image points using eigenvalues of $M$

     

     

     

    Corner detection summary

     

     

    Here’s what you do:

    • Compute the gradient at each point in the image
    • For each pixel:
      • Create the $H$ matrix from nearby gradient values
      • Compute the eigenvalues.
      • Find points with large response ($\lambda_{min}$ > threshold)
    • Choose those points where $\lambda_{min}$ is a local maximum as features
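
    A compact sketch of this recipe in NumPy/SciPy; the window size, the relative threshold, and the use of scipy.ndimage for the window sums and the local-maximum test are illustrative choices, not prescribed by the lecture:

    ```python
    import numpy as np
    from scipy.ndimage import maximum_filter, uniform_filter

    def harris_lambda_min(I, win=5, threshold=0.01):
        """Corner detection with the minimum-eigenvalue score, per the summary above."""
        I = I.astype(np.float64)
        Iy, Ix = np.gradient(I)                      # image gradients

        # Entries of H, averaged over a win x win window at every pixel.
        A = uniform_filter(Ix * Ix, size=win)
        B = uniform_filter(Ix * Iy, size=win)
        C = uniform_filter(Iy * Iy, size=win)

        # Smaller eigenvalue of [[A, B], [B, C]] at every pixel (closed form).
        lam_min = 0.5 * ((A + C) - np.sqrt((A - C) ** 2 + 4.0 * B ** 2))

        # Keep points that pass the threshold and are local maxima of the response.
        strong = lam_min > threshold * lam_min.max()
        local_max = lam_min == maximum_filter(lam_min, size=win)
        return np.argwhere(strong & local_max)       # (row, col) feature locations
    ```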

     


     

     

    The Harris operator

     

    $\lambda_{min}$ is a variant of the "Harris operator" for feature detection

     

    $$f = \frac{\lambda_1 \lambda_2}{\lambda_1 + \lambda_2}$$

     

    $$=\frac{determinant(H)}{trace(H)}$$

     

    • The trace is the sum of the diagonals, i.e., $trace(H) = h_{11}+ h_{22}$
    • Very similar to $\lambda_{min}$ but less expensive (no square root)
    • Called the Harris Corner Detector or Harris Operator
    • Lots of other detectors, this is one of the most popular
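
    With the per-pixel $A$, $B$, $C$ maps from the earlier sketch, this score is a one-liner; the small epsilon guarding against division by zero in flat regions is an implementation detail I am assuming:

    ```python
    def harris_f(A, B, C, eps=1e-12):
        """Harris operator f = det(H) / trace(H) from the per-pixel H entries."""
        return (A * C - B * B) / (A + C + eps)
    ```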

     

    Alternate Harris operator

     

    For Project 2, you will use an alternate definition of the Harris operator:

     

    $$R = \lambda_1  \lambda_2 - k \cdot ( \lambda_1 +  \lambda_2)^2 = det(M) - k \cdot tr(M)^2$$
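
    OpenCV's cv2.cornerHarris computes this same $R = det(M) - k \cdot tr(M)^2$ response map; a minimal usage sketch with placeholder file name and parameter values:

    ```python
    import cv2
    import numpy as np

    img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
    R = cv2.cornerHarris(img, 2, 3, 0.04)        # blockSize=2, ksize=3, k=0.04
    corners = np.argwhere(R > 0.01 * R.max())    # simple thresholding of the response
    ```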

     

     

    The Harris operator

     

     

    Harris detector example

     

     

    f value (red high, blue low)

     

     

    Threshold (f > value)

     

     

    Find local maxima of f (non-max suppression)

     

     

    Harris features (in red)

     

     

    Weighting the derivatives

     

    In practice, using a simple window $W$ doesn’t work too well

     

     

    Instead, we’ll weight each derivative value based on its distance from the center pixel
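
    A sketch of this Gaussian weighting with scipy.ndimage, where $\sigma_D$ sets the derivative scale and $\sigma_I$ sets the integration (window) scale; the specific sigma values are placeholders:

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def weighted_H_entries(I, sigma_d=1.0, sigma_i=2.0):
        """Gaussian-weighted entries of the second moment matrix.

        Derivatives are taken at scale sigma_d, and the products Ix^2, Ix*Iy,
        Iy^2 are accumulated with a Gaussian window g(sigma_i) instead of a
        flat window, so pixels near the center count more than distant ones.
        """
        I = I.astype(np.float64)
        Ix = gaussian_filter(I, sigma_d, order=(0, 1))   # derivative along x
        Iy = gaussian_filter(I, sigma_d, order=(1, 0))   # derivative along y
        A = gaussian_filter(Ix * Ix, sigma_i)
        B = gaussian_filter(Ix * Iy, sigma_i)
        C = gaussian_filter(Iy * Iy, sigma_i)
        return A, B, C
    ```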

     

     

    Harris Detector

    Second moment matrix

     

    1. Image derivatives

     

     

    $$\mu(\sigma_I,\sigma_D) = g(\sigma_I) \ast  \begin{bmatrix} I^2_x(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I^2_y(\sigma_D) \end{bmatrix}$$

     

     

    2. Square of derivatives

     

     

    $$detM = \lambda_1 \lambda_2$$

     

    $$trace M = \lambda_1 + \lambda_2$$

     

     

    3. Gaussian filter $g(\sigma_I)$

     

     

    4. Cornerness function – both eigenvalues are strong

    $$har = det[\mu(\sigma_I,\sigma_D)] - k \cdot [trace(\mu(\sigma_I,\sigma_D))]^2$$

     

     

    5. Non-maxima suppression

     

    Harris Corners – Why so complicated?

     

     

    Can’t we just check for regions with lots of gradients in the $x$ and $y$ directions?
    – No! A diagonal edge would satisfy that criterion: it has large $\Sigma I_x^2$ and $\Sigma I_y^2$, but all of its gradients point in the same direction, so one eigenvalue of $H$ is still close to zero.
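
    A small numeric illustration (the synthetic diagonal edge is an assumption for demonstration): the gradient energy is large in both $x$ and $y$, yet one eigenvalue of $H$ is nearly zero, so the $\lambda_{min}$ and Harris scores correctly reject it.

    ```python
    import numpy as np

    n = 21
    yy, xx = np.mgrid[0:n, 0:n]
    diag_edge = (xx + yy > n).astype(np.float64)      # intensity step across a diagonal

    Iy, Ix = np.gradient(diag_edge)
    A, B, C = np.sum(Ix * Ix), np.sum(Ix * Iy), np.sum(Iy * Iy)
    print("sum Ix^2 =", A, "  sum Iy^2 =", C)         # both clearly nonzero
    print(np.linalg.eigvalsh([[A, B], [B, C]]))       # one eigenvalue ~0: edge, not corner
    ```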

     

     

     

    Reference: Lecture 4: Local features & Harris corner detection
