[CS5670] Lecture 6: Feature Descriptors and Feature Matching
미남잉 · 2022. 6. 23. 15:57 · Source: CS5670
This part covers the 6th lecture of CS5670, on the topic of Feature Descriptors and Feature Matching. It corresponds to Section 4.1 of the textbook.
If finding local features involves three stages, 1) Detection, 2) Description, and 3) Matching, we are now at the second stage. Following the lecture slides, this part is about extracting a feature vector around each interesting point.
Feature descriptors
Lecture 5 ended on the following slide.
We know how to detect good points
Next question: How to match them?
Answer: Come up with a descriptor for each point, find similar descriptors between the two images
We come up with a descriptor for each point, then find similar descriptors between the two images.
You may be wondering what a descriptor is: a descriptor can be thought of as a set of characteristics that represents a particular object.
This concept is fleshed out below, where the SIFT (Scale Invariant Feature Transform) algorithm is explained.
Lots of possibilities
- Simple option: match square windows around the point
- State of the art approach: SIFT
Invariance vs. discriminability (important ✔)
- Invariance: Descriptor shouldn’t change even if image is transformed
- Discriminability: Descriptor should be highly unique for each point
A descriptor should not change when the image is transformed; this property is called invariance. It should also be highly distinctive for each point; this property is called discriminability.
Image transformation revisited
- Geometric: Rotation, Scale
- Photometric: Intensity change
Invariant descriptors
- We looked at invariant/equivariant detectors
- Most feature descriptors are also designed to be invariant to:
- Translation, 2D rotation, scale
- They can usually also handle
- Limited 3D rotations (SIFT works up to about 60 degrees)
- Limited affine transforms (some are fully affine invariant)
- Limited illumination/contrast changes
How to achieve invariance
Need both of the following:
- Make sure your detector is invariant
- Design an invariant feature descriptor
- Simplest descriptor: a single 0
- What’s this invariant to?
- Next simplest descriptor: a square, axis-aligned 5x5 window of pixels
- What’s this invariant to?
- Let’s look at some better approaches…
Rotation invariance for feature descriptors
- Find dominant orientation of the image patch
- E.g., given by $x_{max}$, the eigenvector of $H$ corresponding to $\lambda_{max}$ (the larger eigenvalue)
- Or (better) simply the orientation of the (smoothed) gradient
- Rotate the patch according to this angle
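The two steps above can be sketched with plain NumPy (a simplified sketch: the slide's smoothed gradient is replaced by averaging the raw gradient vectors over the patch, and `dominant_orientation` is a name made up here):

```python
import numpy as np

def dominant_orientation(patch):
    """Estimate the dominant gradient orientation of a square patch.

    Simplification: instead of a properly smoothed gradient, the raw
    gradient vectors are averaged over the whole patch; the angle of
    that mean gradient (in radians) is the patch orientation.
    """
    gy, gx = np.gradient(patch.astype(float))   # d/dy, d/dx
    return np.arctan2(gy.mean(), gx.mean())

# A patch whose intensity increases along x has orientation 0;
# its transpose increases along y, giving pi/2.
ramp = np.tile(np.arange(9, dtype=float), (9, 1))
theta = dominant_orientation(ramp)              # 0.0
```

One would then rotate the patch by `-theta` (e.g., with `scipy.ndimage.rotate`) before sampling the descriptor, so that all patches are expressed in a canonical orientation.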
Multiscale Oriented PatcheS descriptor
Take 40x40 square window around detected feature
- Scale to 1/5 size (using prefiltering)
- Rotate to horizontal
- Sample 8x8 square window centered at feature
- Intensity-normalize the window by subtracting the mean and dividing by the standard deviation in the window (why? this makes the descriptor invariant to affine intensity changes $I \to aI + b$)
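The steps above can be sketched in NumPy (a simplified sketch of the axis-aligned case that skips the rotation step; `mops_descriptor` is a hypothetical name, and the Gaussian prefilter is approximated by 5x5 block averaging):

```python
import numpy as np

def mops_descriptor(img, x, y):
    """Sketch of a MOPS-style descriptor (no rotation step).

    Steps: take a 40x40 window around (x, y), prefilter + downsample
    by 5 to get an 8x8 patch, then normalize to zero mean / unit
    variance so the descriptor ignores affine intensity changes.
    """
    win = img[y - 20:y + 20, x - 20:x + 20].astype(float)  # 40x40 window
    # Average each 5x5 block (stand-in for Gaussian prefiltering),
    # which simultaneously downsamples 40x40 -> 8x8.
    small = win.reshape(8, 5, 8, 5).mean(axis=(1, 3))
    # Intensity-normalize: subtract mean, divide by std.
    return (small - small.mean()) / (small.std() + 1e-8)

rng = np.random.default_rng(0)
img = rng.random((100, 100))
d = mops_descriptor(img, 50, 50)
```

The normalization answers the slide's "why?": brightening or rescaling the image ($I \to aI + b$) leaves the descriptor unchanged.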
Detections at multiple scales
Scale Invariant Feature Transform
Basic idea:
• Take 16x16 square window around detected feature
• Compute edge orientation (angle of the gradient - 90°) for each pixel
• Throw out weak edges (threshold gradient magnitude)
• Create histogram of surviving edge orientations
SIFT descriptor
Full version
• Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below)
• Compute an orientation histogram for each cell
• 16 cells * 8 orientations = 128 dimensional descriptor
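The histogram layout above can be sketched in NumPy (a simplified sketch: real SIFT also applies Gaussian weighting, interpolation between bins, and descriptor normalization; here weak edges simply contribute little because votes are magnitude-weighted, and `sift_like_descriptor` is a made-up name):

```python
import numpy as np

def sift_like_descriptor(win):
    """Sketch of the SIFT histogram layout on a 16x16 window.

    Divides the window into a 4x4 grid of 4x4-pixel cells, bins each
    pixel's gradient orientation into 8 bins weighted by gradient
    magnitude, and concatenates: 16 cells * 8 bins = 128 dimensions.
    """
    gy, gx = np.gradient(win.astype(float))
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.arctan2(gy, gx)                       # orientation in [-pi, pi]
    bins = ((ang + np.pi) / (2 * np.pi) * 8).astype(int) % 8
    desc = np.zeros((4, 4, 8))
    for cy in range(4):
        for cx in range(4):
            cell = np.s_[cy * 4:(cy + 1) * 4, cx * 4:(cx + 1) * 4]
            for b in range(8):
                # Magnitude-weighted vote of every pixel in this cell.
                desc[cy, cx, b] = mag[cell][bins[cell] == b].sum()
    return desc.ravel()                            # 128-dim vector

win = np.random.default_rng(1).random((16, 16))
d = sift_like_descriptor(win)
```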
Properties of SIFT
Extraordinarily robust matching technique
- Can handle changes in viewpoint (up to about 60 degrees of out-of-plane rotation)
- Can handle significant changes in illumination (sometimes even day vs. night (below))
- Pretty fast: hard to make real-time, but can run in <1s for moderate image sizes
- Lots of code available
SIFT Example
Other descriptors
- HOG: Histogram of Oriented Gradients
- Dalal/Triggs
- Sliding window, pedestrian detection
- FREAK: Fast Retina Keypoint
- Perceptually motivated
- Can run in real-time; used in Visual SLAM on-device
- LIFT: Learned Invariant Feature Transform
- Learned via deep learning - along with many other recent features
Summary
- Keypoint detection: repeatable and distinctive
- Corners, blobs, stable regions
- Harris, DoG
- Descriptors: robust and selective
- Spatial histograms of orientation
- SIFT and variants are typically good for stitching and recognition
- But no need to stick to just one
Which features match?
Feature matching
Given a feature in $I_1$, how to find the best match in $I_2$?
- Define distance function that compares two descriptors
- Test all the features in $I_2$, find the one with min distance
Feature distance
How to define the difference between two features $f_1$, $f_2$?
- Simple approach: $L_2$ distance, $\|f_1 - f_2\|$
- can give small distances for ambiguous (incorrect) matches
- Better approach: ratio distance = $\|f_1 - f_2\| / \|f_1 - f_2'\|$
- $f_2$ is the best SSD match to $f_1$ in $I_2$
- $f_2'$ is the 2nd best SSD match to $f_1$ in $I_2$
- gives large values for ambiguous matches
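The ratio test above can be sketched as follows (assumptions: descriptors are rows of a NumPy array, plain $L_2$ distance stands in for SSD since SSD is squared $L_2$ and gives the same ranking, and `ratio_match` is a hypothetical helper):

```python
import numpy as np

def ratio_match(f1, feats2):
    """Nearest-neighbor match with the ratio test (a sketch).

    Returns (index of best match in feats2, ratio distance d1/d2),
    where d1, d2 are distances to the best and second-best
    descriptors. A ratio near 1 flags an ambiguous match.
    """
    d = np.linalg.norm(feats2 - f1, axis=1)   # distance to every f2
    i1, i2 = np.argsort(d)[:2]                # best and 2nd-best
    return i1, d[i1] / d[i2]

# Toy example: f1 is clearly closest to the first descriptor,
# so the ratio is small (unambiguous match).
feats2 = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
idx, ratio = ratio_match(np.array([0.1, 0.0]), feats2)
```

In practice one keeps only matches whose ratio falls below a threshold (Lowe's paper uses 0.8).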
Does the SSD vs. “ratio distance” change the best match to a given feature in image 1? (No: for a fixed feature the nearest neighbor is the same; the ratio only changes the score used for thresholding, i.e., which matches survive.)
Feature matching example
Evaluating the results
How can we measure the performance of a feature matcher?
True/false positives
The distance threshold affects performance
- True positives(TP) = # of detected matches that survive the threshold that are correct
- False positives(FP) = # of detected matches that survive the threshold that are incorrect
- Suppose we want to maximize true positives. How do we set the threshold? (We keep all matches with distance below the threshold, so a high threshold keeps every true match, at the cost of many false ones.)
- Suppose we want to minimize false positives. How do we set the threshold? (A low threshold rejects nearly everything, including most false matches, at the cost of losing true ones.)
Evaluate the results
How can we measure the performance of a feature matcher?
- Single number: Area Under the Curve (AUC)
- E.g. AUC = 0.87
- 1 is the best
ROC curves - summary
- By thresholding the match distances at different thresholds, we can generate sets of matches with different true/false positive rates
- The ROC curve is generated by computing rates at a set of threshold values swept through the full range of possible thresholds
- Area under the ROC curve (AUC) summarizes the performance of a feature pipeline (higher AUC is better)
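The threshold sweep and AUC computation described above can be sketched as follows (a minimal sketch assuming each candidate match comes with a distance and a ground-truth correct/incorrect label; `roc_auc` is a made-up helper):

```python
import numpy as np

def roc_auc(dists, labels):
    """Sweep a threshold over match distances to build an ROC curve.

    labels[i] is True if match i is actually correct. For each
    threshold t we keep matches with dist < t and compute
    TPR = TP / (# correct) and FPR = FP / (# incorrect); AUC is
    the trapezoidal area under the (FPR, TPR) curve.
    """
    dists = np.asarray(dists, float)
    labels = np.asarray(labels, bool)
    # Sweep every distinct distance, plus +inf so the curve ends
    # at (FPR, TPR) = (1, 1).
    ts = np.r_[np.sort(np.unique(dists)), np.inf]
    tpr = np.array([(dists[labels] < t).mean() for t in ts])
    fpr = np.array([(dists[~labels] < t).mean() for t in ts])
    # Trapezoidal area under the curve.
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Perfectly separable distances (all correct matches closer than
# all incorrect ones) give the ideal AUC of 1.
auc = roc_auc([0.1, 0.2, 0.9, 1.0], [True, True, False, False])
```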
Lots of applications
Features are used for:
- Image alignment (e.g., mosaics)
- 3D reconstruction
- Motion tracking
- Object recognition
- Indexing and database retrieval
- Robot navigation
- … other
Object recognition (David Lowe)
3D Reconstruction
Augmented Reality