Random notes
- OpenCV uses BGR channel order instead of RGB; convert when interfacing with libraries that expect RGB (matplotlib, PIL, ...)
- OpenCV coord system on an image originates at top left corner
Basic Questions
What is OpenCV and why is it used?
- Open-source library for real-time computer vision
- Used for image processing, video processing
- Cross-platform and can be used in Python, C++, Java and MATLAB
- Efficient (can be used for real-time) and extensive function library
- Real world uses
- Security and surveillance
- Industrial automation: visual inspection systems
- Healthcare and medical imaging: anomalies detection in imaging
- Robotics and drones: SLAM
- Augmented reality: detect AR markers
How would you load and display an image?
Load
`imread()` with the path to the file (loading from a URL or bytestring takes extra steps, see colab)
- Error handling: it returns `None` if the image could not be loaded, check for that
- Flags: can read the image in colour, grayscale, with alpha channel, ... changes how the image will be interpreted
Display
`imshow()` with a window name and the image as arguments to display it
- Has to be followed by `waitKey()` to keep the window open
    - `waitKey(0)` waits indefinitely for a key press
    - `waitKey(n)` with n > 0 waits for that many milliseconds
    - `waitKey(1)` is used in loops for non-blocking behaviour: the window does not freeze and still responds to inputs
- Follow again with `destroyAllWindows()` to close everything properly
What is image thresholding and how do you use it?
- Simple image segmentation method based on pixel intensity: creates a binary (black and white) image from a grayscale image where all pixels above threshold are white and all pixels below threshold are black
- Pixel intensity: grayscale value of the pixel (0 to 255)
- Used for separating foreground from background, or isolating objects of interest
- Useful when there’s high contrast between object and background
- Simple thresholding
`cv2.threshold(img, thresh, maxval, type)`
- Applies the same threshold value to every pixel in the image
- Useful for even lighting, strong contrast between foreground and background
- Fails under uneven lighting (shadows, highlights)
- Adaptive thresholding
`cv2.adaptiveThreshold(src, maxValue, adaptiveMethod, thresholdType, blockSize, C)`
- Each pixel gets its own threshold based on its local neighbourhood
- Useful for non-uniform lighting, documents with shadows, gradients or smudges
- Can introduce noise in very uniform images
- Choosing appropriate threshold value: Otsu’s method
`ret2, th2 = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`
- Computes an optimal threshold that minimizes intra-class variance (the variance within the foreground and background pixel groups)
How would you detect edges in an image?
- Edge detection is finding boundaries or transitions in an image: areas where pixel intensity changes sharply
- OpenCV methods are gradient-based
- Sobel: finds gradient in x or y direction using convolution with small kernels
- Laplacian: finds areas of rapid intensity change, measures second derivative (how fast gradient is changing)
- Canny: multi-stage: applies Gaussian blur, uses Sobel to compute gradients, thins edges to 1-pixel width (non-maximum suppression), then thresholds weak and strong edges
- Usually preferred, gives better results and less susceptible to noise
- Based on two thresholds: edges below t1 are discarded, edges between t1 and t2 are weak edges (kept only if connected to strong ones), edges above t2 are strong edges (thresholds on gradient magnitude)
- Steps: convert image to grayscale, reduce noise (Gaussian blur), apply edge detection
- Useful for object detection, image segmentation
What is image blurring and why is it useful?
- Reduces noise and detail in an image by convolving it with a low-pass filter kernel
- Gaussian blur: apply a gaussian kernel to the image, nearby pixels are weighted based on a normal distribution (closer pixels contribute more)
- Median blur: replaces each pixel with median value in its neighbourhood
- Especially good for salt-and-pepper noise
- Bilateral filtering: weighted average of the neighbouring pixels, where both spatial closeness and intensity similarity increase a pixel's weight; pixels that are too different in intensity get very little weight -> keeps edges sharp
    - Best at preserving edges while smoothing
- Useful as preprocessing for many tasks: smooths out small details and minor variations that can interfere with larger-scale analysis
- How choice of kernel affects the result
- Larger kernel = better noise removal but more loss of detail
- Method and kernel size changes trade off between noise reduction and loss of detail
What are image moments and how are they used?
- Scalar values that summarise the spatial distribution of pixel intensities in the image
- Capture area, centroid, orientation, symmetry of shapes in the image
`cv2.moments(binary_image_or_contour)` returns a dictionary of moment values
- Spatial moments `mij`
    - Area `m00` (for binary images it's just the number of white pixels)
    - `m10` and `m01` are used to compute the centroid: cx = m10/m00, cy = m01/m00
- Central moments
`muij`
- Measure the shape's distribution relative to its centroid; translation invariant (do not change if the object moves in the image)
- Useful to compute orientation, compare shapes regardless of location
- Normalised central moments
`nuij`
- Translation and scale invariant
- Useful to compare shape at different sizes
- Hu moments (shape descriptors)
- 7 values derived from normalised central moments
- Translation, scale and rotation invariant
`cv2.HuMoments(M)`
- Useful for shape matching and recognition
- If you give `cv2.moments()` a binary image, it treats all white (non-zero) pixels as a single shape/region
- If you give it a contour (one item from `cv2.findContours()`), it computes the moments for that specific contour (i.e., one object)
What is template matching and how do you use it?
- Finding a small image (template) within a larger image (search area) by sliding the template across the larger image and comparing pixel values at each location.
`result = cv2.matchTemplate(image, template, method)`
- `result` is a matrix where each pixel represents how well the template matches at that location; extract the min/max value and its location (`cv2.minMaxLoc`) to find the best match
| Method | Description | Limitations/notes |
|---|---|---|
| cv2.TM_SQDIFF | Squared difference | Sensitive to brightness |
| cv2.TM_SQDIFF_NORMED | Normalized squared difference | All the normed versions are better for varying lighting |
| cv2.TM_CCORR | Cross-correlation | Sensitive to overall intensity |
| cv2.TM_CCORR_NORMED | Normalized cross-correlation | Usually best |
| cv2.TM_CCOEFF | Correlation coefficient | Can fail with uniform regions |
| cv2.TM_CCOEFF_NORMED | Normalized correlation coefficient | Robust default choice |
- Sensitive to scale, rotation and lighting changes (won’t match), usually not suited for real-world complex scenes
What are basic drawing operations?
- OpenCV lets you draw directly onto images using basic geometric shapes and text
- Basic drawing functions
`cv2.line(img, pt1, pt2, color, thickness)`
`cv2.rectangle(img, pt1, pt2, color, thickness)`
`cv2.circle(img, center, radius, color, thickness)`
`cv2.putText(img, text, org, font, fontScale, color, thickness, lineType)`
`cv2.polylines(img, [pts], isClosed=True, color=(0, 255, 0), thickness=2)`
`cv2.drawContours(img, contours, -1, (255, 255, 0), 2)`
- Drawing modifies the image you pass in, so if you want to preserve the original, make a copy
img_copy = img.copy()
What is histogram equalization and why is it useful?
- Contrast enhancement technique: computes the histogram of the input image (number of pixels at each intensity), redistributes the intensities so the histogram is more balanced, and stretches out regions where pixel intensities are clumped together
`cv2.equalizeHist(gray)`
- Can amplify noise and does not preserve local detail
- Adaptive histogram equalisation (CLAHE)
- Divides the image into small tiles and equalizes each tile separately (with a clip limit) -> reduces noise amplification
`clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))`
`equalized = clahe.apply(gray)`
- Useful preprocessing step to make features more distinguishable in low contrast images
How would you resize an image while maintaining its aspect ratio?
- OpenCV expects you to provide target width and height of the resized image.
resized = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_AREA)
- To preserve the aspect ratio, choose a target width (or height), compute the scaling factor scale = target_width / original_width, and scale the other dimension by the same factor
Junior questions
How would you detect lines in an image?
The Hough Transform detects straight lines in an edge-detected image by voting: each edge point votes for the set of possible lines passing through it, and the most consistent (i.e., most voted) lines are kept.
`cv2.HoughLines` returns lines in polar form (rho, theta), so convert to Cartesian endpoints when drawing
edges = cv2.Canny(gray_img, 50, 150)
lines = cv2.HoughLines(edges, rho=1, theta=np.pi/180, threshold=100)
if lines is not None:
    for line in lines:
        rho, theta = line[0]
        a = np.cos(theta)
        b = np.sin(theta)
        x0 = a * rho
        y0 = b * rho
        # Convert polar to Cartesian line endpoints
        x1 = int(x0 + 1000 * (-b))
        y1 = int(y0 + 1000 * (a))
        x2 = int(x0 - 1000 * (-b))
        y2 = int(y0 - 1000 * (a))
        cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)
How would you detect circles in an image?
Use the Hough circle transform: an extension of the Hough line transform that detects circles by having edge pixels vote for candidate centres and radii based on the circle equation. Works best on a blurred grayscale input.
# Detect circles (input should be a blurred grayscale image)
circles = cv2.HoughCircles(
    img,
    cv2.HOUGH_GRADIENT,
    dp=1.2,
    minDist=30,
    param1=100,
    param2=30,
    minRadius=10,
    maxRadius=100
)
# Draw the detected circles
if circles is not None:
    circles = np.uint16(np.around(circles))
    for (x, y, r) in circles[0, :]:
        cv2.circle(img, (x, y), r, (0, 255, 0), 2)  # Outer circle
        cv2.circle(img, (x, y), 2, (0, 0, 255), 3)  # Center point
- minDist is minimum distance between detected centers, param1 is upper threshold for Canny edge detector, param2 is threshold for center detection (lower = more circles)
- Sensitive to noise, lighting and partial occlusions, doesn’t work for ellipses
Can you explain what a kernel is in the context of image convolution?
Small grid of numbers (typically 3×3, 5×5, or 7×7) that slides over the image and performs a convolution operation at each pixel location.
The kernel determines:
- What kind of transformation is applied (e.g., blur, detect edges)
- How each pixel’s value is changed based on its neighborhood
Convolution: multiply each value in the kernel with the corresponding pixel in the image region, sum the result, and assign it to the output pixel (repeated for each pixel in the image)
How would you rotate an image by a specific angle?
Compute the rotation matrix, then apply it with an affine transformation
`cv2.getRotationMatrix2D(center, angle, scale=1.0)`
- center = rotation point, usually the image centre
`rotated = cv2.warpAffine(img, M, (w, h))`
- `(w, h)` sets the output size (same as the original in this case)
If you want to rotate by multiples of 90°, just use `cv2.rotate(src, cv2.ROTATE_90_CLOCKWISE)` (or its `ROTATE_180` / `ROTATE_90_COUNTERCLOCKWISE` variants)
What is the purpose of cv2.inRange() function and how is it commonly used?
mask = cv2.inRange(image, lower_bound, upper_bound)
Compares each pixel in the input image to a lower and upper bound, returns a binary mask:
- 255 (white) where the pixel falls within the range
- 0 (black) where it does not
Commonly used to extract objects of a certain colour, region segmentation, …
Image processing questions
What is image segmentation and what is it used for?
Divide an image into distinct regions based on certain criteria (colour, intensity, ...) -> makes it easier to understand and analyse for detection, classification, measurement
Types of segmentation
- Thresholding: separates pixels based on intensity
cv2.threshold(gray_img, 127, 255, cv2.THRESH_BINARY)
- Colour-based segmentation: use colour ranges to extract regions
cv2.inRange(hsv_image, lower_bound, upper_bound)
- Contour-based segmentation: find object outlines using `cv2.findContours()`
- Watershed algorithm: separates touching or overlapping objects
- Treats image like a topographic surface: pixel intensity is elevation, bright areas are peaks and dark areas are valleys
- “Floods” the image from specific markers, boundaries are built where different regions meet
- GrabCut (interactive segmentation): foreground/background segmentation with user input
cv2.grabCut(image, mask, rect, bgModel, fgModel, iterCount, mode)
Can you describe the process of histogram analysis in image processing?
Graphical representation of the distribution of pixel intensities in an image, can compute separate histogram for each channel for RGB images.
| What It Tells You | Interpretation Example |
|---|---|
| Brightness | Histogram shifted left = darker image |
| Contrast | Narrow histogram = low contrast |
| Dynamic range | Full-width histogram = good range |
| Dominant tones | Peaks at certain intensity values |
- Use cases
- Image enhancement: histogram equalisation modifies global or local contrast
- Thresholding: Otsu’s method uses histogram to find optimal threshold
- Image comparison: compare colour histograms to identify similar images
- Colour filtering: identify dominant colour ranges to create masks or segment specific regions
- Colour histogram example
channels = ('b', 'g', 'r')
for i, col in enumerate(channels):
    hist = cv2.calcHist([img], [i], None, [256], [0, 256])
    plt.plot(hist, color=col)
For a grayscale image, pass [0] as the channel index instead of i
What is feature matching and how is it performed?
Identifying corresponding points (features) between two images. Relies on finding distinctive, repeatable points and describing them with feature descriptors; the descriptors are then compared across images to find matches.
Method
- Detect keypoints: identify interesting points in the image (corners, blobs)
- Compute descriptors: describe local neighbourhood around each keypoint
- Match descriptors: compare descriptors between two images to find matching points
OpenCV feature detectors and descriptors
ORB is a fast, patent-free detector/descriptor and is rotation invariant
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
Matching techniques
Brute-force matcher `BFMatcher` compares every descriptor in one image to every descriptor in another
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True) # For ORB
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)
matched_img = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None, flags=2)
cv2.imshow("Matches", matched_img)
Useful for 3D reconstruction: match features across views for triangulation
What is the role of image pyramids?
- Image pyramid: a collection of images derived from a single source image, where
- Each level in the pyramid is a lower-resolution version of the previous one
- Resolution typically reduced by a factor of 2 at each level
- Gaussian pyramid: successively blurred and downsampled (reduce resolution by half) versions
`lower_res = cv2.pyrDown(image)` → downscale the image by half
`higher_res = cv2.pyrUp(lower_res)` → upscale the image by 2× (not the same as the original, detail is lost)
- Laplacian pyramid: stores the difference between levels of a Gaussian pyramid, capturing the detail lost during downsampling
gaussian_down = cv2.pyrDown(image)
gaussian_up = cv2.pyrUp(gaussian_down)
laplacian = cv2.subtract(image, gaussian_up)
Can use them when trying to match something at different scales (build a pyramid of the template and try to match it at different resolutions)
Algorithms questions
Explain the concept of camera calibration and how it’s performed
Camera calibration estimates the internal characteristics of a camera (intrinsic parameters: focal length, distortion coefficients) and how it's positioned in space (extrinsic parameters: rotation and translation relative to the scene), in order to remove lens distortion and map 2D image points to 3D real-world coordinates (reconstruct 3D scenes from multiple views, improve accuracy in 3D scanning).
Process
- Capture multiple images of a chessboard from different angles and distances
- Detect the corners of the chessboard with `cv2.findChessboardCorners()`
- Prepare object points (known 3D coordinates) and image points (detected 2D corners in each image)
- Calibrate the camera with `ret, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, image_size, None, None)`
    - `camera_matrix`: contains focal lengths and optical centre
    - `dist_coeffs`: distortion coefficients (radial & tangential)
- Undistort future images
`undistorted = cv2.undistort(img, camera_matrix, dist_coeffs)`
- This removes lens distortions like barrel, pincushion and tangential distortion; lines appear straight after undistortion
Explain the concept of non-maximum suppression in the context of object detection
Non-Maximum Suppression (NMS) is a post-processing step in object detection. Object detectors often detect the same object multiple times, outputting overlapping bounding boxes with different confidence scores -> NMS keeps only the box with the highest confidence score and removes the rest, eliminating duplicate detections of the same object.
indices = cv2.dnn.NMSBoxes(boxes, confidences, score_thresh, nms_thresh)
How would you approach the task of 3D reconstruction from multiple 2D images using OpenCV?
- Goal: estimate 3D coordinates of points in the real world using their 2D projections in two or more images.
- Key concept: triangulation: if you observe a point from at least two different angles, you can triangulate its position in 3D space
- Camera calibration
- To get intrinsic parameters and distortion coefficients
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(...)
- Detect and match features
- Use feature detectors (SIFT, ORB, etc.) to find and match keypoints between images.
`kp1, des1 = sift.detectAndCompute(img1, None)`
`kp2, des2 = sift.detectAndCompute(img2, None)`
`matches = bf.match(des1, des2)`
- Estimate fundamental matrix
- Describes geometric relationship between the images
`F, mask = cv2.findFundamentalMat(pts1, pts2, method=cv2.FM_RANSAC)`
- Or, if you know the camera's intrinsics, estimate the essential matrix instead
E, _ = cv2.findEssentialMat(pts1, pts2, K)
- Recover camera pose
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K)
- Triangulate points
- With the relative camera poses and matched points, you can reconstruct the 3D points.
proj1 = K @ np.hstack((np.eye(3), np.zeros((3,1)))) # First camera matrix
proj2 = K @ np.hstack((R, t)) # Second camera matrix
points_4d = cv2.triangulatePoints(proj1, proj2, pts1.T, pts2.T)
points_3d = points_4d[:3] / points_4d[3] # Convert from homogeneous to 3D
Advanced questions
How would you approach improving the performance of an existing OpenCV application that is running slowly?
- Profile, don't guess (`timeit`)
small = cv2.resize(frame, (width // 2, height // 2))
- Avoid recalculating things unnecessarily
- Cache constant results (kernels), precompute masks or lookup tables, reuse results across frames when possible
- Use vectorized or built-in cv2 functions (instead of loops)
- Apply region of interest (ROI): if you’re only interested in part of the image (like a face or license plate), crop it and process only that region
roi = frame[y:y+h, x:x+w]
- Use efficient algorithms, swap out slow algos for faster ones
| Task | Slow | Faster Alternative |
|---|---|---|
| Feature detection | SIFT/SURF | ORB or AKAZE |
| Background subtraction | MOG2 | KNN or custom thresholding |
| Dense optical flow | Farneback | Lucas-Kanade (sparse) |
- Use OpenCV built with GPU support, if available (to look into)
- Batch or Approximate Expensive Work
- Don't run detection on every frame: process every Nth frame instead
- Approximate with faster methods if precision isn’t critical
- Use Efficient File I/O
- Load and save images using OpenCV (not PIL or other slower libs)
- Minimize disk I/O in loops
Suppose you encounter a situation where the image quality is poor due to low lighting. What techniques would you use to enhance the image for better analysis?
- Histogram equalisation or CLAHE (Contrast Limited Adaptive Histogram Equalization): enhance contrast by spreading out pixel intensities, CLAHE if uneven lighting
- Gamma Correction: brightens image non-linearly. Useful when image is very dark but not noisy
- Denoising: low light often increases sensor noise
cv2.fastNlMeansDenoisingColored(image, None, 10, 10, 7, 21)
- Bilateral filtering: optional, but helps smooth without losing edge detail
`cv2.bilateralFilter(image, 9, 75, 75)`
If you find that your image segmentation results are not satisfactory, what strategies would you use to troubleshoot and refine your approach?
- Look at your results, what is not satisfactory? Look at your input data, what could improve its clarity? Look at your different stages, when does it start looking bad?
- Improve preprocessing: denoising, histogram eq, colour space conversion
- Fine tuning parameters and thresholds
- Morphological operations
- Try different segmentation techniques
- Consider switching to DL
How would you approach integrating OpenCV with other machine learning frameworks for a comprehensive project?
- OpenCV + PyTorch
- Preprocessing: Use OpenCV to load and preprocess images, convert to PyTorch tensors.
- Postprocessing: Use OpenCV to display model output (e.g. draw boxes for object detection).
- Example: Real-time object detection on webcam using OpenCV + PyTorch:
- Use OpenCV to access webcam and preprocess frames.
- Run inference using a PyTorch model.
- Use OpenCV to draw results (e.g., bounding boxes, labels).
- Preprocessing with OpenCV vs torchvision
- Reasons to Use OpenCV for Preprocessing
- Performance (Especially on CPU)
- OpenCV is implemented in C/C++ under the hood and is highly optimized for image I/O and manipulation on CPU.
- For large-scale or real-time applications (like video frames), OpenCV tends to be faster than torchvision.transforms, especially for tasks like resizing, blurring, or color space conversion.
- More Versatile and Feature-Rich
- OpenCV supports a broader range of image processing operations, you can use it for classic vision tasks like contour detection
- Gives more low level control over images
- You can fine-tune resizing (e.g. interpolation type), manually handle color spaces, or crop with pixel-level precision
- When you might prefer torchvision
- Some transforms can be GPU accelerated, useful for data augmentation
Structure from Motion (SfM)
- Goal: recover 3D geometry from 2D images
- Common solution is triangulation: use corresponding image points in multiple views, important prerequisite is determination of camera calibration and position (projection matrix)
- SfM algos allow simultaneous computation of projection matrices and 3D points using corresponding points in each view
- Given n projected points u_ij, with i ∈ {1…m} and j ∈ {1…n}, in m images, the goal is to find both the projection matrices P_1, …, P_m and a consistent 3D structure X_1, …, X_n
Process
- Feature extraction
- Detect a number of key points in each image, 8 minimum, usually corners
- Feature matching
- Match each key point to its equivalent in the other views
- Template matching, optical flow, …
- 3D reconstruction
- When you look at a 3D scene with two cameras from two different views, 3D point projects to a 2D point in each image. These 2D points lie along known epipolar lines
- All such corresponding points must satisfy an equation involving the fundamental matrix F:
- A 3×3 matrix that encodes the epipolar geometry between two uncalibrated cameras
- If x_1 and x_2 are corresponding points in image 1 and image 2, then: x_2^T · F · x_1 = 0
- the point in image 2 lies on the epipolar line computed from the point in image 1
- So what you want to do to 3D reconstruct:
- Compute the fundamental matrix F
- Decompose F into the projection matrices of cameras 1 and 2
- Triangulate points using the 2D points and camera matrices
- Bundle adjustment
- Minimize a cost function: a weighted sum of squared reprojection errors between the projections of the computed 3D points and their original image points across all views
- Filter out inconsistent 3D points by detecting their reprojection errors as outliers
Public repos: OpenSfM and COLMAP
Multi-View Stereo (MVS)
TODO
Performances improvements
- Avoid using loops in Python as much as possible, especially double/triple loops etc. They are inherently slow.
- Vectorise the algorithm/code to the maximum extent possible, because Numpy and OpenCV are optimized for vector operations.
- Exploit the cache coherence.
- Never make copies of an array unless it is necessary. Try to use views instead. Array copying is a costly operation.
- Python map function and list comprehension are faster than basic for loops
# Loop version
newlist = []
for word in oldlist:
    newlist.append(word.upper())
# Map version instead (returns a lazy iterator in Python 3)
newlist = map(str.upper, oldlist)
# List comprehension version instead
newlist = [s.upper() for s in oldlist]
- Data aggregation, because of function call overhead
# Slow version: one function call per element
x = 0
def doit1(i):
    global x
    x = x + i
for i in range(100000):
    doit1(i)
# Faster version: a single call that loops internally
x = 0
def doit2(values):
    global x
    for i in values:
        x = x + i
doit2(range(100000))
OpenCV vs Pillow vs Scikit Image
- Use Pillow if you’re doing lightweight, clean image editing (e.g. web apps, thumbnails).
- Use OpenCV for performance-heavy or vision-heavy work (e.g. object detection, tracking, real-time processing).
- Use skimage for research, education, and NumPy-integrated scientific image processing.
OpenCV and skimage both treat images as NumPy ndarrays; Pillow uses its own Image objects (convert with np.array(img)).
Computer Vision Data Augmentations
Geometric data augmentations
Help the model become invariant to position and orientation
| Augmentation | Description |
|---|---|
| Flip | Horizontal/vertical mirroring |
| Rotation | Rotate image by small angles (e.g. ±15°) |
| Scaling | Resize image, optionally keeping aspect ratio |
| Translation | Shift image in x and/or y direction |
| Cropping | Random or center crops (useful for zoom or context variation) |
| Shearing | Slant the image along an axis |
Color & Lighting Adjustments
Useful for natural images where lighting varies
| Augmentation | Description |
|---|---|
| Brightness | Lighten or darken image |
| Contrast | Enhance or reduce contrast |
| Saturation | Modify color intensity |
| Hue adjustment | Shift color tones |
| Color jittering | Random combo of the above |
Noise and blur
Helps with robustness to camera quality and real-world conditions
| Augmentation | Description |
|---|---|
| Gaussian noise | Add small pixel-wise noise |
| Salt and pepper | Random black/white pixels |
| Gaussian blur | Slight blurring to simulate focus loss |
| Motion blur | Mimic camera or object motion |
Occlusion and cutout
Teaches the model not to depend on any one region of the image.
| Augmentation | Description |
|---|---|
| Cutout / Random Erasing | Black out a random square patch |
| Random occlusion | Simulate objects partially hidden |
| Grid mask | Overlay mask with missing patches |
Synthetic data
Great for robustness and data diversity.
| Augmentation | Description |
|---|---|
| Mixup | Combine two images and labels by blending |
| CutMix | Paste a patch from one image into another |
| Style transfer | Alter texture while keeping structure |
| GAN-based augmentation | Generate synthetic images from real samples |