Random notes
- OpenCV uses BGR channel order instead of RGB; convert when interfacing with libraries that expect RGB (matplotlib, PIL, ...)
- OpenCV coord system on an image originates at top left corner
Basic Questions
What is OpenCV and why is it used?
- Open-source library for real-time computer vision
- Used for image processing, video processing
- Cross-platform and can be used in Python, C++, Java and MATLAB
- Efficient (can be used for real-time) and extensive function library
- Real world uses
- Security and surveillance
- Industrial automation: visual inspection systems
- Healthcare and medical imaging: anomalies detection in imaging
- Robotics and drones: SLAM
- Augmented reality: detect AR markers
How would you load and display an image?
Load
`imread()` with the path to the file (loading from a URL or bytestring takes extra steps, see colab)
- Error handling: it returns `None` if the image could not be loaded, check for that
- Flags: can read the image in colour, grayscale, with alpha channel, ... changes how the image will be interpreted
Display
`imshow()` with a window name and the image as arguments to display it
- Has to be followed by `waitKey()` to keep the window open
    - `waitKey(0)` waits indefinitely for a key press
    - `waitKey(n)` with n > 0 waits for that many milliseconds
    - `waitKey(1)` is used in loops for non-blocking behaviour: the window does not freeze and still responds to inputs
- Follow again with `destroyAllWindows()` to close everything properly
What is image thresholding and how do you use it?
- Simple image segmentation method based on pixel intensity: creates a binary (black and white) image from a grayscale image where all pixels above threshold are white and all pixels below threshold are black
- Pixel intensity: grayscale value of the pixel (0 to 255)
- Used for separating foreground from background, or isolating objects of interest
- Useful when there’s high contrast between object and background
- Simple thresholding
`cv2.threshold(img, thresh, maxval, type)`
- Applies the same threshold value to every pixel in the image
- Useful for even lighting, strong contrast between foreground and background
- Fails under uneven lighting (shadows, highlights)
- Adaptive thresholding
`cv2.adaptiveThreshold(src, maxValue, adaptiveMethod, thresholdType, blockSize, C)`
- Each pixel gets its own threshold based on its local neighbourhood
- Useful for non-uniform lighting, documents with shadows, gradients or smudges
- Can introduce noise in very uniform images
- Choosing appropriate threshold value: Otsu’s method
`ret2, th2 = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`
- Computes an optimal threshold that minimizes intra-class variance (the variance within the foreground and background pixel groups)
How would you detect edges in an image?
- Edge detection is finding boundaries or transitions in an image: areas where pixel intensity changes sharply
- OpenCV methods are gradient-based
- Sobel: finds gradient in x or y direction using convolution with small kernels
- Laplacian: finds areas of rapid intensity change, measures second derivative (how fast gradient is changing)
- Canny: multi-stage: applies Gaussian blur, uses Sobel to compute gradients, thins edges to 1-pixel width (non-maximum suppression), then thresholds weak and strong edges
- Usually preferred, gives better results and less susceptible to noise
- Based on two thresholds: edges below t1 are discarded, edges between t1 and t2 are weak edges (kept only if connected to strong ones), edges above t2 are strong edges (thresholds on gradient magnitude)
- Steps: convert image to grayscale, reduce noise (Gaussian blur), apply edge detection
- Useful for object detection, image segmentation
What is image blurring and why is it useful?
- Reduces noise and detail in an image by convolving it with a low-pass filter kernel
- Gaussian blur: apply a gaussian kernel to the image, nearby pixels are weighted based on a normal distribution (closer pixels contribute more)
- Median blur: replaces each pixel with median value in its neighbourhood
- Especially good for salt-and-pepper noise
- Bilateral filtering: weighted average of the neighbouring pixels, where both spatial closeness and intensity similarity increase a pixel's weight; pixels that are too different in intensity get very little weight -> keeps edges sharp
    - Best at preserving edges while smoothing
- Useful as preprocessing for many tasks: smooths out small details and minor variations that can interfere with larger-scale analysis
- How choice of kernel affects the result
- Larger kernel = better noise removal but more loss of detail
- Method and kernel size changes trade off between noise reduction and loss of detail
What are image moments and how are they used?
- Scalar values that summarise the spatial distribution of pixel intensities in the image
- Capture area, centroid, orientation, symmetry of shapes in the image
`cv2.moments(binary_image_or_contour)` returns a dictionary of moment values
- Spatial moments `mij`
    - Area `m00` (for binary images it's just the number of white pixels)
    - `m10` and `m01` are used to compute the centroid: cx = m10/m00, cy = m01/m00
- Central moments
`muij`
- Measure the shape's distribution relative to its centroid; translation invariant (do not change if the object moves in the image)
- Useful to compute orientation, compare shapes regardless of location
- Normalised central moments
`nuij`
- Translation and scale invariant
- Useful to compare shape at different sizes
- Hu moments (shape descriptors)
- 7 values derived from normalised central moments
- Translation, scale and rotation invariant
`cv2.HuMoments(M)`
- Useful for shape matching and recognition
- If you give `cv2.moments()` a binary image, it treats all white (non-zero) pixels as a single shape/region
- If you give it a contour (one item from `cv2.findContours()`), it computes the moments for that specific contour (i.e., one object)
What is template matching and how do you use it?
- Finding a small image (template) within a larger image (search area) by sliding the template across the larger image and comparing pixel values at each location.
`result = cv2.matchTemplate(image, template, method)`
- `result` is a matrix where each pixel represents how well the template matches at that location; extract the min/max value and its location (`cv2.minMaxLoc`) to find the best match
| Method | Description | Limitations/notes |
|---|---|---|
| cv2.TM_SQDIFF | Squared difference | Sensitive to brightness |
| cv2.TM_SQDIFF_NORMED | Normalized squared difference | All the normed versions are better for varying lighting |
| cv2.TM_CCORR | Cross-correlation | Sensitive to overall intensity |
| cv2.TM_CCORR_NORMED | Normalized cross-correlation | Usually best |
| cv2.TM_CCOEFF | Correlation coefficient | Can fail with uniform regions |
| cv2.TM_CCOEFF_NORMED | Normalized correlation coefficient | Robust default choice |
- Sensitive to scale, rotation and lighting changes (won’t match), usually not suited for real-world complex scenes
What are basic drawing operations?
- OpenCV lets you draw directly onto images using basic geometric shapes and text
- Basic drawing functions
`cv2.line(img, pt1, pt2, color, thickness)`
`cv2.rectangle(img, pt1, pt2, color, thickness)`
`cv2.circle(img, center, radius, color, thickness)`
`cv2.putText(img, text, org, font, fontScale, color, thickness, lineType)`
`cv2.polylines(img, [pts], isClosed=True, color=(0, 255, 0), thickness=2)`
`cv2.drawContours(img, contours, -1, (255, 255, 0), 2)`
- Drawing modifies the image you pass in, so if you want to preserve the original, make a copy
img_copy = img.copy()
What is histogram equalization and why is it useful?
- Contrast enhancement technique: computes the histogram of the input image (number of pixels at each intensity), redistributes the intensities so the histogram is more balanced, and stretches out regions where pixel intensities are clumped together
`cv2.equalizeHist(gray)`
- Can amplify noise and does not preserve local detail
- Adaptive histogram equalisation (CLAHE)
- Divides the image into small tiles and equalizes each tile separately (with a clip limit) -> reduces noise amplification
`clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))`
`equalized = clahe.apply(gray)`
- Useful preprocessing step to make features more distinguishable in low contrast images
How would you resize an image while maintaining its aspect ratio?
- OpenCV expects you to provide target width and height of the resized image.
resized = cv2.resize(img, (new_width, new_height), interpolation=cv2.INTER_AREA)
- To preserve the aspect ratio, choose a target width (or height), compute the scaling factor scale = target_width / original_width, and scale the other dimension by the same factor
Junior questions
How would you detect lines in an image?
The Hough Transform detects straight lines in an edge-detected image by voting: each edge point votes for the set of possible lines passing through it, and the most consistent (i.e., most voted) lines are kept.
`cv2.HoughLines` returns lines in polar form (rho, theta), so convert to Cartesian endpoints when drawing
edges = cv2.Canny(gray_img, 50, 150)
lines = cv2.HoughLines(edges, rho=1, theta=np.pi/180, threshold=100)
if lines is not None:
    for line in lines:
        rho, theta = line[0]
        a = np.cos(theta)
        b = np.sin(theta)
        x0 = a * rho
        y0 = b * rho
        # Convert polar to Cartesian line endpoints
        x1 = int(x0 + 1000 * (-b))
        y1 = int(y0 + 1000 * (a))
        x2 = int(x0 - 1000 * (-b))
        y2 = int(y0 - 1000 * (a))
        cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)
How would you detect circles in an image?
Use the Hough circle transform: an extension of the Hough line transform that detects circles by having edge pixels vote for candidate centres and radii based on the circle equation. Works best on a blurred grayscale input.
# Detect circles (input should be a blurred grayscale image)
circles = cv2.HoughCircles(
    img,
    cv2.HOUGH_GRADIENT,
    dp=1.2,
    minDist=30,
    param1=100,
    param2=30,
    minRadius=10,
    maxRadius=100
)
# Draw the detected circles
if circles is not None:
    circles = np.uint16(np.around(circles))
    for (x, y, r) in circles[0, :]:
        cv2.circle(img, (x, y), r, (0, 255, 0), 2)  # Outer circle
        cv2.circle(img, (x, y), 2, (0, 0, 255), 3)  # Center point
- minDist is minimum distance between detected centers, param1 is upper threshold for Canny edge detector, param2 is threshold for center detection (lower = more circles)
- Sensitive to noise, lighting and partial occlusions, doesn’t work for ellipses
Can you explain what a kernel is in the context of image convolution?
Small grid of numbers (typically 3×3, 5×5, or 7×7) that slides over the image and performs a convolution operation at each pixel location.
The kernel determines:
- What kind of transformation is applied (e.g., blur, detect edges)
- How each pixel’s value is changed based on its neighborhood
Convolution: multiply each value in the kernel with the corresponding pixel in the image region, sum the result, and assign it to the output pixel (repeated for each pixel in the image)
How would you rotate an image by a specific angle?
Compute the rotation matrix, then apply it with an affine transformation
`cv2.getRotationMatrix2D(center, angle, scale=1.0)`
- center = rotation point, usually the image centre
`rotated = cv2.warpAffine(img, M, (w, h))`
- `(w, h)` sets the output size (same as the original in this case)
If you want to rotate by multiples of 90°, just use `cv2.rotate(src, cv2.ROTATE_90_CLOCKWISE)` (or its `ROTATE_180` / `ROTATE_90_COUNTERCLOCKWISE` variants)
What is the purpose of cv2.inRange() function and how is it commonly used?
mask = cv2.inRange(image, lower_bound, upper_bound)
Compares each pixel in the input image to a lower and upper bound, returns a binary mask:
- 255 (white) where the pixel falls within the range
- 0 (black) where it does not
Commonly used to extract objects of a certain colour, region segmentation, …
Image processing questions
What is image segmentation and what is it used for?
Divide an image into distinct regions based on certain criteria (colour, intensity, ...) -> makes it easier to understand and analyse for detection, classification, measurement
Types of segmentation
- Thresholding: separates pixels based on intensity
cv2.threshold(gray_img, 127, 255, cv2.THRESH_BINARY)
- Colour-based segmentation: use colour ranges to extract regions
cv2.inRange(hsv_image, lower_bound, upper_bound)
- Contour-based segmentation: find object outlines using `cv2.findContours()`
- Watershed algorithm: separates touching or overlapping objects
- Treats image like a topographic surface: pixel intensity is elevation, bright areas are peaks and dark areas are valleys
- “Floods” the image from specific markers, boundaries are built where different regions meet
- GrabCut (interactive segmentation): foreground/background segmentation with user input
cv2.grabCut(image, mask, rect, bgModel, fgModel, iterCount, mode)
Can you describe the process of histogram analysis in image processing?
Graphical representation of the distribution of pixel intensities in an image, can compute separate histogram for each channel for RGB images.
| What It Tells You | Interpretation Example |
|---|---|
| Brightness | Histogram shifted left = darker image |
| Contrast | Narrow histogram = low contrast |
| Dynamic range | Full-width histogram = good range |
| Dominant tones | Peaks at certain intensity values |
- Use cases
- Image enhancement: histogram equalisation modifies global or local contrast
- Thresholding: Otsu’s method uses histogram to find optimal threshold
- Image comparison: compare colour histograms to identify similar images
- Colour filtering: identify dominant colour ranges to create masks or segment specific regions
- Colour histogram example
channels = ('b', 'g', 'r')
for i, col in enumerate(channels):
    hist = cv2.calcHist([img], [i], None, [256], [0, 256])
    plt.plot(hist, color=col)
For a grayscale image, pass [0] as the channel index instead of i
What is feature matching and how is it performed?
Identifying corresponding points (features) between two images. Relies on finding distinctive, repeatable points and describing them with feature descriptors; the descriptors are then compared across images to find matches.
Method
- Detect keypoints: identify interesting points in the image (corners, blobs)
- Compute descriptors: describe local neighbourhood around each keypoint
- Match descriptors: compare descriptors between two images to find matching points
OpenCV feature detectors and descriptors
ORB is a fast, patent-free detector/descriptor and is rotation invariant
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
Matching techniques
Brute-force matcher `BFMatcher` compares every descriptor in one image to every descriptor in another
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True) # For ORB
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda x: x.distance)
matched_img = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None, flags=2)
cv2.imshow("Matches", matched_img)
Useful for 3D reconstruction: match features across views for triangulation
What is the role of image pyramids?
- Image pyramid: a collection of images derived from a single source image, where
- Each level in the pyramid is a lower-resolution version of the previous one
- Resolution typically reduced by a factor of 2 at each level
- Gaussian pyramid: successively blurred and downsampled (reduce resolution by half) versions
`lower_res = cv2.pyrDown(image)` → downscale the image by half
`higher_res = cv2.pyrUp(lower_res)` → upscale the image by 2× (not the same as the original, detail is lost)
- Laplacian pyramid: stores the difference between levels of a Gaussian pyramid, capturing the detail lost during downsampling
gaussian_down = cv2.pyrDown(image)
gaussian_up = cv2.pyrUp(gaussian_down)
laplacian = cv2.subtract(image, gaussian_up)
Can use them when trying to match something at different scales (build a pyramid of the template and try to match it at different resolutions)
Algorithms questions
Explain the concept of camera calibration and how it’s performed
Camera calibration estimates the internal characteristics of a camera (intrinsic parameters: focal length, distortion coefficients) and how it's positioned in space (extrinsic parameters: rotation and translation relative to the scene), in order to remove lens distortion and map 2D image points to 3D real-world coordinates (reconstruct 3D scenes from multiple views, improve accuracy in 3D scanning).
Process
- Capture multiple images of a chessboard from different angles and distances
- Detect the corners of the chessboard with `cv2.findChessboardCorners()`
- Prepare object points (known 3D coordinates) and image points (detected 2D corners in each image)
- Calibrate the camera with `ret, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, image_size, None, None)`
    - `camera_matrix`: contains focal lengths and optical centre
    - `dist_coeffs`: distortion coefficients (radial & tangential)
- Undistort future images
`undistorted = cv2.undistort(img, camera_matrix, dist_coeffs)`
- This removes lens distortions like barrel, pincushion and tangential distortion; lines appear straight after undistortion
Explain the concept of non-maximum suppression in the context of object detection
Non-Maximum Suppression (NMS) is a post-processing step in object detection. Object detectors often detect the same object multiple times, outputting overlapping bounding boxes with different confidence scores -> NMS keeps only the box with the highest confidence score and removes the rest, eliminating duplicate detections of the same object.
indices = cv2.dnn.NMSBoxes(boxes, confidences, score_thresh, nms_thresh)
How would you approach the task of 3D reconstruction from multiple 2D images using OpenCV?
- Goal: estimate 3D coordinates of points in the real world using their 2D projections in two or more images.
- Key concept: triangulation: if you observe a point from at least two different angles, you can triangulate its position in 3D space
- Camera calibration
- To get intrinsic parameters and distortion coefficients
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(...)
- Detect and match features
- Use feature detectors (SIFT, ORB, etc.) to find and match keypoints between images.
`kp1, des1 = sift.detectAndCompute(img1, None)`
`kp2, des2 = sift.detectAndCompute(img2, None)`
`matches = bf.match(des1, des2)`
- Estimate fundamental matrix
- Describes geometric relationship between the images
`F, mask = cv2.findFundamentalMat(pts1, pts2, method=cv2.FM_RANSAC)`
- Or, if you know the camera's intrinsics, estimate the essential matrix instead
E, _ = cv2.findEssentialMat(pts1, pts2, K)
- Recover camera pose
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K)
- Triangulate points
- With the relative camera poses and matched points, you can reconstruct the 3D points.
proj1 = K @ np.hstack((np.eye(3), np.zeros((3,1)))) # First camera matrix
proj2 = K @ np.hstack((R, t)) # Second camera matrix
points_4d = cv2.triangulatePoints(proj1, proj2, pts1.T, pts2.T)
points_3d = points_4d[:3] / points_4d[3] # Convert from homogeneous to 3D
Advanced questions
How would you approach improving the performance of an existing OpenCV application that is running slowly?
- Profile, don't guess (`timeit`)
small = cv2.resize(frame, (width // 2, height // 2))
- Avoid recalculating things unnecessarily
- Cache constant results (kernels), precompute masks or lookup tables, reuse results across frames when possible
- Use vectorized or built-in cv2 functions (instead of loops)
- Apply region of interest (ROI): if you’re only interested in part of the image (like a face or license plate), crop it and process only that region
roi = frame[y:y+h, x:x+w]
- Use efficient algorithms, swap out slow algos for faster ones
| Task | Slow | Faster Alternative |
|---|---|---|
| Feature detection | SIFT/SURF | ORB or AKAZE |
| Background subtraction | MOG2 | KNN or custom thresholding |
| Dense optical flow | Farneback | Lucas-Kanade (sparse) |
- Use OpenCV built with GPU support, if available (to look into)
- Batch or Approximate Expensive Work
- Don't run detection on every frame: process every Nth frame instead
- Approximate with faster methods if precision isn’t critical
- Use Efficient File I/O
- Load and save images using OpenCV (not PIL or other slower libs)
- Minimize disk I/O in loops
Suppose you encounter a situation where the image quality is poor due to low lighting. What techniques would you use to enhance the image for better analysis?
- Histogram equalisation or CLAHE (Contrast Limited Adaptive Histogram Equalization): enhance contrast by spreading out pixel intensities, CLAHE if uneven lighting
- Gamma Correction: brightens image non-linearly. Useful when image is very dark but not noisy
- Denoising: low light often increases sensor noise
cv2.fastNlMeansDenoisingColored(image, None, 10, 10, 7, 21)
- Bilateral filtering: optional, but helps smooth without losing edge detail
`cv2.bilateralFilter(image, 9, 75, 75)`
If you find that your image segmentation results are not satisfactory, what strategies would you use to troubleshoot and refine your approach?
- Look at your results, what is not satisfactory? Look at your input data, what could improve its clarity? Look at your different stages, when does it start looking bad?
- Improve preprocessing: denoising, histogram eq, colour space conversion
- Fine tuning parameters and thresholds
- Morphological operations
- Try different segmentation techniques
- Consider switching to DL
How would you approach integrating OpenCV with other machine learning frameworks for a comprehensive project?
- OpenCV + PyTorch
- Preprocessing: Use OpenCV to load and preprocess images, convert to PyTorch tensors.
- Postprocessing: Use OpenCV to display model output (e.g. draw boxes for object detection).
- Example: Real-time object detection on webcam using OpenCV + PyTorch:
- Use OpenCV to access webcam and preprocess frames.
- Run inference using a PyTorch model.
- Use OpenCV to draw results (e.g., bounding boxes, labels).
- Preprocessing with OpenCV vs torchvision
- Reasons to Use OpenCV for Preprocessing
- Performance (Especially on CPU)
- OpenCV is implemented in C/C++ under the hood and is highly optimized for image I/O and manipulation on CPU.
- For large-scale or real-time applications (like video frames), OpenCV tends to be faster than torchvision.transforms, especially for tasks like resizing, blurring, or color space conversion.
- More Versatile and Feature-Rich
- OpenCV supports a broader range of image processing operations, you can use it for classic vision tasks like contour detection
- Gives more low level control over images
- You can fine-tune resizing (e.g. interpolation type), manually handle color spaces, or crop with pixel-level precision
- When you might prefer torchvision
- Some transforms can be GPU accelerated, useful for data augmentation
Structure from Motion (SfM)
- Goal: recover 3D geometry from 2D images
- Common solution is triangulation: use corresponding image points in multiple views, important prerequisite is determination of camera calibration and position (projection matrix)
- SfM algos allow simultaneous computation of projection matrices and 3D points using corresponding points in each view
- Given n projected points u_ij, with i ∈ {1…m} and j ∈ {1…n}, in m images, the goal is to find both the projection matrices P_1, …, P_m and a consistent 3D structure X_1, …, X_n
Process
- Feature extraction
- Detect a number of key points in each image, 8 minimum, usually corners
- Feature matching
- Match each key point to its equivalent in the other views
- Template matching, optical flow, …
- 3D reconstruction
- When you look at a 3D scene with two cameras from two different views, 3D point projects to a 2D point in each image. These 2D points lie along known epipolar lines
- All such corresponding points must satisfy an equation involving the fundamental matrix F:
- A 3×3 matrix that encodes the epipolar geometry between two uncalibrated cameras
- If x_1 and x_2 are corresponding points in image 1 and image 2, then: x_2^T · F · x_1 = 0
- the point in image 2 lies on the epipolar line computed from the point in image 1
- So what you want to do to 3D reconstruct:
- Compute the fundamental matrix F
- Decompose F into the projection matrices of cameras 1 and 2
- Triangulate points using the 2D points and camera matrices
- Bundle adjustment
- Minimize a cost function: a weighted sum of squared reprojection errors between the projections of the computed 3D points and their original image points across all views
- Filter out inconsistent 3D points by detecting their reprojection errors as outliers
Public repos: OpenSfM and COLMAP
Multi-View Stereo (MVS)
TODO
Performances improvements
- Avoid using loops in Python as much as possible, especially double/triple loops etc. They are inherently slow.
- Vectorise the algorithm/code to the maximum extent possible, because Numpy and OpenCV are optimized for vector operations.
- Exploit the cache coherence.
- Never make copies of an array unless it is necessary. Try to use views instead. Array copying is a costly operation.
- Python map function and list comprehension are faster than basic for loops
# Loop version
newlist = []
for word in oldlist:
    newlist.append(word.upper())
# Map version instead (returns a lazy iterator in Python 3)
newlist = map(str.upper, oldlist)
# List comprehension version instead
newlist = [s.upper() for s in oldlist]
- Data aggregation, because of function call overhead
# Slow version: one function call per element
x = 0
def doit1(i):
    global x
    x = x + i
for i in range(100000):
    doit1(i)
# Faster version: a single call that loops internally
x = 0
def doit2(values):
    global x
    for i in values:
        x = x + i
doit2(range(100000))
OpenCV vs Pillow vs Scikit Image
- Use Pillow if you’re doing lightweight, clean image editing (e.g. web apps, thumbnails).
- Use OpenCV for performance-heavy or vision-heavy work (e.g. object detection, tracking, real-time processing).
- Use skimage for research, education, and NumPy-integrated scientific image processing.
OpenCV and skimage both treat images as NumPy ndarrays; Pillow uses its own Image objects (convert with np.array(img)).
Computer Vision Data Augmentations
Geometric data augmentations
Help the model become invariant to position and orientation
| Augmentation | Description |
|---|---|
| Flip | Horizontal/vertical mirroring |
| Rotation | Rotate image by small angles (e.g. ±15°) |
| Scaling | Resize image, optionally keeping aspect ratio |
| Translation | Shift image in x and/or y direction |
| Cropping | Random or center crops (useful for zoom or context variation) |
| Shearing | Slant the image along an axis |
Color & Lighting Adjustments
Useful for natural images where lighting varies
| Augmentation | Description |
|---|---|
| Brightness | Lighten or darken image |
| Contrast | Enhance or reduce contrast |
| Saturation | Modify color intensity |
| Hue adjustment | Shift color tones |
| Color jittering | Random combo of the above |
Noise and blur
Helps with robustness to camera quality and real-world conditions
| Augmentation | Description |
|---|---|
| Gaussian noise | Add small pixel-wise noise |
| Salt and pepper | Random black/white pixels |
| Gaussian blur | Slight blurring to simulate focus loss |
| Motion blur | Mimic camera or object motion |
Occlusion and cutout
Teaches the model not to depend on any one region of the image.
| Augmentation | Description |
|---|---|
| Cutout / Random Erasing | Black out a random square patch |
| Random occlusion | Simulate objects partially hidden |
| Grid mask | Overlay mask with missing patches |
Synthetic data
Great for robustness and data diversity.
| Augmentation | Description |
|---|---|
| Mixup | Combine two images and labels by blending |
| CutMix | Paste a patch from one image into another |
| Style transfer | Alter texture while keeping structure |
| GAN-based augmentation | Generate synthetic images from real samples |