Profile and analyse how a GPU performs while running a specific application:
- Find out how the app uses the GPU
- Identify performance bottlenecks
- Optimise the code for better efficiency and speed
Metrics
- Frame rate (FPS): how many frames per second the GPU is rendering
- GPU usage: how much of the GPU’s processing power is being used. Too low = GPU underused (the bottleneck is probably elsewhere, e.g. on the CPU), sustained near 100% = the GPU itself may be the bottleneck
- Memory usage: how much of the GPU’s memory is being used by the application
- Shader performance: how efficiently shader programs execute on the GPU
  - Instruction count: number of instructions a shader executes
  - Execution time: time it takes for a shader to complete its tasks
  - Throughput: number of operations a shader can perform per second
  - Pipeline stalls: situations where the GPU must wait for a shader to complete before moving on to the next task
  - Texture fetches: how often a shader accesses textures from memory
- Compute workload: performance of general-purpose computing tasks on the GPU
  - Thread usage
  - Memory access patterns: how data is read from and written to memory
  - Latency: time it takes for a compute shader to start and complete a task
  - Occupancy: ratio of active warps (groups of 32 threads on NVIDIA GPUs) to the maximum number of warps that can be resident per streaming multiprocessor. Higher occupancy helps hide memory latency, but does not by itself guarantee better performance
  - Synchronisation overhead: time lost when different parts of the computation have to wait for each other to synchronise
- Bandwidth utilisation: how effectively data is being transferred between the GPU and other components
- Thermal and power metrics: how much power the GPU is consuming and how that affects its temperature. High temperatures can trigger throttling (reduced clock speeds)
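Several of the metrics above are simple ratios, so a small sketch may help make them concrete. The function names, the sample numbers, and the limit of 64 resident warps per streaming multiprocessor are assumptions chosen for illustration; real values come from a profiler and from the specifications of the GPU in question.

```python
from statistics import mean

WARP_SIZE = 32           # threads per warp on NVIDIA GPUs
MAX_WARPS_PER_SM = 64    # assumed limit; the real value depends on the architecture


def average_fps(frame_times_ms):
    """Average frames per second from per-frame times in milliseconds."""
    return 1000.0 / mean(frame_times_ms)


def occupancy(active_warps, max_warps=MAX_WARPS_PER_SM):
    """Ratio of active warps to the maximum number of resident warps."""
    return active_warps / max_warps


def bandwidth_utilisation(bytes_moved, seconds, peak_bytes_per_second):
    """Fraction of the peak bandwidth actually achieved by a transfer."""
    return (bytes_moved / seconds) / peak_bytes_per_second


# Hypothetical measurements, for illustration only.
frame_times_ms = [16.0, 17.0, 15.0, 16.0]
print(f"average FPS: {average_fps(frame_times_ms):.1f}")

# 4 resident blocks of 256 threads each = 1024 threads = 32 warps.
active_warps = (4 * 256) // WARP_SIZE
print(f"occupancy: {occupancy(active_warps):.0%}")

# 2 GB moved in 10 ms against an assumed peak of 400 GB/s.
print(f"bandwidth utilisation: {bandwidth_utilisation(2e9, 0.01, 400e9):.0%}")
```

Averaging frame times before converting to FPS avoids over-weighting fast frames, which is why the sketch does not average per-frame FPS values directly.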
Steps
- Understand which parts of your application are GPU-intensive. Use Nsight Systems to find out whether you are CPU bound or GPU bound:
  - CPU bound: the CPU cannot issue work fast enough to keep the GPU fully busy
  - GPU bound: the GPU cannot process the work it is issued fast enough
- Use profiling tools to capture data while the application is running: metrics like GPU usage, memory usage, and frame rate
- Examine the captured data to identify areas where the GPU is not being used efficiently: inefficient shaders, memory bottlenecks, or poor bandwidth usage
- Make changes to your code to improve performance: optimising shaders, reducing memory usage, or better balancing the workload between the CPU and GPU
- Re-profile and iterate: confirm that each change actually helped before moving on to the next one
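The CPU bound / GPU bound distinction and the re-profile step can be sketched as a rough heuristic. This is not how Nsight Systems works internally; it assumes you already have two hypothetical measurements per frame (time the CPU spends preparing and submitting work, and time the GPU spends executing it), and the 10% tolerance is an arbitrary choice.

```python
def classify_bound(cpu_frame_ms, gpu_frame_ms, tolerance=0.1):
    """Return 'CPU bound', 'GPU bound' or 'balanced' for one frame.

    Heuristic: whichever side takes longer per frame limits the frame rate.
    Times within `tolerance` (as a fraction of the longer one) count as balanced.
    """
    longer = max(cpu_frame_ms, gpu_frame_ms)
    if abs(cpu_frame_ms - gpu_frame_ms) <= tolerance * longer:
        return "balanced"
    return "CPU bound" if cpu_frame_ms > gpu_frame_ms else "GPU bound"


def speedup(before_ms, after_ms):
    """Speedup factor between two profiling runs of the same workload."""
    return before_ms / after_ms


# Hypothetical measurements, for illustration only.
print(classify_bound(cpu_frame_ms=20.0, gpu_frame_ms=8.0))   # CPU bound
print(classify_bound(cpu_frame_ms=6.0, gpu_frame_ms=18.0))   # GPU bound

# Re-profile after an optimisation and compare against the baseline.
print(f"speedup: {speedup(before_ms=20.0, after_ms=16.0):.2f}x")  # speedup: 1.25x
```

Comparing against a recorded baseline on every iteration makes it harder to keep a change that only looked like an improvement.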
Tools
- NVIDIA Nsight
- AMD Radeon GPU Profiler (RGP)
- Intel GPA (Graphics Performance Analyzers)
- RenderDoc: frame-capture based tool to analyse the rendering pipeline of an application
NVIDIA Nsight Graphics
TODO
Standalone developer tool with ray-tracing support that enables you to debug, profile, and export frames built with Direct3D, Vulkan, OpenGL, OpenVR, and the Oculus SDK.
Frame Debugger
Useful when you are CPU bound
GPU Trace
Offers a deep analysis of streaming multiprocessor (SM) performance by tracing the execution of your shaders on the SMs across a series of frames
NVIDIA Nsight Systems
TODO
System-wide performance analysis tool designed to visualise an application’s algorithms, help select the largest opportunities to optimise, and tune to scale efficiently across any quantity of CPUs and GPUs.