Skip to content

gpu_monitoring

Documentation for GPU Selection Methods

select_gpu

Functionality

This function selects the most suitable GPU based on the current load. It first tries to find the GPU with the lowest memory usage. If that fails, it falls back to a round-robin selection.

Parameters

  • None

Usage

Purpose: Choose a GPU for computation based on current load and memory usage.

Example

import torch
gpu_id = select_gpu()
device = torch.device(f"cuda:{gpu_id}")

get_least_loaded_gpu

Functionality

Retrieves the GPU index with the lowest memory used. This function fetches GPU memory usage by executing nvidia-smi and returns the GPU ID that has the least memory load.

Parameters

This function does not take any parameters.

Usage

Use this function to select an optimal GPU for compute tasks. It is particularly useful when multiple GPUs are available.

Example

gpu_index = get_least_loaded_gpu()
print(f"Selected GPU: {gpu_index}")

get_next_gpu

Functionality

This function cycles through available GPUs using a round-robin algorithm. It maintains a global counter and returns the next GPU ID by incrementing the last used GPU index and wrapping around if needed.

Parameters

This function does not take any parameters.

Return Value

  • Returns an integer representing the ID of the next GPU to use.

Usage

Call this function to distribute workloads evenly across multiple GPUs in a cyclic manner.

Example

import torch
from embedding_studio.utils.gpu_monitoring import get_next_gpu

gpu_id = get_next_gpu()
print("Using GPU:", gpu_id)

select_device

Functionality

Selects the most suitable device for computation. This function checks if a GPU is available and returns a torch.device instance. It selects a GPU with the lowest load if available; otherwise, it falls back to the CPU.

Parameters

This function does not require any parameters.

Usage

Purpose: To automatically choose the best computation device for deep learning tasks.

Example

import torch
from embedding_studio.utils.gpu_monitoring import select_device

device = select_device()
print("Using device:", device)