bonni.OptimConfig

class bonni.OptimConfig(*, total_steps, warmup_steps=None, peak_lr=0.001, final_lr=1e-09, init_lr=1e-05, fixed_lr=None, clip_grad_norm=1.0, use_adamw=True)

Configuration for the ensemble training phase that runs between sampling steps.

This configuration manages the optimizer settings and the learning rate schedule (typically a warmup followed by a decay).

total_steps

The total number of optimization steps to perform in this training phase.

Type: int

warmup_steps

The number of steps at the start of training during which the learning rate increases linearly from init_lr to peak_lr. Required unless fixed_lr is set. Defaults to None.

Type: int | None

peak_lr

The maximum learning rate, reached once the warmup phase is complete. Defaults to 1e-3.

Type: float

final_lr

The final learning rate at the end of total_steps. The schedule typically decays from peak_lr to this value. Defaults to 1e-9.

Type: float

init_lr

The initial learning rate at step 0, before warmup begins. Defaults to 1e-5.

Type: float

fixed_lr

If provided, overrides the scheduling logic (init_lr, warmup_steps, peak_lr, and final_lr) and uses this constant learning rate for all steps. Defaults to None.

Type: float | None
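
The interaction between init_lr, warmup_steps, peak_lr, final_lr, and fixed_lr can be illustrated with a small pure-Python sketch. The function name `lr_at_step` is hypothetical, and the cosine shape of the decay is an assumption: the documentation above only says the schedule "typically" decays from peak_lr to final_lr, so the library may use a different curve.

```python
import math


def lr_at_step(step, *, total_steps, warmup_steps=None, peak_lr=1e-3,
               final_lr=1e-9, init_lr=1e-5, fixed_lr=None):
    """Hypothetical schedule: linear warmup, then cosine decay (assumed shape)."""
    # A fixed learning rate bypasses all scheduling parameters.
    if fixed_lr is not None:
        return fixed_lr
    if warmup_steps is None:
        raise ValueError("warmup_steps is required when fixed_lr is not set")
    # Warmup: interpolate linearly from init_lr to peak_lr.
    if step < warmup_steps:
        frac = step / max(warmup_steps, 1)
        return init_lr + frac * (peak_lr - init_lr)
    # Decay: move from peak_lr to final_lr over the remaining steps.
    decay_steps = max(total_steps - warmup_steps, 1)
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + cosine * (peak_lr - final_lr)


# Print the learning rate at a few points of a 100-step phase.
for step in (0, 5, 10, 55, 100):
    print(step, lr_at_step(step, total_steps=100, warmup_steps=10))
```

With the default values, the sketch starts at init_lr, reaches peak_lr at the end of warmup, and ends at final_lr on the last step.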

clip_grad_norm

The maximum norm for gradient clipping. If the gradient norm exceeds this value, the gradients are rescaled so that their norm equals it. Set to None to disable clipping. Defaults to 1.0.

Type: float | None
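
As a rough illustration of norm-based clipping, the sketch below rescales a flat list of gradient values when their Euclidean norm exceeds the limit. The helper name `clip_by_global_norm` is hypothetical here, and treating the norm as a single global L2 norm over all gradients is an assumption; the documentation does not say how the library computes it.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so their L2 norm is at most max_norm (None disables clipping)."""
    if max_norm is None:
        return list(grads)
    norm = math.sqrt(sum(g * g for g in grads))
    # Gradients already within the limit are returned unchanged.
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


# The norm of [3.0, 4.0] is 5.0, so clipping to 1.0 scales each entry by 0.2.
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(clipped)
```

Rescaling preserves the direction of the gradient and only shortens its length, which is why it is usually preferred over clamping each component separately.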

use_adamw

Whether to use the AdamW optimizer. If False, standard Adam without weight decay is used. Defaults to True.

Type: bool

__init__(*, total_steps, warmup_steps=None, peak_lr=0.001, final_lr=1e-09, init_lr=1e-05, fixed_lr=None, clip_grad_norm=1.0, use_adamw=True)

Methods

__init__(*, total_steps[, warmup_steps, ...])

Attributes

`clip_grad_norm`
`final_lr`
`fixed_lr`
`init_lr`
`peak_lr`
`use_adamw`
`warmup_steps`
`total_steps`
