bonni.OptimConfig
- class bonni.OptimConfig(*, total_steps, warmup_steps=None, peak_lr=0.001, final_lr=1e-09, init_lr=1e-05, fixed_lr=None, clip_grad_norm=1.0, use_adamw=True)
Configuration for training the ensemble between every sampling step.
This configuration manages the optimizer settings and the learning rate schedule (typically a warmup followed by a decay).
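As a rough illustration of the schedule described above, the sketch below computes a learning rate per step using the parameter names and defaults from the class signature. The function name `learning_rate` is hypothetical, and the cosine decay shape is an assumption: this page only says the schedule "typically" decays after warmup, so the library's actual curve may differ.

```python
import math


def learning_rate(step, *, total_steps, warmup_steps=None, peak_lr=1e-3,
                  final_lr=1e-9, init_lr=1e-5, fixed_lr=None):
    """Illustrative schedule: linear warmup, then cosine decay (assumed shape)."""
    # A fixed learning rate bypasses the schedule entirely.
    if fixed_lr is not None:
        return fixed_lr
    if warmup_steps is None:
        raise ValueError("warmup_steps is required when fixed_lr is not set")

    # Linear warmup from init_lr to peak_lr.
    if step < warmup_steps:
        return init_lr + (peak_lr - init_lr) * step / warmup_steps

    # Cosine decay from peak_lr to final_lr. This shape is an assumption,
    # not something the documentation confirms.
    decay_steps = max(total_steps - warmup_steps, 1)
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine
```

Under these assumptions, the rate starts at init_lr, reaches peak_lr when warmup ends, and approaches final_lr at total_steps. Passing fixed_lr returns that constant for every step.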
- total_steps
The total number of optimization steps to perform in this training phase.
- Type:
int
- warmup_steps
The number of steps at the start of training during which the learning rate increases linearly from init_lr to peak_lr. Required unless fixed_lr is specified. Defaults to None.
- Type:
int | None
- peak_lr
The maximum learning rate, reached once the warmup phase is complete. Defaults to 1e-3.
- Type:
float
- final_lr
The learning rate reached at the end of total_steps. The schedule typically decays from peak_lr to this value. Defaults to 1e-9.
- Type:
float
- init_lr
The initial learning rate at step 0, before warmup begins. Defaults to 1e-5.
- Type:
float
- fixed_lr
If provided, overrides the scheduling logic (warmup/peak/final) and uses this constant learning rate for all steps. Defaults to None.
- Type:
float | None
- clip_grad_norm
The maximum norm for gradient clipping. Gradients whose norm exceeds this value are rescaled. Set to None to disable clipping. Defaults to 1.0.
- Type:
float | None
- use_adamw
Whether to use the AdamW optimizer. If False, standard Adam without weight decay is used. Defaults to True.
- Type:
bool
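To make the clip_grad_norm behaviour concrete, here is a minimal sketch of norm-based gradient clipping on a flat list of gradient values. The helper name `clip_by_global_norm` is hypothetical, and the use of a single L2 norm over all gradients is an assumption, since this page does not say which norm the library uses or how it groups parameters.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradient values so their L2 norm is at most max_norm."""
    # None disables clipping, mirroring clip_grad_norm=None.
    if max_norm is None:
        return list(grads)

    # Assumed norm: L2 over all gradient values together.
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)

    # Scale every value by the same factor so the direction is preserved.
    scale = max_norm / norm
    return [g * scale for g in grads]
```

With the default limit of 1.0, a gradient of `[3.0, 4.0]` (norm 5) would be scaled by 0.2, while a gradient already within the limit is returned unchanged.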
- __init__(*, total_steps, warmup_steps=None, peak_lr=0.001, final_lr=1e-09, init_lr=1e-05, fixed_lr=None, clip_grad_norm=1.0, use_adamw=True)
Methods
__init__(*, total_steps[, warmup_steps, ...])
Attributes