matchbench.trainer package

Submodules

matchbench.trainer.base_trainer module

class matchbench.trainer.base_trainer.EmbeddingTrainer(model, dimensions=300, window_size=3, training_algorithm='word2vec', learning_method='skipgram', workers=16, sampling_factor=0.001)

Bases: Trainer

train()

Train the model.

Returns:

best valid score (float); best valid model (nn.Module).

Return type:

float
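
A minimal usage sketch based only on the constructor signature above; embedding_model is a placeholder whose concrete type is not specified in this documentation.

    from matchbench.trainer.base_trainer import EmbeddingTrainer

    # embedding_model is a placeholder; its concrete type is not specified here.
    trainer = EmbeddingTrainer(
        model=embedding_model,
        dimensions=300,                  # embedding dimensionality
        window_size=3,                   # context window size
        training_algorithm='word2vec',
        learning_method='skipgram',
        workers=16,
        sampling_factor=0.001,
    )
    best_valid_score = trainer.train()   # best validation score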

class matchbench.trainer.base_trainer.Trainer(model, args, train_dataset, valid_dataset, optimizer=None, scheduler=False, scaler=None)

Bases: object

Runs the whole training process. Class attributes:

  • model (nn.Module) – The model to be trained.

  • args (TrainingArguments) – Hyperparameters used during training. For more details, please check the TrainingArguments class.

  • train_dataset (datasets.arrow_dataset.Dataset) – Training dataset.

  • valid_dataset (datasets.arrow_dataset.Dataset) – Validation dataset.

  • optimizer (torch.optim.Optimizer) – The optimizer; defaults to AdamW.

train()

Train the model.

Returns:

best valid score (float); best valid model (nn.Module).

Return type:

float
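
A minimal sketch of the generic setup, assuming an nn.Module, a TrainingArgument, and two datasets.arrow_dataset.Dataset objects prepared elsewhere; my_model, train_dataset, and valid_dataset are placeholders.

    from matchbench.trainer.base_trainer import Trainer
    from matchbench.trainer.training_arguments import TrainingArgument

    args = TrainingArgument(output_dir='outputs/')   # other fields keep their defaults

    trainer = Trainer(
        model=my_model,                  # placeholder nn.Module
        args=args,
        train_dataset=train_dataset,     # placeholder training dataset
        valid_dataset=valid_dataset,     # placeholder validation dataset
    )
    best_valid_score = trainer.train()   # best validation score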

class matchbench.trainer.base_trainer.TrainingContainer(model, args, optimizer=None)

Bases: object

Runs the whole training process for a list of models, one per split. Class attributes:

  • model (nn.Module) – The model used in each split.

  • args (TrainingArguments) – A list of hyperparameter sets, one per model. For more details, please check the TrainingArguments class.

  • optimizer (torch.optim.Optimizer) – A list of optimizers, one per model; each defaults to AdamW.

train(save_func=None)
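
A sketch of driving TrainingContainer over several splits; split_models and the save_func callback are hypothetical placeholders, and only output_dir is assumed to be required when building each TrainingArgument.

    from matchbench.trainer.base_trainer import TrainingContainer
    from matchbench.trainer.training_arguments import TrainingArgument

    # One TrainingArgument per split/model (placeholder split count).
    args_list = [TrainingArgument(output_dir=f'outputs/split_{i}') for i in range(3)]

    def save_func(trained_model):
        # Hypothetical callback deciding how each trained model is persisted.
        pass

    container = TrainingContainer(model=split_models, args=args_list)  # split_models: placeholder
    container.train(save_func=save_func)
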
class matchbench.trainer.base_trainer.TwoStageTrainer(model, args, train_dataset, valid_dataset, optimizer_1=None, optimizer_2=None)

Bases: Trainer

For some EA (entity alignment) tasks, the training process is split into two stages. Class attributes (in addition to those of Trainer):

  • optimizer_2 (torch.optim.Optimizer) – The optimizer for the second stage; defaults to Adam.

train()

Train the model.

Returns:

best valid score (float); best valid model (nn.Module).

Return type:

float
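
A sketch of the two-stage setup, assuming the TwoStageTrainingArgument documented below carries both stages' hyperparameters; ea_model and the datasets are placeholders, and passing None for the optimizers relies on the documented AdamW/Adam defaults.

    from matchbench.trainer.base_trainer import TwoStageTrainer
    from matchbench.trainer.training_arguments import TwoStageTrainingArgument

    args = TwoStageTrainingArgument(
        output_dir='outputs/',
        num_train_epochs=3.0,        # first stage
        num_train_epochs_2=5,        # second stage
    )

    trainer = TwoStageTrainer(
        model=ea_model,              # placeholder entity-alignment model (nn.Module)
        args=args,
        train_dataset=train_dataset,
        valid_dataset=valid_dataset,
        optimizer_1=None,            # defaults to AdamW
        optimizer_2=None,            # defaults to Adam (second stage)
    )
    best_valid_score = trainer.train()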

class matchbench.trainer.base_trainer.WarmupTrainer(model, args, train_dataset, valid_dataset, policy, warmup)

Bases: Trainer

Runs the whole process of warmup training. Class attributes:

  • model (nn.Module) – The model to be trained.

  • args (TrainingArguments) – Hyperparameters used during training. For more details, please check the TrainingArguments class.

  • train_dataset (datasets.arrow_dataset.Dataset) – Training dataset.

  • valid_dataset (datasets.arrow_dataset.Dataset) – Validation dataset.

  • policy (AugmentPolicyNetV4) – Augment policy model for Rotom.

  • warmup (bool) – Whether to warm up the model by training it on labeled data.

auto_mixda(model, batch, policy=None, get_ind=False, no_ssl=False)

Perform one iteration of MixDA.

Parameters:

  • model (Rotom) – The model state.

  • batch (tuple) – The input batch.

  • policy (AugmentPolicyNetV4, optional) – The augmentation policy.

Returns:

The 0-dimensional loss.

Return type:

Tensor
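
An illustrative call, assuming an already constructed WarmupTrainer, a Rotom model, and a batch from its data loader; driving the backward pass and optimizer step from the caller, as below, is an assumption rather than documented behaviour.

    # warmup_trainer, rotom_model, batch, policy, and optimizer are placeholders.
    loss = warmup_trainer.auto_mixda(rotom_model, batch, policy=policy)
    loss.backward()        # assumption: the caller drives the optimization step
    optimizer.step()
    optimizer.zero_grad()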

sharpen(logits, T=0.5)

Sharpen a label probability distribution (push it closer to one-hot).

Parameters:

  • logits (Tensor) – The input probability distribution.

  • T (float, optional) – Temperature for sharpening; defaults to 0.5.

Returns:

The sharpened tensor.

Return type:

Tensor
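
For reference, a self-contained sketch of standard temperature sharpening of the kind this method describes: probabilities are raised to the power 1/T and renormalized, so a smaller T pushes the distribution toward one-hot. This is not necessarily the exact implementation used here.

    import torch

    def sharpen_reference(probs: torch.Tensor, T: float = 0.5) -> torch.Tensor:
        # Raise each probability to 1/T, then renormalize along the class axis.
        p = probs ** (1.0 / T)
        return p / p.sum(dim=-1, keepdim=True)

    probs = torch.tensor([0.6, 0.3, 0.1])
    print(sharpen_reference(probs))   # noticeably more peaked than the input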

train()

Train the model.

Returns:

best valid score (float); best valid model (nn.Module).

Return type:

float

train_baseline_epoch()

Perform one epoch of the training process during warmup.
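
A usage sketch for WarmupTrainer, assuming a Rotom model and an AugmentPolicyNetV4 policy built elsewhere; rotom_model, policy_net, and the datasets are placeholders.

    from matchbench.trainer.base_trainer import WarmupTrainer
    from matchbench.trainer.training_arguments import TrainingArgument

    args = TrainingArgument(output_dir='outputs/')

    trainer = WarmupTrainer(
        model=rotom_model,            # placeholder Rotom model
        args=args,
        train_dataset=train_dataset,
        valid_dataset=valid_dataset,
        policy=policy_net,            # placeholder AugmentPolicyNetV4
        warmup=True,                  # warm up on labeled data first
    )
    best_valid_score = trainer.train()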

matchbench.trainer.training_arguments module

class matchbench.trainer.training_arguments.TrainingArgument(output_dir: str, overwrite_output_dir: bool = False, do_train: bool = False, do_eval: ~typing.Optional[bool] = None, do_predict: bool = False, evaluation_strategy: ~transformers.trainer_utils.IntervalStrategy = 'no', prediction_loss_only: bool = False, per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, per_gpu_train_batch_size: ~typing.Optional[int] = None, per_gpu_eval_batch_size: ~typing.Optional[int] = None, gradient_accumulation_steps: int = 1, eval_accumulation_steps: ~typing.Optional[int] = None, learning_rate: float = 5e-05, weight_decay: float = 0.0, adam_beta1: float = 0.9, adam_beta2: float = 0.999, adam_epsilon: float = 1e-08, max_grad_norm: float = 1.0, num_train_epochs: float = 3.0, max_steps: int = -1, lr_scheduler_type: ~transformers.trainer_utils.SchedulerType = 'linear', warmup_ratio: float = 0.0, warmup_steps: int = 0, logging_dir: ~typing.Optional[str] = <factory>, logging_strategy: ~transformers.trainer_utils.IntervalStrategy = 'steps', logging_first_step: bool = False, logging_steps: int = 500, save_strategy: ~transformers.trainer_utils.IntervalStrategy = 'steps', save_steps: int = 500, save_total_limit: ~typing.Optional[int] = None, no_cuda: bool = False, seed: int = 42, fp16: bool = False, fp16_opt_level: str = 'O1', fp16_backend: str = 'auto', fp16_full_eval: bool = False, local_rank: int = -1, tpu_num_cores: ~typing.Optional[int] = None, tpu_metrics_debug: bool = False, debug: bool = False, dataloader_drop_last: bool = False, eval_steps: ~typing.Optional[int] = None, dataloader_num_workers: int = 0, past_index: int = -1, run_name: ~typing.Optional[str] = None, disable_tqdm: ~typing.Optional[bool] = None, remove_unused_columns: ~typing.Optional[bool] = True, label_names: ~typing.Optional[~typing.List[str]] = None, load_best_model_at_end: ~typing.Optional[bool] = False, metric_for_best_model: ~typing.Optional[str] = None, greater_is_better: ~typing.Optional[bool] = None, ignore_data_skip: bool = False, sharded_ddp: str = '', deepspeed: ~typing.Optional[str] = None, label_smoothing_factor: float = 0.0, adafactor: bool = False, group_by_length: bool = False, length_column_name: ~typing.Optional[str] = 'length', report_to: ~typing.Optional[~typing.List[str]] = None, ddp_find_unused_parameters: ~typing.Optional[bool] = None, dataloader_pin_memory: bool = True, skip_memory_metrics: bool = False, mp_parameters: str = '', device: ~torch.device = device(type='cuda'), train_batch_size: ~typing.Optional[int] = 32, test_batch_size: ~typing.Optional[int] = 128, train_batch_size_after_negsamp: ~typing.Optional[int] = 24, log_step: ~typing.Optional[int] = 10, eval_epoch: ~typing.Optional[int] = 1, clip_grad: ~typing.Optional[int] = 0, use_optimizer_grouped_parameters: ~typing.Optional[bool] = False, save_epochs: ~typing.Optional[int] = 1, save: bool = True, fp: ~typing.Optional[bool] = False, use_dm_optimizer: ~typing.Optional[bool] = False, lr_scheduler: ~typing.Optional[bool] = False, aug_in_batch: ~typing.Optional[bool] = False, mid_file_dir: ~typing.Optional[str] = 'middle_file/', dataset_name: str = 'dataset_name')

Bases: TrainingArguments

A supplement to transformers.TrainingArguments, storing all the training hyperparameters.

aug_in_batch: Optional[bool] = False
clip_grad: Optional[int] = 0
dataset_name: str = 'dataset_name'
device: device = device(type='cuda')
eval_epoch: Optional[int] = 1
fp: Optional[bool] = False
log_step: Optional[int] = 10
lr_scheduler: Optional[bool] = False
mid_file_dir: Optional[str] = 'middle_file/'
save: bool = True
save_epochs: Optional[int] = 1
test_batch_size: Optional[int] = 128
train_batch_size: Optional[int] = 32
train_batch_size_after_negsamp: Optional[int] = 24
use_dm_optimizer: Optional[bool] = False
use_optimizer_grouped_parameters: Optional[bool] = False
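
A construction sketch for TrainingArgument; only output_dir is required by the signature above, and the values shown for the MatchBench-specific fields are illustrative rather than recommended settings.

    from matchbench.trainer.training_arguments import TrainingArgument

    args = TrainingArgument(
        output_dir='outputs/',
        dataset_name='my-dataset',        # illustrative
        train_batch_size=32,
        test_batch_size=128,
        learning_rate=5e-05,
        num_train_epochs=3.0,
        eval_epoch=1,                     # evaluate every epoch
        log_step=10,                      # log every 10 steps
        mid_file_dir='middle_file/',      # directory for intermediate files
        save=True,
    )
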
class matchbench.trainer.training_arguments.TwoStageTrainingArgument(output_dir: str, overwrite_output_dir: bool = False, do_train: bool = False, do_eval: bool = None, do_predict: bool = False, evaluation_strategy: transformers.trainer_utils.IntervalStrategy = 'no', prediction_loss_only: bool = False, per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, per_gpu_train_batch_size: Union[int, NoneType] = None, per_gpu_eval_batch_size: Union[int, NoneType] = None, gradient_accumulation_steps: int = 1, eval_accumulation_steps: Union[int, NoneType] = None, learning_rate: float = 5e-05, weight_decay: float = 0.0, adam_beta1: float = 0.9, adam_beta2: float = 0.999, adam_epsilon: float = 1e-08, max_grad_norm: float = 1.0, num_train_epochs: float = 3.0, max_steps: int = -1, lr_scheduler_type: transformers.trainer_utils.SchedulerType = 'linear', warmup_ratio: float = 0.0, warmup_steps: int = 0, logging_dir: Union[str, NoneType] = <factory>, logging_strategy: transformers.trainer_utils.IntervalStrategy = 'steps', logging_first_step: bool = False, logging_steps: int = 500, save_strategy: transformers.trainer_utils.IntervalStrategy = 'steps', save_steps: int = 500, save_total_limit: Union[int, NoneType] = None, no_cuda: bool = False, seed: int = 42, fp16: bool = False, fp16_opt_level: str = 'O1', fp16_backend: str = 'auto', fp16_full_eval: bool = False, local_rank: int = -1, tpu_num_cores: Union[int, NoneType] = None, tpu_metrics_debug: bool = False, debug: bool = False, dataloader_drop_last: bool = False, eval_steps: int = None, dataloader_num_workers: int = 0, past_index: int = -1, run_name: Union[str, NoneType] = None, disable_tqdm: Union[bool, NoneType] = None, remove_unused_columns: Union[bool, NoneType] = True, label_names: Union[List[str], NoneType] = None, load_best_model_at_end: Union[bool, NoneType] = False, metric_for_best_model: Union[str, NoneType] = None, greater_is_better: Union[bool, NoneType] = None, ignore_data_skip: bool = False, sharded_ddp: str = '', deepspeed: Union[str, NoneType] = None, label_smoothing_factor: float = 0.0, adafactor: bool = False, group_by_length: bool = False, length_column_name: Union[str, NoneType] = 'length', report_to: Union[List[str], NoneType] = None, ddp_find_unused_parameters: Union[bool, NoneType] = None, dataloader_pin_memory: bool = True, skip_memory_metrics: bool = False, mp_parameters: str = '', device: torch.device = device(type='cuda'), train_batch_size: Union[int, NoneType] = 32, test_batch_size: Union[int, NoneType] = 128, train_batch_size_after_negsamp: Union[int, NoneType] = 24, log_step: Union[int, NoneType] = 10, eval_epoch: Union[int, NoneType] = 1, clip_grad: Union[int, NoneType] = 0, use_optimizer_grouped_parameters: Union[bool, NoneType] = False, save_epochs: Union[int, NoneType] = 1, save: bool = True, fp: Union[bool, NoneType] = False, use_dm_optimizer: Union[bool, NoneType] = False, lr_scheduler: Union[bool, NoneType] = False, aug_in_batch: Union[bool, NoneType] = False, mid_file_dir: Union[str, NoneType] = 'middle_file/', dataset_name: str = 'dataset_name', learning_rate_2: Union[float, NoneType] = 0.0005, weight_decay_2: Union[float, NoneType] = 0.0005, first_stage_model_path: Union[str, NoneType] = None, num_train_epochs_2: Union[int, NoneType] = 5, eval_epoch_2: Union[int, NoneType] = 10, train_batch_size_2: Union[int, NoneType] = 128, test_batch_size_2: Union[int, NoneType] = 2048, train_batch_size_after_negsamp_2: Union[int, NoneType] = 256)

Bases: TrainingArgument

Extends TrainingArgument with hyperparameters for the second training stage; the second-stage counterparts carry the _2 suffix (e.g. learning_rate_2, num_train_epochs_2).

eval_epoch_2: Optional[int] = 10
first_stage_model_path: Optional[str] = None
learning_rate_2: Optional[float] = 0.0005
num_train_epochs_2: Optional[int] = 5
test_batch_size_2: Optional[int] = 2048
train_batch_size_2: Optional[int] = 128
train_batch_size_after_negsamp_2: Optional[int] = 256
weight_decay_2: Optional[float] = 0.0005
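
A construction sketch showing how the second-stage fields sit alongside their first-stage counterparts; all values are illustrative, and the comment on first_stage_model_path is an assumption based on the field name.

    from matchbench.trainer.training_arguments import TwoStageTrainingArgument

    args = TwoStageTrainingArgument(
        output_dir='outputs/',
        learning_rate=5e-05,              # first stage
        learning_rate_2=0.0005,           # second stage
        train_batch_size=32,              # first stage
        train_batch_size_2=128,           # second stage
        num_train_epochs=3.0,
        num_train_epochs_2=5,
        first_stage_model_path=None,      # assumed: path to a saved first-stage model
    )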

Module contents

class matchbench.trainer.TrainingArgument(output_dir: str, overwrite_output_dir: bool = False, do_train: bool = False, do_eval: ~typing.Optional[bool] = None, do_predict: bool = False, evaluation_strategy: ~transformers.trainer_utils.IntervalStrategy = 'no', prediction_loss_only: bool = False, per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, per_gpu_train_batch_size: ~typing.Optional[int] = None, per_gpu_eval_batch_size: ~typing.Optional[int] = None, gradient_accumulation_steps: int = 1, eval_accumulation_steps: ~typing.Optional[int] = None, learning_rate: float = 5e-05, weight_decay: float = 0.0, adam_beta1: float = 0.9, adam_beta2: float = 0.999, adam_epsilon: float = 1e-08, max_grad_norm: float = 1.0, num_train_epochs: float = 3.0, max_steps: int = -1, lr_scheduler_type: ~transformers.trainer_utils.SchedulerType = 'linear', warmup_ratio: float = 0.0, warmup_steps: int = 0, logging_dir: ~typing.Optional[str] = <factory>, logging_strategy: ~transformers.trainer_utils.IntervalStrategy = 'steps', logging_first_step: bool = False, logging_steps: int = 500, save_strategy: ~transformers.trainer_utils.IntervalStrategy = 'steps', save_steps: int = 500, save_total_limit: ~typing.Optional[int] = None, no_cuda: bool = False, seed: int = 42, fp16: bool = False, fp16_opt_level: str = 'O1', fp16_backend: str = 'auto', fp16_full_eval: bool = False, local_rank: int = -1, tpu_num_cores: ~typing.Optional[int] = None, tpu_metrics_debug: bool = False, debug: bool = False, dataloader_drop_last: bool = False, eval_steps: ~typing.Optional[int] = None, dataloader_num_workers: int = 0, past_index: int = -1, run_name: ~typing.Optional[str] = None, disable_tqdm: ~typing.Optional[bool] = None, remove_unused_columns: ~typing.Optional[bool] = True, label_names: ~typing.Optional[~typing.List[str]] = None, load_best_model_at_end: ~typing.Optional[bool] = False, metric_for_best_model: ~typing.Optional[str] = None, greater_is_better: ~typing.Optional[bool] = None, ignore_data_skip: bool = False, sharded_ddp: str = '', deepspeed: ~typing.Optional[str] = None, label_smoothing_factor: float = 0.0, adafactor: bool = False, group_by_length: bool = False, length_column_name: ~typing.Optional[str] = 'length', report_to: ~typing.Optional[~typing.List[str]] = None, ddp_find_unused_parameters: ~typing.Optional[bool] = None, dataloader_pin_memory: bool = True, skip_memory_metrics: bool = False, mp_parameters: str = '', device: ~torch.device = device(type='cuda'), train_batch_size: ~typing.Optional[int] = 32, test_batch_size: ~typing.Optional[int] = 128, train_batch_size_after_negsamp: ~typing.Optional[int] = 24, log_step: ~typing.Optional[int] = 10, eval_epoch: ~typing.Optional[int] = 1, clip_grad: ~typing.Optional[int] = 0, use_optimizer_grouped_parameters: ~typing.Optional[bool] = False, save_epochs: ~typing.Optional[int] = 1, save: bool = True, fp: ~typing.Optional[bool] = False, use_dm_optimizer: ~typing.Optional[bool] = False, lr_scheduler: ~typing.Optional[bool] = False, aug_in_batch: ~typing.Optional[bool] = False, mid_file_dir: ~typing.Optional[str] = 'middle_file/', dataset_name: str = 'dataset_name')

Bases: TrainingArguments

A supplement to transformers.TrainingArguments, storing all the training hyperparameters.

aug_in_batch: Optional[bool] = False
clip_grad: Optional[int] = 0
dataset_name: str = 'dataset_name'
device: device = device(type='cuda')
eval_epoch: Optional[int] = 1
fp: Optional[bool] = False
log_step: Optional[int] = 10
logging_dir: Optional[str]
lr_scheduler: Optional[bool] = False
mid_file_dir: Optional[str] = 'middle_file/'
output_dir: str
save: bool = True
save_epochs: Optional[int] = 1
test_batch_size: Optional[int] = 128
train_batch_size: Optional[int] = 32
train_batch_size_after_negsamp: Optional[int] = 24
use_dm_optimizer: Optional[bool] = False
use_optimizer_grouped_parameters: Optional[bool] = False