matchbench.pair_gen package

Submodules

matchbench.pair_gen.blocker module

matchbench.pair_gen.data_augmenter module

class matchbench.pair_gen.data_augmenter.Augmenter

Bases: object

Data augmentation operator. Supports both span-level and attribute-level augmentation operators.

augment(tokens, labels, op='del')

Performs data augmentation on a sequence of tokens. The supported ops are:

['del', 'drop_col', 'append_col', 'drop_token', 'drop_len', 'drop_sym', 'drop_same', 'swap', 'ins', 'all']

Parameters:

tokens (List of str) – the input tokens
labels (List of str) – the labels of the tokens
op (str, optional, defaults to “del”) – a string encoding of the operator to be applied

Returns:

the augmented tokens (List of str) and the augmented labels (List of str)

Return type:

tuple of (List of str, List of str)
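To illustrate what a span-level operator such as 'del' does, the sketch below deletes one short random span of ordinary tokens while keeping `tokens` and `labels` aligned. It is not the matchbench implementation: it assumes (hypothetically) that ordinary tokens carry the label 'O' and that structural markers such as 'COL'/'VAL' carry another label (here 'HD') and must never be deleted. The helpers `sample_span_sketch` and `delete_span_sketch` are illustrative names only.

```python
import random
from typing import List, Optional, Tuple


def sample_span_sketch(labels: List[str], span_len: int,
                       rng: random.Random) -> Optional[Tuple[int, int]]:
    """Pick a random [start, end) window of span_len positions whose labels are all 'O'.

    Hypothetical helper: assumes the label 'O' marks ordinary (deletable) tokens.
    Returns None when no such window exists.
    """
    candidates = [
        start
        for start in range(len(labels) - span_len + 1)
        if all(label == "O" for label in labels[start:start + span_len])
    ]
    if not candidates:
        return None
    start = rng.choice(candidates)
    return start, start + span_len


def delete_span_sketch(tokens: List[str], labels: List[str],
                       span_len: int = 2,
                       rng: Optional[random.Random] = None) -> Tuple[List[str], List[str]]:
    """Span deletion in the spirit of op='del': drop one span from tokens and labels together."""
    rng = rng or random.Random()
    span = sample_span_sketch(labels, span_len, rng)
    if span is None:
        # Nothing safe to delete: return unchanged copies.
        return list(tokens), list(labels)
    start, end = span
    return tokens[:start] + tokens[end:], labels[:start] + labels[end:]


# Hypothetical Ditto-style serialized entry; 'HD' protects the COL/VAL markers.
tokens = ["COL", "title", "VAL", "apple", "iphone", "13", "pro"]
labels = ["HD", "O", "HD", "O", "O", "O", "O"]
new_tokens, new_labels = delete_span_sketch(tokens, labels, span_len=2, rng=random.Random(0))
print(new_tokens, new_labels)
```

Slicing both lists with the same indices is what keeps every remaining label attached to its token, which matters because later operators rely on the labels to avoid touching structural markers.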

augment_sent(text, op='all')

Performs data augmentation on a classification example. Similar to augment(tokens, labels), but works on sentences or sentence pairs.

Parameters:

text (str) – the input sentence
op (str, optional, defaults to “all”) – a string encoding of the operator to be applied

Returns:

the augmented sentence

Return type:

str
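One plausible way for a sentence-level method to reuse a token-level operator is to split the text on whitespace, label structural tokens so they are protected, apply the operator to the token/label lists, and join the result back into a string. The sketch below assumes a Ditto-style serialization with 'COL'/'VAL' markers and '[SEP]' between the two entries of a pair; these markers, the label names and all function names are hypothetical, not documented matchbench behavior.

```python
from typing import Callable, List, Tuple

# Hypothetical special tokens; the serialization actually used by matchbench may differ.
HEADER_TOKENS = {"COL", "VAL"}
SEPARATOR_TOKENS = {"[CLS]", "[SEP]"}


def sentence_to_tokens(text: str) -> Tuple[List[str], List[str]]:
    """Whitespace-tokenize and label each token so structural markers can be protected."""
    tokens = text.split()
    labels = []
    for token in tokens:
        if token in HEADER_TOKENS:
            labels.append("HD")
        elif token in SEPARATOR_TOKENS:
            labels.append("<SEP>")
        else:
            labels.append("O")
    return tokens, labels


def augment_sentence_sketch(
    text: str,
    token_op: Callable[[List[str], List[str]], Tuple[List[str], List[str]]],
) -> str:
    """Apply a token-level operator (e.g. a span deletion) to a sentence or sentence pair."""
    tokens, labels = sentence_to_tokens(text)
    new_tokens, _ = token_op(tokens, labels)
    return " ".join(new_tokens)


def drop_last_plain_token(tokens: List[str], labels: List[str]) -> Tuple[List[str], List[str]]:
    """Toy operator for the usage example: remove the last token labeled 'O'."""
    for i in range(len(tokens) - 1, -1, -1):
        if labels[i] == "O":
            return tokens[:i] + tokens[i + 1:], labels[:i] + labels[i + 1:]
    return tokens, labels


pair = "COL title VAL iphone 13 [SEP] COL title VAL apple iphone 13"
print(augment_sentence_sketch(pair, drop_last_plain_token))
# → COL title VAL iphone 13 [SEP] COL title VAL apple iphone
```

Because the separator receives its own label, an operator that only touches 'O' tokens can never merge or drop the boundary between the two entries of a pair.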

sample_position(tokens, labels, tfidf=False)

sample_span(tokens, labels, span_len=3)

matchbench.pair_gen.neg_sampler module

class matchbench.pair_gen.neg_sampler.NegativeSampler(neg_sample_batch_size=512, neg_num=2, nearest_sample_num=128, train_batch_size=24)