matchbench.pair_gen package
Submodules
matchbench.pair_gen.blocker module
matchbench.pair_gen.data_augmenter module
- class matchbench.pair_gen.data_augmenter.Augmenter
Bases: object
Data augmentation operator. Supports both span-level and attribute-level augmentation operators.
- augment(tokens, labels, op='del')
Performs data augmentation on a sequence of tokens. The supported ops are:
- ['del', 'drop_col', 'append_col', 'drop_token', 'drop_len', 'drop_sym', 'drop_same', 'swap', 'ins', 'all']
- Parameters:
tokens (List of str) – the input tokens
labels (List of str) – the labels of the tokens
op (str, optional, defaults to “del”) – a string encoding of the operator to be applied
- Returns:
the augmented tokens (List of str) and the augmented labels (List of str)
- Return type:
List of str, List of str
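To make the token/label contract above concrete, here is a minimal sketch of what a span-level deletion operator could look like. It is not the library's implementation: `delete_span`, its `span_len` and `rng` parameters, and the `"O"` label value are hypothetical names chosen for illustration. The only behaviour taken from the documentation is that the operator receives parallel lists of tokens and labels and returns the augmented versions of both.

```python
import random
from typing import List, Optional, Tuple


def delete_span(
    tokens: List[str],
    labels: List[str],
    span_len: int = 2,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical stand-in for a span-level 'del' operator.

    Removes one contiguous span of `span_len` tokens, together with the
    labels at the same positions, so the two lists stay aligned.
    """
    rng = rng or random.Random()
    # Too short to delete from: return unchanged copies.
    if len(tokens) <= span_len:
        return list(tokens), list(labels)
    start = rng.randrange(0, len(tokens) - span_len + 1)
    end = start + span_len
    return tokens[:start] + tokens[end:], labels[:start] + labels[end:]


# Example input; the labels are treated as opaque tags here (an assumption).
tokens = "COL title VAL instant immersion spanish COL price VAL 9.99".split()
labels = ["O"] * len(tokens)

new_tokens, new_labels = delete_span(
    tokens, labels, span_len=2, rng=random.Random(0)
)
print(new_tokens)
print(new_labels)
```

This sketch deletes any span, including structural markers; a real operator would likely use the labels to protect some positions from being removed.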
- augment_sent(text, op='all')
Performs data augmentation on a classification example. Similar to augment(tokens, labels) but works for sentences or sentence-pairs.
- Parameters:
text (str) – the input sentence
op (str, optional, defaults to “all”) – a string encoding of the operator to be applied
- Returns:
the augmented sentence
- Return type:
str
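One way to picture the relationship between the sentence-level and token-level entry points is a thin wrapper that tokenizes the text, applies a token-level operator, and joins the result back into a string. The sketch below works under stated assumptions and is not the library's code: `augment_sentence`, `make_drop_token`, the whitespace tokenization, and the `" [SEP] "` pair separator are all hypothetical.

```python
import random
from typing import Callable, List, Tuple

TokenOp = Callable[[List[str], List[str]], Tuple[List[str], List[str]]]


def augment_sentence(text: str, token_op: TokenOp, sep: str = " [SEP] ") -> str:
    """Hypothetical wrapper: apply a token-level operator to a sentence.

    For a sentence pair, each side of `sep` is augmented independently
    and the separator itself is left untouched.
    """
    augmented_parts = []
    for part in text.split(sep):
        tokens = part.split()
        labels = ["O"] * len(tokens)  # placeholder labels (assumption)
        new_tokens, _ = token_op(tokens, labels)
        augmented_parts.append(" ".join(new_tokens))
    return sep.join(augmented_parts)


def make_drop_token(seed: int) -> TokenOp:
    """Build a toy operator that drops one randomly chosen token."""
    rng = random.Random(seed)

    def drop_token(tokens: List[str], labels: List[str]):
        if len(tokens) < 2:
            return list(tokens), list(labels)
        i = rng.randrange(len(tokens))
        return tokens[:i] + tokens[i + 1:], labels[:i] + labels[i + 1:]

    return drop_token


pair = "COL title VAL canon eos camera [SEP] COL title VAL canon eos 5d camera"
augmented = augment_sentence(pair, make_drop_token(0))
print(augmented)
```

Passing the operator in as a callable keeps the wrapper independent of any particular op string; the documented method selects the operator through its `op` argument instead.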
- sample_position(tokens, labels, tfidf=False)
- sample_span(tokens, labels, span_len=3)
matchbench.pair_gen.neg_sampler module
- class matchbench.pair_gen.neg_sampler.NegativeSampler(neg_sample_batch_size=512, neg_num=2, nearest_sample_num=128, train_batch_size=24)
Bases: object
- generate_train_tups(model, train_ent1s, train_ent2s, idlist1, idlist2, all_emb1s, all_emb2s, train_emb1s, train_emb2s, device, shuffle=True)
- get_candidate_dict(train_ents, train_embs, all_ents, all_embs, device)
- get_dataloader(model, train_ent1s, train_ent2s, idlist1, idlist2, new_dataset, all_emb1s, all_emb2s, train_emb1s, train_emb2s, device=0)