matchbench.pair_gen package

Submodules

matchbench.pair_gen.blocker module

matchbench.pair_gen.data_augmenter module

class matchbench.pair_gen.data_augmenter.Augmenter

Bases: object

Data augmentation operator. Supports both span-level and attribute-level augmentation operators.

augment(tokens, labels, op='del')

Performs data augmentation on a sequence of tokens. The supported ops:

['del', 'drop_col', 'append_col', 'drop_token', 'drop_len', 'drop_sym', 'drop_same', 'swap', 'ins', 'all']

Parameters:
  • tokens (List of str) – the input tokens

  • labels (List of str) – the labels of the tokens

  • op (str, optional, defaults to “del”) – a string encoding of the operator to be applied

Returns:

the augmented tokens and the augmented labels

Return type:

(List of str, List of str)
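
The contract described above (aligned token and label lists in, aligned lists out) can be illustrated with a small stand-alone sketch of a span deletion. This is not the library's implementation: the function name delete_span, the span_len and seed parameters, the 'HD' and 'O' label values, and the rule that only 'O'-labelled spans may be deleted are all assumptions made for this example.

```python
import random


def delete_span(tokens, labels, span_len=2, seed=None):
    """Hypothetical 'del'-style operator: remove one contiguous span.

    Assumption: only spans whose labels are all 'O' may be deleted,
    so tokens carrying any other label are never removed.
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    rng = random.Random(seed)
    # Collect every start index whose span consists only of 'O' labels.
    candidates = [
        start
        for start in range(len(tokens) - span_len + 1)
        if all(label == "O" for label in labels[start:start + span_len])
    ]
    if not candidates:
        # Nothing can be deleted safely; return copies of the input.
        return list(tokens), list(labels)
    start = rng.choice(candidates)
    end = start + span_len
    # Slice both lists identically so they stay aligned.
    return tokens[:start] + tokens[end:], labels[:start] + labels[end:]


# Hypothetical serialized record; the label scheme is illustrative only.
tokens = ["COL", "title", "VAL", "apple", "iphone", "12", "pro"]
labels = ["HD", "O", "HD", "O", "O", "O", "O"]
new_tokens, new_labels = delete_span(tokens, labels, span_len=2, seed=0)
```

Whatever span is chosen, the two returned lists have the same length, which is the property a caller of augment(tokens, labels, op) would rely on when passing the result to a tagging or matching model.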

augment_sent(text, op='all')

Performs data augmentation on a classification example. Similar to augment(tokens, labels) but works for sentences or sentence pairs.

Parameters:
  • text (str) – the input sentence

  • op (str, optional, defaults to “all”) – a string encoding of the operator to be applied

Returns:

the augmented sentence

Return type:

str
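
One way to picture the relationship between the sentence-level and token-level methods is a thin wrapper that tokenizes, applies a token-level operator, and joins the result. The sketch below is only an illustration under stated assumptions: augment_sentence and drop_last_token are hypothetical names, whitespace tokenization and uniform 'O' labels are simplifications, and the real augment_sent may tokenize and label differently.

```python
def augment_sentence(text, token_op):
    """Hypothetical sentence-level wrapper around a token-level operator."""
    # Assumption: plain whitespace tokenization.
    tokens = text.split()
    # Assumption: every token is eligible for augmentation, so all labels are 'O'.
    labels = ["O"] * len(tokens)
    new_tokens, _ = token_op(tokens, labels)
    return " ".join(new_tokens)


def drop_last_token(tokens, labels):
    """Toy operator used only for this example: remove the final token."""
    return tokens[:-1], labels[:-1]


result = augment_sentence("apple iphone 12 pro max", drop_last_token)
```

Passing the operator as a function keeps the wrapper independent of any particular op string; the documented method instead selects the operator through its op argument.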

sample_position(tokens, labels, tfidf=False)
sample_span(tokens, labels, span_len=3)

matchbench.pair_gen.neg_sampler module

class matchbench.pair_gen.neg_sampler.NegativeSampler(neg_sample_batch_size=512, neg_num=2, nearest_sample_num=128, train_batch_size=24)

Bases: object

generate_train_tups(model, train_ent1s, train_ent2s, idlist1, idlist2, all_emb1s, all_emb2s, train_emb1s, train_emb2s, device, shuffle=True)
get_candidate_dict(train_ents, train_embs, all_ents, all_embs, device)
get_dataloader(model, train_ent1s, train_ent2s, idlist1, idlist2, new_dataset, all_emb1s, all_emb2s, train_emb1s, train_emb2s, device=0)

Module contents