src.core.word2vec package
Submodules
src.core.word2vec.dataset module
- class src.core.word2vec.dataset.Word2VecDataset(config_dict)
Bases: object
Word2Vec Dataset
- Parameters:
config_dict (dict) – Config Params Dictionary
- src.core.word2vec.dataset.create_dataloader(left_cxt, right_cxt, left_lbl, right_lbl, val_split=0.2, batch_size=32, seed=2024)
Creates training and validation DataLoaders for the left and right contexts
- Parameters:
left_cxt (list) – Left context
right_cxt (list) – Right context
left_lbl (list) – Left label
right_lbl (list) – Right label
val_split (float) – Fraction of the data held out for validation, defaults to 0.2
batch_size (int) – Batch size, defaults to 32
seed (int, optional) – Seed, defaults to 2024
- Returns:
Training and validation DataLoaders for the left and right contexts
- Return type:
tuple (torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.utils.data.DataLoader)
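A minimal sketch of how a seeded train/validation split like the one described above might be built with PyTorch. The helper `make_loaders` and the toy data are hypothetical and not part of this package; the sketch handles one side (context plus label) at a time, so it would be called once for the left pair and once for the right pair.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split


def make_loaders(cxt, lbl, val_split=0.2, batch_size=32, seed=2024):
    """Hypothetical helper: split one (context, label) pair into train/val loaders."""
    dataset = TensorDataset(torch.as_tensor(cxt), torch.as_tensor(lbl))
    n_val = int(len(dataset) * val_split)
    n_train = len(dataset) - n_val
    # A seeded generator makes the split reproducible across runs.
    generator = torch.Generator().manual_seed(seed)
    train_ds, val_ds = random_split(dataset, [n_train, n_val], generator=generator)
    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_ds, batch_size=batch_size, shuffle=False)
    return train_loader, val_loader


# Toy data: 10 samples with context length 4 (illustrative values only).
left_cxt = [[i, i + 1, i + 2, i + 3] for i in range(10)]
left_lbl = [i % 3 for i in range(10)]
train_left, val_left = make_loaders(left_cxt, left_lbl, batch_size=4)
```

Whether the actual function shares one split across both sides or splits them independently is not stated here, so treat that detail as open.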
src.core.word2vec.huffman module
- class src.core.word2vec.huffman.HuffmanBTree(vocab_freq_dict)
Bases: object
Creates a Huffman binary tree over the vocabulary, used to compute the softmax hierarchically
- Parameters:
vocab_freq_dict (dict) – Vocabulary Frequency Dictionary
- class src.core.word2vec.huffman.Node(word_idx, freq, left=None, right=None)
Bases: object
A node in the Huffman tree
- Parameters:
word_idx (int) – Word Id
freq (int) – Frequency of node
left (list, optional) – Left nodes, defaults to None
right (list, optional) – Right nodes, defaults to None
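To illustrate the idea behind these two classes, here is a small sketch that builds a Huffman tree from a frequency dictionary using only the standard library. The `Node` class below mirrors the documented constructor, but `build_huffman`, `huffman_codes`, and the convention that internal nodes carry `word_idx=None` are assumptions made for this example, not the package's actual implementation.

```python
import heapq
import itertools


class Node:
    """Mirrors the documented constructor; internal nodes use word_idx=None here."""

    def __init__(self, word_idx, freq, left=None, right=None):
        self.word_idx = word_idx
        self.freq = freq
        self.left = left
        self.right = right


def build_huffman(vocab_freq_dict):
    """Return the root of a Huffman tree built from a non-empty {word_idx: freq} dict."""
    counter = itertools.count()  # tie-breaker so the heap never compares Node objects
    heap = [(freq, next(counter), Node(idx, freq)) for idx, freq in vocab_freq_dict.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least frequent subtrees.
        f1, _, n1 = heapq.heappop(heap)
        f2, _, n2 = heapq.heappop(heap)
        merged = Node(None, f1 + f2, left=n1, right=n2)
        heapq.heappush(heap, (merged.freq, next(counter), merged))
    return heap[0][2]


def huffman_codes(node, prefix=""):
    """Map each word index to its root-to-leaf bit string (0 = left, 1 = right)."""
    if node.left is None and node.right is None:
        return {node.word_idx: prefix or "0"}
    codes = {}
    codes.update(huffman_codes(node.left, prefix + "0"))
    codes.update(huffman_codes(node.right, prefix + "1"))
    return codes


codes = huffman_codes(build_huffman({0: 50, 1: 30, 2: 15, 3: 5}))
```

Frequent words end up near the root with short codes, so the most common words need the fewest binary decisions.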
src.core.word2vec.model module
- class src.core.word2vec.model.Word2VecModel(config_dict)
Bases: Module
Word2Vec Model
- Parameters:
config_dict (dict) – Config Params Dictionary
- compute_cxt_embed(cxt)
Computes context embedding vector
- Parameters:
cxt (torch.Tensor (batch_size, context_len)) – Context vector
- Returns:
Context embedding
- Return type:
torch.Tensor (batch_size, embed_dim)
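The shapes above, (batch_size, context_len) in and (batch_size, embed_dim) out, are consistent with looking up one embedding per context word and pooling over the context positions. The sketch below assumes mean pooling, as in a CBOW-style model; the pooling choice and the sizes are assumptions, since the actual method body is not shown here.

```python
import torch
from torch import nn

vocab_size, embed_dim = 100, 16  # illustrative sizes, not taken from the package
embedding = nn.Embedding(vocab_size, embed_dim)

cxt = torch.randint(0, vocab_size, (8, 4))  # (batch_size, context_len)
cxt_embed = embedding(cxt).mean(dim=1)      # average over the context positions
```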
- forward(l_cxt, r_cxt, l_lbl, r_lbl)
Forward propagation
- Parameters:
l_cxt (torch.Tensor (batch_size,)) – Left context
r_cxt (torch.Tensor (batch_size,)) – Right context
l_lbl (torch.Tensor (batch_size,)) – Left label
r_lbl (torch.Tensor (batch_size,)) – Right label
- Returns:
Loss
- Return type:
torch.float32
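Since the package builds a Huffman tree for the softmax, the loss is presumably a hierarchical softmax loss. As a rough sketch of that general technique, and not of this method's actual code, the negative log-probability of one word is the sum of binary logistic losses along its path from the root. The sign convention (+1 for a left branch, -1 for a right branch) and all tensor names below are assumptions.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
embed_dim, n_inner = 16, 3                    # illustrative sizes
h = torch.randn(embed_dim)                    # context embedding for one example
inner_vecs = torch.randn(n_inner, embed_dim)  # one vector per internal node on the path
# Branch taken at each internal node: +1 for left, -1 for right (assumed convention).
signs = torch.tensor([1.0, -1.0, -1.0])

# log P(word | context) = sum over the path of log sigmoid(sign * <node_vec, h>)
log_prob = F.logsigmoid(signs * (inner_vecs @ h)).sum()
loss = -log_prob
```

Because sigmoid(-x) = 1 - sigmoid(x), the two branch probabilities at each internal node sum to one, which is what makes the leaf probabilities a valid distribution.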
- class src.core.word2vec.model.Word2VecTrainer(model, optimizer, config_dict)
Bases: Module
Word2Vec Trainer
- Parameters:
model (torch.nn.Module) – Word2Vec model
optimizer (torch.optim.Optimizer) – Optimizer
config_dict (dict) – Config Params Dictionary
- fit(train_loader, val_loader)
Fits the model on the dataset. Runs the training and validation steps for the configured number of epochs and saves the best model according to the evaluation metric
- Parameters:
train_loader (torch.utils.data.DataLoader) – Train Data loader
val_loader (torch.utils.data.DataLoader) – Validation data loader
- Returns:
Training History
- Return type:
dict
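The following is a generic sketch of the kind of loop described above: train for a number of epochs, validate after each one, keep the best weights, and return a history dictionary. The standalone `fit` function, the history keys, the use of validation loss as the selection metric, and the toy regression data are all assumptions made for illustration; the real trainer takes its settings from `config_dict` and saves the best model rather than holding it in memory.

```python
import copy

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def fit(model, optimizer, loss_fn, train_loader, val_loader, epochs=3):
    """Hypothetical loop: train, validate, and keep the weights with the lowest val loss."""
    history = {"train_loss": [], "val_loss": []}
    best_val, best_state = float("inf"), None
    for _ in range(epochs):
        model.train()
        train_total = 0.0
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            train_total += loss.item()
        model.eval()
        val_total = 0.0
        with torch.no_grad():
            for x, y in val_loader:
                val_total += loss_fn(model(x), y).item()
        history["train_loss"].append(train_total / len(train_loader))
        history["val_loss"].append(val_total / len(val_loader))
        if history["val_loss"][-1] < best_val:
            best_val = history["val_loss"][-1]
            best_state = copy.deepcopy(model.state_dict())
    if best_state is not None:
        model.load_state_dict(best_state)  # restore the best epoch's weights
    return history


# Toy regression problem standing in for the real Word2Vec model and data.
torch.manual_seed(0)
data = TensorDataset(torch.randn(64, 4), torch.randn(64, 1))
loader = DataLoader(data, batch_size=16)
net = nn.Linear(4, 1)
history = fit(net, torch.optim.SGD(net.parameters(), lr=0.01), nn.MSELoss(), loader, loader)
```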
src.core.word2vec.word2vec module
- class src.core.word2vec.word2vec.Word2Vec(config_dict)
Bases: object
A class to run Word2Vec data preprocessing, training and inference
- Parameters:
config_dict (dict) – Config Params Dictionary