src.core.seq2seq package#

Submodules#

src.core.seq2seq.dataset module#

src.core.seq2seq.dataset.create_dataloader(X, y=None, val_split=0.2, batch_size=32, seed=2024, data='train')[source]#

Creates train, validation, and test DataLoaders

Parameters:
  • X (torch.Tensor (num_samples, seq_len)) – Input source tokens

  • y (torch.Tensor (num_samples, seq_len), optional) – Input target tokens, defaults to None

  • val_split (float, optional) – validation split, defaults to 0.2

  • batch_size (int, optional) – Batch size, defaults to 32

  • seed (int, optional) – Seed, defaults to 2024

  • data (str, optional) – Type of data, defaults to “train”

Returns:

Train and validation DataLoaders / Test DataLoader

Return type:

tuple (torch.utils.data.DataLoader, torch.utils.data.DataLoader) / torch.utils.data.DataLoader
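The train/validation split implied by `val_split` and `seed` can be sketched as follows. This is a minimal NumPy illustration of the index-splitting logic, not the actual implementation; `split_indices` is a hypothetical helper.

```python
import numpy as np

def split_indices(n, val_split=0.2, seed=2024):
    # Shuffle indices with a fixed seed, then carve off the validation share.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_val = int(n * val_split)
    return idx[n_val:], idx[:n_val]  # train indices, validation indices

train_idx, val_idx = split_indices(100)  # 80 train, 20 validation
```

Passing the two index arrays to `torch.utils.data.Subset` (or sampler-based DataLoaders) then yields the train and validation loaders.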

src.core.seq2seq.model module#

class src.core.seq2seq.model.DecoderLSTMCell(h_dim, inp_x_dim, out_x_dim)[source]#

Bases: Module

Decoder LSTM cell

Parameters:
  • h_dim (int) – Hidden state vector dimension

  • inp_x_dim (int) – Input vector dimension

  • out_x_dim (int) – Output vector dimension

forward(ht_1, ct_1, xt)[source]#

Forward propagation

Parameters:
  • ht_1 (torch.Tensor (batch_size, h_dim)) – Hidden state vector

  • ct_1 (torch.Tensor (batch_size, h_dim)) – Cell state vector

  • xt (torch.Tensor (batch_size, embed_dim)) – Input vector

Returns:

New hidden, cell states, output

Return type:

tuple (torch.Tensor [batch_size, h_dim], torch.Tensor [batch_size, h_dim], torch.Tensor [batch_size, out_dim])
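A single decoder step of this kind can be sketched with the standard LSTM gate equations plus an output projection. This is an assumed NumPy illustration of the cell's signature `(ht_1, ct_1, xt) -> (ht, ct, yt)`, not the class's actual code; the weight names are hypothetical.

```python
import numpy as np

def lstm_cell_step(ht_1, ct_1, xt, Wx, Wh, b, Wo):
    # Input, forget, cell-candidate and output gates, stacked in one affine map.
    z = xt @ Wx + ht_1 @ Wh + b                        # (batch_size, 4*h_dim)
    i, f, g, o = np.split(z, 4, axis=1)
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    ct = sigmoid(f) * ct_1 + sigmoid(i) * np.tanh(g)   # new cell state
    ht = sigmoid(o) * np.tanh(ct)                      # new hidden state
    yt = ht @ Wo                                       # decoder output projection
    return ht, ct, yt

rng = np.random.default_rng(0)
batch, h_dim, inp_x_dim, out_x_dim = 2, 8, 4, 6
Wx = rng.normal(size=(inp_x_dim, 4 * h_dim))
Wh = rng.normal(size=(h_dim, 4 * h_dim))
b = np.zeros(4 * h_dim)
Wo = rng.normal(size=(h_dim, out_x_dim))
ht, ct, yt = lstm_cell_step(np.zeros((batch, h_dim)), np.zeros((batch, h_dim)),
                            rng.normal(size=(batch, inp_x_dim)), Wx, Wh, b, Wo)
```

`EncoderLSTMCell` below follows the same gate equations but, per its return type, omits the `yt` projection.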

class src.core.seq2seq.model.EncoderLSTMCell(h_dim, inp_x_dim)[source]#

Bases: Module

Encoder LSTM Cell

Parameters:
  • h_dim (int) – Hidden state vector dimension

  • inp_x_dim (int) – Input vector dimension

forward(ht_1, ct_1, xt)[source]#

Forward propagation

Parameters:
  • ht_1 (torch.Tensor (batch_size, h_dim)) – Hidden state vector

  • ct_1 (torch.Tensor (batch_size, h_dim)) – Cell state vector

  • xt (torch.Tensor (batch_size, embed_dim)) – Input vector

Returns:

New hidden, cell states

Return type:

tuple (torch.Tensor [batch_size, h_dim], torch.Tensor [batch_size, h_dim])

class src.core.seq2seq.model.Seq2SeqAttention(config_dict)[source]#

Bases: Module

Seq2Seq Attention layer

Parameters:

config_dict (dict) – Config Params Dictionary

forward(si_1, yts)[source]#

Forward propagation

Parameters:
  • si_1 (torch.Tensor (batch_size, 2*h_dim)) – Hidden state vector of decoder layer

  • yts (torch.Tensor (batch_size, seq_len, out_dim)) – Encoder output vectors

Returns:

Attention weights, New hidden state vector

Return type:

tuple (torch.Tensor [batch_size, seq_len], torch.Tensor [batch_size, 2*h_dim])
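The shape of this layer's output (per-position weights that are combined with the encoder outputs) matches additive, Bahdanau-style attention. The sketch below is an assumed NumPy illustration, not the layer's actual code: it scores each encoder position against the decoder state, normalises with softmax, and returns the weights plus a weighted summary of `yts` (whereas the layer itself returns an updated 2*h_dim hidden state).

```python
import numpy as np

def softmax(a):
    a = a - a.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=-1, keepdims=True)

def additive_attention(si_1, yts, Wa, Ua, va):
    # Score each encoder position against the decoder state, then normalise.
    scores = np.tanh(si_1[:, None, :] @ Wa + yts @ Ua) @ va  # (batch, seq_len)
    alpha = softmax(scores)                                  # attention weights
    context = (alpha[:, :, None] * yts).sum(axis=1)          # weighted sum of yts
    return alpha, context

rng = np.random.default_rng(0)
batch, seq_len, h_dim, out_dim, attn_dim = 2, 5, 3, 4, 6
si_1 = rng.normal(size=(batch, 2 * h_dim))
yts = rng.normal(size=(batch, seq_len, out_dim))
alpha, context = additive_attention(si_1, yts,
                                    rng.normal(size=(2 * h_dim, attn_dim)),
                                    rng.normal(size=(out_dim, attn_dim)),
                                    rng.normal(size=attn_dim))
```

The weights in each row of `alpha` sum to one, which is what makes them interpretable as an alignment over source positions.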

class src.core.seq2seq.model.Seq2SeqDecoder(config_dict)[source]#

Bases: Module

Seq2Seq Decoder

Parameters:

config_dict (dict) – Config Params Dictionary

forward(encoder_yts, encoder_h, tgt=None)[source]#

Forward propagation

Parameters:
  • encoder_yts (torch.Tensor (batch_size, seq_len, out_dim)) – Encoder Output vectors

  • encoder_h (torch.Tensor (batch_size, seq_len, 2*h_dim)) – Encoder final hidden vectors

  • tgt (torch.Tensor (batch_size, seq_len), optional) – Target vectors, defaults to None

Returns:

Final predictions, Attention weights

Return type:

tuple (torch.Tensor [batch_size, seq_len, num_tgt_vocab], list)

class src.core.seq2seq.model.Seq2SeqEncoder(config_dict)[source]#

Bases: Module

Seq2Seq Encoder

Parameters:

config_dict (dict) – Config Params Dictionary

forward(src)[source]#

Forward propagation

Parameters:

src (torch.Tensor (batch_size, seq_len)) – Source tokens

Returns:

Predicted tokens, Hidden states

Return type:

tuple (torch.Tensor [batch_size, seq_len, out_dim], torch.Tensor [batch_size, 2*h_dim])

init_hidden()[source]#

Initializes hidden states

Returns:

List of hidden states

Return type:

list

class src.core.seq2seq.model.Seq2SeqModel(config_dict)[source]#

Bases: Module

Seq2Seq Model Architecture

Parameters:

config_dict (dict) – Config Params Dictionary

forward(src, tgt=None)[source]#

Forward propagation

Parameters:
  • src (torch.Tensor (batch_size, seq_len)) – Source tokens

  • tgt (torch.Tensor (batch_size, seq_len), optional) – Target tokens, defaults to None

Returns:

Final predictions, Attention weights

Return type:

tuple (torch.Tensor [batch_size, seq_len, num_tgt_vocab], list)
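The optional `tgt` argument suggests the usual teacher-forcing switch: when targets are given, each decoder step is fed the ground-truth token; otherwise the model's own greedy prediction is fed back. The toy sketch below illustrates that control flow with a hypothetical one-step `step` function; it is not the model's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, h_dim, seq_len = 6, 4, 5
W = rng.normal(size=(h_dim, vocab))   # toy output projection
E = rng.normal(size=(vocab, h_dim))   # toy token embedding

def step(h, token):
    h = np.tanh(h + E[token])         # toy state update
    return h, h @ W                   # new hidden state, vocab logits

def decode(tgt=None):
    h = np.zeros(h_dim)
    token, outputs = 0, []            # assume index 0 is the <sos> token
    for t in range(seq_len):
        h, logits = step(h, token)
        outputs.append(logits)
        # Teacher forcing: feed the ground-truth token when tgt is given;
        # otherwise feed back the model's own greedy prediction.
        token = tgt[t] if tgt is not None else int(np.argmax(logits))
    return np.stack(outputs)          # (seq_len, vocab)

train_out = decode(np.array([1, 2, 3, 4, 5]))  # teacher-forced (training)
infer_out = decode()                           # free-running (inference)
```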

class src.core.seq2seq.model.Seq2SeqTrainer(model, optimizer, config_dict)[source]#

Bases: Module

Seq2Seq Trainer

Parameters:
  • model (torch.nn.Module) – Seq2Seq model

  • optimizer (torch.optim) – Optimizer

  • config_dict (dict) – Config Params Dictionary

calc_loss(y_pred, y_true)[source]#

Cross-entropy loss for predicted tokens

Parameters:
  • y_pred (torch.Tensor (batch_size, seq_len, num_vocab)) – Predicted tokens

  • y_true (torch.Tensor (batch_size, seq_len)) – True tokens

Returns:

Cross-entropy loss

Return type:

torch.float32
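For these shapes, cross-entropy averages the negative log-probability assigned to each true token. A minimal NumPy sketch of that computation (an illustration, not the trainer's actual implementation, which would typically use `torch.nn.CrossEntropyLoss`):

```python
import numpy as np

def cross_entropy(y_pred, y_true):
    # y_pred: (batch, seq_len, vocab) logits; y_true: (batch, seq_len) token ids.
    logits = y_pred - y_pred.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    b, s = y_true.shape
    # Pick out the log-probability of the true token at every position.
    picked = log_probs[np.arange(b)[:, None], np.arange(s)[None, :], y_true]
    return -picked.mean()
```

With uniform logits over a vocabulary of size V, this evaluates to log(V), the expected loss of an untrained model.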

fit(train_loader, val_loader)[source]#

Fits the model on the dataset. Runs training and validation steps for the given number of epochs and saves the best model based on the evaluation metric

Parameters:
  • train_loader (torch.utils.data.DataLoader) – Train Data loader

  • val_loader (torch.utils.data.DataLoader) – Validation Data Loader

Returns:

Training History

Return type:

dict

predict(data_loader)[source]#

Runs inference to predict a translation of the source sentence

Parameters:

data_loader (torch.utils.data.DataLoader) – Infer Data loader

Returns:

Predicted tokens

Return type:

numpy.ndarray (num_samples, seq_len, num_tgt_vocab)
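Since the returned array holds per-token scores over the target vocabulary, recovering token ids presumably means taking the argmax over the last axis. A small NumPy illustration (the shapes mirror the documented return type; the data here is random):

```python
import numpy as np

# (num_samples, seq_len, num_tgt_vocab) scores, as returned by predict()
scores = np.random.default_rng(0).normal(size=(4, 7, 10))
tokens = scores.argmax(axis=-1)  # (num_samples, seq_len) greedy token ids
```

The resulting ids can then be mapped back to target-vocabulary strings for the final translation.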

train_one_epoch(data_loader, epoch)[source]#

Train step

Parameters:
  • data_loader (torch.utils.data.DataLoader) – Train Data Loader

  • epoch (int) – Epoch number

Returns:

Train Loss, Train Metrics

Return type:

tuple (torch.float32, dict)

val_one_epoch(data_loader)[source]#

Validation step

Parameters:

data_loader (torch.utils.data.DataLoader) – Validation Data Loader

Returns:

Validation Loss, Validation Metrics

Return type:

tuple (torch.float32, dict)

src.core.seq2seq.seq2seq module#

class src.core.seq2seq.seq2seq.Seq2Seq(config_dict)[source]#

Bases: object

A class to run Seq2Seq data preprocessing, training and inference

Parameters:

config_dict (dict) – Config Params Dictionary

run()[source]#

Runs Seq2Seq Training and saves output

Returns:

Training history

Return type:

dict

run_infer()[source]#

Runs inference

Returns:

True source tokens, true target tokens, predicted target tokens

Return type:

tuple (list, list, list)

save_output()[source]#

Saves Training and Inference results

Module contents#