src.core.transformer package#

Submodules#

src.core.transformer.dataset module#

class src.core.transformer.dataset.PreprocessTransformer(config_dict)[source]#

Bases: object

A class to preprocess Transformer Data

Parameters:

config_dict (dict) – Config Params Dictionary

batched_ids2tokens(tokens)[source]#

Converts sentences of ids to tokens

Parameters:

tokens (numpy.ndarray) – Tokens Array, 2D array (num_samples, seq_len)

Returns:

List of decoded sentences

Return type:

list

extract_data()[source]#

Extracts data from the Lyrics CSV file

Returns:

List of raw strings

Return type:

list

get_data()[source]#

Converts extracted data into tokens

Returns:

Text tokens

Return type:

numpy.ndarray (num_samples, seq_len)

get_vocab(text_ls)[source]#

Generates Vocabulary

Parameters:

text_ls (list) – List of preprocessed strings

Returns:

Corpus generated by WordPiece

Return type:

list

preprocess_text(text_ls)[source]#

Preprocesses list of strings

Parameters:

text_ls (list) – List of Raw strings

Returns:

List of preprocessed strings

Return type:

list

src.core.transformer.dataset.create_dataloader(X, val_split=0.2, test_split=0.2, batch_size=32, seed=2024)[source]#

Creates Train, Validation and Test DataLoader

Parameters:
  • X (torch.Tensor (num_samples, seq_len+1)) – Input tokens

  • val_split (float, optional) – validation split, defaults to 0.2

  • test_split (float, optional) – test split, defaults to 0.2

  • batch_size (int, optional) – Batch size, defaults to 32

  • seed (int, optional) – Seed, defaults to 2024

Returns:

Train, Val, Test dataloaders

Return type:

tuple (torch.utils.data.DataLoader, torch.utils.data.DataLoader, torch.utils.data.DataLoader)
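The split fractions apply to the full sample count, with the remainder used for training. A minimal sketch of how the fractions could map to sample counts (`split_sizes` is a hypothetical helper for illustration; the rounding inside `create_dataloader` may differ):

```python
def split_sizes(num_samples, val_split=0.2, test_split=0.2):
    # Illustrative only: fractional splits mapped to whole sample counts,
    # with the remainder assigned to the training set.
    n_val = int(num_samples * val_split)
    n_test = int(num_samples * test_split)
    n_train = num_samples - n_val - n_test
    return n_train, n_val, n_test

print(split_sizes(100))  # (60, 20, 20)
```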

src.core.transformer.model module#

class src.core.transformer.model.DecoderLayer(config_dict)[source]#

Bases: Module

Decoder layer

Parameters:

config_dict (dict) – Config Params Dictionary

forward(enc_output, tgt)[source]#

Forward Propagation

Parameters:
  • enc_output (torch.Tensor (batch_size, seq_len, d_model)) – Output of encoder layers

  • tgt (torch.Tensor (batch_size, seq_len, d_model)) – Target tokens

Returns:

Output of decoder

Return type:

torch.Tensor (batch_size, seq_len, d_model)

class src.core.transformer.model.EncoderLayer(config_dict)[source]#

Bases: Module

Encoder layer

Parameters:

config_dict (dict) – Config Params Dictionary

forward(src)[source]#

Forward propagation

Parameters:

src (torch.Tensor (batch_size, seq_len, d_model)) – Input tensor

Returns:

Output tensor

Return type:

torch.Tensor (batch_size, seq_len, d_model)

class src.core.transformer.model.FeedForward(config_dict)[source]#

Bases: Module

FeedForward Layer

Parameters:

config_dict (dict) – Config Params Dictionary

forward(x)[source]#

Forward Propagation

Parameters:

x (torch.Tensor (batch_size, seq_len, d_model)) – Input tensor

Returns:

Output tensor

Return type:

torch.Tensor (batch_size, seq_len, d_model)
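A position-wise feed-forward layer in the standard Transformer applies the same two linear maps at every sequence position. A minimal NumPy sketch, assuming the usual FFN(x) = max(0, xW1 + b1)W2 + b2 form (all names here are hypothetical, not the module's actual attributes):

```python
import numpy as np

def position_wise_ffn(x, W1, b1, W2, b2):
    # FFN(x) = max(0, x @ W1 + b1) @ W2 + b2,
    # applied independently at each sequence position.
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16
x = rng.normal(size=(2, 5, d_model))          # (batch_size, seq_len, d_model)
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
out = position_wise_ffn(x, W1, b1, W2, b2)
print(out.shape)  # (2, 5, 8)
```

Note the output keeps the input shape, so the layer can be wrapped in a residual connection.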

class src.core.transformer.model.MultiHeadAttention(config_dict)[source]#

Bases: Module

Multi head Attention layer

Parameters:

config_dict (dict) – Config Params Dictionary

forward(Q, K, V, mask=False)[source]#

Forward propagation

Parameters:
  • Q (torch.Tensor (batch_size, seq_len, d_model)) – Query matrix

  • K (torch.Tensor (batch_size, seq_len, d_model)) – Key matrix

  • V (torch.Tensor (batch_size, seq_len, d_model)) – Value matrix

  • mask (bool, optional) – Whether to mask future tokens or not, defaults to False

Returns:

Attention output

Return type:

torch.Tensor (batch_size, seq_len, d_model)
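The core of the forward pass above is scaled dot-product attention, with the `mask` flag blocking attention to future positions. A minimal single-head NumPy sketch (illustrative only, not the module's actual implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=False):
    # Q, K, V: (batch_size, seq_len, d); returns (batch_size, seq_len, d).
    d = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)   # (batch, seq_len, seq_len)
    if mask:
        # Causal mask: position i may not attend to positions j > i.
        seq_len = Q.shape[1]
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 8))
out = scaled_dot_product_attention(x, x, x, mask=True)
print(out.shape)  # (2, 5, 8)
```

With the causal mask, the first position can only attend to itself, so its output equals its value vector unchanged.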

class src.core.transformer.model.PositionalEncoding(config_dict)[source]#

Bases: Module

Positional Encoding

Parameters:

config_dict (dict) – Config Params Dictionary

forward(x)[source]#

Forward Propagation

Parameters:

x (torch.Tensor (batch_size, seq_len, d_model)) – Text token embeddings

Returns:

Text token embeddings with positional encoding

Return type:

torch.Tensor (batch_size, seq_len, d_model)
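Assuming this layer uses the standard sinusoidal scheme, the encoding added to the embeddings can be sketched as follows (a minimal NumPy illustration with a hypothetical helper name; `d_model` assumed even):

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]    # (1, d_model / 2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_encoding(16, 8)
print(pe.shape)  # (16, 8)
```

The encoding is broadcast over the batch dimension and simply added to the token embeddings.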

class src.core.transformer.model.TransformerModel(config_dict)[source]#

Bases: Module

Transformer architecture

Parameters:

config_dict (dict) – Config Params Dictionary

forward(src, tgt)[source]#

Forward propagation

Parameters:
  • src (torch.Tensor (batch_size, seq_len)) – Source tokens

  • tgt (torch.Tensor (batch_size, seq_len)) – Target tokens

Returns:

Predicted tokens

Return type:

torch.Tensor (batch_size, seq_len, num_vocab)

class src.core.transformer.model.TransformerTrainer(model, optimizer, config_dict)[source]#

Bases: Module

Transformer Trainer

Parameters:
  • model (torch.nn.Module) – Transformer model

optimizer (torch.optim.Optimizer) – Optimizer

  • config_dict (dict) – Config Params Dictionary

calc_loss(y_pred, y_true)[source]#

Cross-entropy loss for predicted tokens

Parameters:
  • y_pred (torch.Tensor (batch_size, seq_len, num_vocab)) – Predicted tokens

  • y_true (torch.Tensor (batch_size, seq_len)) – True tokens

Returns:

Cross-entropy loss

Return type:

torch.float32
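Token-level cross-entropy compares a `(batch_size, seq_len, num_vocab)` logit tensor against `(batch_size, seq_len)` integer targets. A minimal NumPy sketch of the computation (a hypothetical stand-in for `calc_loss`, which presumably wraps torch's cross-entropy):

```python
import numpy as np

def token_cross_entropy(y_pred, y_true):
    # y_pred: (batch_size, seq_len, num_vocab) unnormalized logits
    # y_true: (batch_size, seq_len) integer token ids
    logits = y_pred - y_pred.max(axis=-1, keepdims=True)   # stabilise softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    b, s = y_true.shape
    # Pick the log-probability assigned to each true token.
    picked = log_probs[np.arange(b)[:, None], np.arange(s)[None, :], y_true]
    return -picked.mean()

# Uniform logits over a vocabulary of 4 give a loss of ln(4).
y_pred = np.zeros((2, 3, 4))
y_true = np.array([[0, 1, 2], [3, 0, 1]])
loss = token_cross_entropy(y_pred, y_true)
print(round(float(loss), 4))  # 1.3863
```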

fit(train_loader, val_loader)[source]#

Fits the model on the dataset. Runs training and validation steps for the given number of epochs and saves the best model based on the evaluation metric

Parameters:
  • train_loader (torch.utils.data.DataLoader) – Train Data loader

  • val_loader (torch.utils.data.DataLoader) – Validation Data Loader

Returns:

Training History

Return type:

dict

predict(data_loader)[source]#

Runs inference to predict a shifted sentence

Parameters:

data_loader (torch.utils.data.DataLoader) – Infer Data loader

Returns:

True tokens, Predicted tokens

Return type:

tuple (numpy.ndarray [num_samples, seq_len], numpy.ndarray [num_samples, seq_len, num_vocab])
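Since the predicted array holds per-token scores over the vocabulary, greedy decoding reduces it to token ids with an argmax over the last axis. A small sketch with hypothetical random logits standing in for the output of `predict`:

```python
import numpy as np

# Hypothetical logits shaped like predict()'s second return value:
# (num_samples, seq_len, num_vocab)
rng = np.random.default_rng(42)
y_pred = rng.normal(size=(4, 10, 100))

# Greedy decoding: highest-scoring vocabulary id at each position,
# yielding token ids shaped like the true-token array (num_samples, seq_len).
pred_tokens = y_pred.argmax(axis=-1)
print(pred_tokens.shape)  # (4, 10)
```

The resulting id array can then be passed to `batched_ids2tokens` to recover sentences.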

train_one_epoch(data_loader, epoch)[source]#

Train step

Parameters:
  • data_loader (torch.utils.data.DataLoader) – Train Data Loader

  • epoch (int) – Epoch number

Returns:

Train Loss, Train Metrics

Return type:

tuple (torch.float32, dict)

val_one_epoch(data_loader)[source]#

Validation step

Parameters:

data_loader (torch.utils.data.DataLoader) – Validation Data Loader

Returns:

Validation Loss, Validation Metrics

Return type:

tuple (torch.float32, dict)

src.core.transformer.transformer module#

class src.core.transformer.transformer.Transformer(config_dict)[source]#

Bases: object

A class to run Transformer data preprocessing, training and inference

Parameters:

config_dict (dict) – Config Params Dictionary

run()[source]#

Runs Transformer Training and saves output

run_infer()[source]#

Runs inference

Returns:

True and Predicted tokens

Return type:

tuple (list, list)

save_output()[source]#

Saves Training and Inference results

Module contents#