src package#

Subpackages#

Submodules#

src.configs module#

All the Config Parameters that are used in the library with their corresponding Data Types

src.configs.configDictDType = {'alpha': <class 'float'>, 'batch_size': <class 'int'>, 'bleu_n': <class 'int'>, 'captions_file': <class 'str'>, 'clf_dim': <class 'list'>, 'context': <class 'int'>, 'd_ff': <class 'int'>, 'd_model': <class 'int'>, 'decoder_y_dim': <class 'int'>, 'device': <class 'str'>, 'dropout': <class 'float'>, 'embed_dim': <class 'int'>, 'encoder_h_dim': <class 'list'>, 'encoder_x_dim': <class 'list'>, 'epochs': <class 'int'>, 'eval_metric': <class 'str'>, 'explore_folder': <class 'bool'>, 'h_dim': <class 'list'>, 'idf_mode': <class 'str'>, 'image_backbone': <class 'str'>, 'image_dim': <class 'list'>, 'image_folder': <class 'str'>, 'input_file': <class 'str'>, 'input_folder': <class 'str'>, 'lr': <class 'float'>, 'mask': <class 'float'>, 'next': <class 'float'>, 'ngram': <class 'int'>, 'num_classes': <class 'int'>, 'num_extra_tokens': <class 'int'>, 'num_heads': <class 'int'>, 'num_layers': <class 'int'>, 'num_samples': <class 'int'>, 'num_sents_per_doc': <class 'int'>, 'num_src_vocab': <class 'int'>, 'num_tgt_vocab': <class 'int'>, 'num_topics': <class 'int'>, 'num_vocab': <class 'int'>, 'operations': <class 'list'>, 'output_folder': <class 'str'>, 'output_label': <class 'bool'>, 'predict_tokens': <class 'int'>, 'prediction': <class 'float'>, 'pretrain_weights': <class 'str'>, 'random': <class 'float'>, 'random_lines': <class 'bool'>, 'randomize': <class 'bool'>, 'rouge_n_n': <class 'int'>, 'rouge_s_n': <class 'int'>, 'seed': <class 'int'>, 'seq_len': <class 'int'>, 'test_corpus': <class 'list'>, 'test_file': <class 'str'>, 'test_folder': <class 'str'>, 'test_samples': <class 'int'>, 'test_size': <class 'float'>, 'test_split': <class 'float'>, 'tf_mode': <class 'str'>, 'train_corpus': <class 'list'>, 'train_samples': <class 'int'>, 'val_split': <class 'float'>, 'visualize': <class 'bool'>, 'x_dim': <class 'list'>, 'x_max': <class 'int'>}#

Config Parameters segregated for each algorithm, along with its Parent Parameters.

Format:

    'Algo': {
        ...
    }
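
As a rough illustration, a config dictionary passed to the classes documented below might look like the sketch that follows. The values are placeholders chosen for illustration; only the key names and their expected dtypes come from configDictDType above.

    # Hypothetical config dictionary; values are placeholders.
    # Key names and dtypes follow configDictDType.
    config_dict = {
        "seed": 42,                   # int
        "device": "cpu",              # str
        "epochs": 10,                 # int
        "batch_size": 32,             # int
        "lr": 1e-3,                   # float
        "dropout": 0.1,               # float
        "output_folder": "outputs/",  # str (placeholder path)
        "h_dim": [128, 64],           # list (placeholder values)
    }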

src.main module#

src.metrics module#

class src.metrics.ClassificationMetrics(config_dict)[source]#

Bases: object

Metrics for Classification Task.

Parameters:

config_dict (dict) – Config Params Dictionary

get_metrics(references, predictions, target_names)[source]#

Function that returns Metrics using References, Predictions and Class Labels

Parameters:
  • references (numpy.ndarray) – References, 1D array (num_samples,)

  • predictions (numpy.ndarray) – Predictions, 2D array (num_samples, num_classes) with Probabilities

  • target_names (list) – Class Labels

Returns:

Metrics Dictionary

Return type:

dict
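
A minimal usage sketch for ClassificationMetrics, assuming a two-class task; the config contents, label values, and class names are placeholders rather than part of the documented API.

    import numpy as np
    from src.metrics import ClassificationMetrics

    config_dict = {"num_classes": 2}            # placeholder config values
    references = np.array([0, 1, 1, 0])         # (num_samples,)
    predictions = np.array([[0.9, 0.1],         # (num_samples, num_classes) probabilities
                            [0.2, 0.8],
                            [0.3, 0.7],
                            [0.6, 0.4]])

    clf_metrics = ClassificationMetrics(config_dict)
    metrics = clf_metrics.get_metrics(references, predictions, ["neg", "pos"])
    # metrics is a dict of classification metrics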

class src.metrics.TextGenerationMetrics(config_dict)[source]#

Bases: object

Metrics for Text Generation Task.

Parameters:

config_dict (dict) – Config Params Dictionary

bleu_score(references, predictions, n=4)[source]#

BLEU Score

Parameters:
  • references (numpy.ndarray) – References, 2D array (num_samples, seq_len)

  • predictions (numpy.ndarray) – Predictions, 3D array (num_samples, seq_len, num_vocab) with Probabilities

  • n (int, optional) – Maximum N-gram order, defaults to 4

Returns:

BLEU score

Return type:

float

cider_score(references, predictions)[source]#
get_metrics(references, predictions)[source]#

Function that returns Metrics using References and Predictions

Parameters:
  • references (numpy.ndarray) – References, 2D array (num_samples, seq_len)

  • predictions (numpy.ndarray) – Predictions, 3D array (num_samples, seq_len, num_vocab) with Probabilities

Returns:

Metrics Dictionary

Return type:

dict
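
A minimal usage sketch for TextGenerationMetrics, assuming the config supplies the documented n-gram settings (bleu_n, rouge_n_n, rouge_s_n); the random arrays only illustrate the expected shapes.

    import numpy as np
    from src.metrics import TextGenerationMetrics

    num_samples, seq_len, num_vocab = 4, 10, 50
    config_dict = {"bleu_n": 4, "rouge_n_n": 2, "rouge_s_n": 2}  # placeholder values

    references = np.random.randint(0, num_vocab, size=(num_samples, seq_len))
    predictions = np.random.rand(num_samples, seq_len, num_vocab)
    predictions /= predictions.sum(axis=-1, keepdims=True)       # per-token probabilities

    gen_metrics = TextGenerationMetrics(config_dict)
    metrics = gen_metrics.get_metrics(references, predictions)   # dict of scores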

meteor_score(references, predictions)[source]#
perplexity_score(predictions)[source]#

Perplexity Score

Parameters:

predictions (numpy.ndarray) – Predictions, 3D array (num_samples, seq_len, num_vocab) with Probabilities

Returns:

Perplexity Score

Return type:

float

rouge_l_score(references, predictions)[source]#

ROUGE L Score

Parameters:
  • references (numpy.ndarray) – References, 2D array (num_samples, seq_len)

  • predictions (numpy.ndarray) – Predictions, 3D array (num_samples, seq_len, num_vocab) with Probabilities

Returns:

ROUGE L score

Return type:

float

rouge_n_score(references, predictions, n=4)[source]#

ROUGE N Score

Parameters:
  • references (numpy.ndarray) – References, 2D array (num_samples, seq_len)

  • predictions (numpy.ndarray) – Predictions, 3D array (num_samples, seq_len, num_vocab) with Probabilities

  • n (int, optional) – Maximum N-gram order, defaults to 4

Returns:

ROUGE N score

Return type:

float

rouge_s_score(references, predictions, n=4)[source]#

ROUGE S Score

Parameters:
  • references (numpy.ndarray) – References, 2D array (num_samples, seq_len)

  • predictions (numpy.ndarray) – Predictions, 3D array (num_samples, seq_len, num_vocab) with Probabilities

  • n (int, optional) – Maximum N-gram order, defaults to 4

Returns:

ROUGE S score

Return type:

float

src.plot_utils module#

src.plot_utils.pca_emission_matrix(em_matrix_df, output_folder)[source]#

TSNE of Emission Matrix. Used in HMM

Parameters:
  • em_matrix_df (pandas.DataFrame) – DataFrame of Emission Matrix

  • output_folder (str) – Path to the folder where the Scatter plot is saved as an HTML file

src.plot_utils.plot_conf_matrix(y_true, y_pred, classes, output_folder)[source]#

Confusion Matrix of True Labels vs Prediction Labels. Used in GRU, RNN

Parameters:
  • y_true (list) – True Labels

  • y_pred (list) – Prediction labels

  • classes (list) – List of classes

  • output_folder (str) – Path to the folder where the Confusion Matrix png file is saved
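
A usage sketch, assuming integer labels indexed into a list of class names; the labels, class names, and output path are placeholders.

    from src.plot_utils import plot_conf_matrix

    y_true = [0, 1, 1, 0, 1]   # hypothetical true labels
    y_pred = [0, 1, 0, 0, 1]   # hypothetical predicted labels
    plot_conf_matrix(y_true, y_pred, ["neg", "pos"], "outputs/")
    # saves the confusion-matrix png under outputs/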

src.plot_utils.plot_embed(embeds, vocab, output_folder, fname='Word Embeddings TSNE')[source]#

3D TSNE of Word Embeddings from Embedding Matrix Layer

Parameters:
  • embeds (numpy.ndarray) – Embeddings Array (num_samples, embed_dim)

  • vocab (list) – Vocabulary

  • output_folder (str) – Path to the folder where the Scatter plot is saved as an HTML file

  • fname (str, optional) – Filename, defaults to “Word Embeddings TSNE”
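
A usage sketch with a random embedding matrix; only the shapes follow the documentation, and the vocabulary, embedding dimension, and output path are placeholders.

    import numpy as np
    from src.plot_utils import plot_embed

    vocab = [f"tok_{i}" for i in range(100)]   # hypothetical vocabulary
    embeds = np.random.rand(len(vocab), 50)    # (num_samples, embed_dim)
    plot_embed(embeds, vocab, "outputs/")      # writes the 3D TSNE scatter plot as an HTML file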

src.plot_utils.plot_hist_dataset(data, output_folder)[source]#

KDE plot of Sentence Length and Histogram of POS tags for each token. Used in HMM

Parameters:
  • data (tuple of (numpy.ndarray [num_samples, seq_len], numpy.ndarray [num_samples, seq_len], numpy.ndarray [num_samples, ], numpy.ndarray [num_samples, ])) – Tuple of Train X, Test X, Train y, Test y

  • output_folder (str) – Path to the folder where the Data Analysis png file is saved

src.plot_utils.plot_history(history, output_folder, name='History')[source]#

Training History plot of the Loss and Metrics tracked during Training

Parameters:
  • history (dict) – History Dictionary

  • output_folder (str) – Path to the folder where the History png file is saved

  • name (str, optional) – Filename, defaults to “History”
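
A usage sketch; the exact keys tracked in the history dictionary depend on the training loop, so the ones below are assumptions for illustration.

    from src.plot_utils import plot_history

    history = {
        "train_loss": [1.21, 0.94, 0.73],   # hypothetical per-epoch values
        "val_loss": [1.30, 1.02, 0.85],
    }
    plot_history(history, "outputs/", name="History")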

src.plot_utils.plot_ngram_pie_chart(vocab_df, n, output_folder, k=20)[source]#

Pie Chart of the Top K most frequent Ngrams. Used in NGRAM

Parameters:
  • vocab_df (pandas.DataFrame) – DataFrame of Ngrams and their Frequency in Corpus

  • n (int) – Number of terms in a Vocab (N of Ngram)

  • output_folder (str) – Path to the folder where the Pie Chart png file is saved

  • k (int, optional) – Number of Ngrams to plot, defaults to 20
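
A usage sketch with a small bigram table; the DataFrame column names are assumptions, since only "a DataFrame of Ngrams and their Frequency" is documented here.

    import pandas as pd
    from src.plot_utils import plot_ngram_pie_chart

    vocab_df = pd.DataFrame({
        "ngram": ["the cat", "cat sat", "sat on", "on the", "the mat"],  # assumed column names
        "frequency": [12, 7, 5, 4, 3],
    })
    plot_ngram_pie_chart(vocab_df, n=2, output_folder="outputs/", k=5)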

src.plot_utils.plot_pca_pairplot(X, y, output_folder, num_pcs=6, name='TFIDF PCA Pairplot')[source]#

Pairplot of Features Colored by Labels. Used in TFIDF

Parameters:
  • X (numpy.ndarray) – Feature 2D array (num_samples, num_features)

  • y (numpy.ndarray) – Labels array, (num_samples, )

  • output_folder (str) – Path to the folder where the pairplot png file is saved

  • num_pcs (int, optional) – Number of Features to Plot, defaults to 6

  • name (str, optional) – Filename, defaults to “TFIDF PCA Pairplot”

src.plot_utils.plot_topk_cooccur_matrix(cooccur_mat, vocab, output_folder, k=20)[source]#

Co-occurrence Matrix of Tokens in the GloVe Model. Used in GLOVE

Parameters:
  • cooccur_mat (numpy.ndarray) – Co-occurrence Matrix

  • vocab (list) – List of Vocabulary

  • output_folder (str) – Path to the folder where the Co-occurrence matrix png file is saved

  • k (int, optional) – Number of Vocab to plot, defaults to 20

src.plot_utils.plot_topk_freq(vocab_freq, output_folder, k=10)[source]#

Histogram of the Top K most frequent Vocabulary terms in the corpus. Used in NGRAM, BOW

Parameters:
  • vocab_freq (dict) – Vocabulary Frequency in the Corpus

  • output_folder (str) – Path to the folder where the Histogram png file is saved

  • k (int, optional) – Number of Vocab to plot, defaults to 10
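
A usage sketch; the token counts are placeholders. The same dictionary shape is also accepted by plot_wordcloud below.

    from src.plot_utils import plot_topk_freq

    vocab_freq = {"the": 120, "on": 60, "cat": 30, "sat": 25, "mat": 18}  # hypothetical counts
    plot_topk_freq(vocab_freq, "outputs/", k=3)   # histogram of the 3 most frequent tokens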

src.plot_utils.plot_transition_matrix(trans_matrix_df, output_folder)[source]#

Heatmap of the Transition Matrix. Used in HMM

Parameters:
  • trans_matrix_df (pandas.DataFrame) – DataFrame of the Transition Matrix

  • output_folder (str) – Path to the folder where the Heatmap png file is saved

src.plot_utils.plot_wordcloud(vocab_freq, output_folder)[source]#

Generating Word Cloud Plot. Used in NGRAM, BOW

Parameters:
  • vocab_freq (dict) – Vocabulary Frequency in the Corpus

  • output_folder (str) – Path to the folder where the Wordcloud png file is saved

src.plot_utils.viz_metrics(metric_dict, output_folder)[source]#

Visualizing Confusion Matrix and Classification Report Metrics. Used in HMM

Parameters:
  • metric_dict (dict) – Metrics Dictionary with conf_matrix and clf_report as keys

  • output_folder (str) – Path to the folder where the Metrics png file is saved

src.utils module#

class src.utils.ValidateConfig(config_dict, algo)[source]#

Bases: object

Validating Config File

Parameters:
  • config_dict (dict) – Config Params Dictionary

  • algo (str) – Name of the Algorithm
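
A usage sketch; the config path and the algorithm name "rnn" are placeholders, since the accepted algo strings are not listed on this page.

    from src.utils import ValidateConfig, load_config

    config_dict = load_config("configs/rnn.yaml")    # placeholder path
    validator = ValidateConfig(config_dict, "rnn")   # placeholder algo name
    validator.run_verify()                           # verifies required keys and value dtypes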

check_float(key, val)[source]#

To check whether the given float-valued key has a valid value or not

Parameters:
  • key (str) – Param Key

  • val (float) – Param value

check_int(key, val)[source]#

To check whether the given int-valued key has a valid value or not

Parameters:
  • key (str) – Param Key

  • val (int) – Param value

check_list(key, val)[source]#

To check whether the given list-valued key has a valid value or not

Parameters:
  • key (str) – Param Key

  • val (list) – Param value

check_paths(key, val)[source]#

To check whether the given filepath-valued key has a valid value or not

Parameters:
  • key (str) – Param Key

  • val (str) – Param value

check_string(key, val)[source]#

To check whether the given string-valued key has a valid value or not

Parameters:
  • key (str) – Param Key

  • val (str) – Param value

compare_dtype(key, val)[source]#

To check whether the value of the given key has a valid dtype or not

Parameters:
  • key (str) – Param Key

  • val (float/int/str/list) – Param value

run_verify()[source]#

Config Params Keys and Values Verification

verify_main_keys(keys)[source]#

Verifying whether Config has all the required keys or not

Parameters:

keys (list) – Parent Config Parameters

verify_values()[source]#

Verifying the Datatypes of all the Parameters in Config

src.utils.get_logger(log_folder)[source]#

Initializing Log File

Parameters:

log_folder (str) – Path to folder where Log file is added

src.utils.load_config(config_path)[source]#

Loading YAML Config file as a Dictionary

Parameters:

config_path (str) – Path to Config File

Returns:

Config Params Dictionary

Return type:

dict

src.utils.set_seed(seed)[source]#

Setting seed across Libraries to reproduce results

Parameters:

seed (int) – Seed value
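
A typical setup sketch combining these helpers; the paths are placeholders, and since the return value of get_logger is not documented here, it is not used.

    from src.utils import get_logger, load_config, set_seed

    config_dict = load_config("configs/config.yaml")   # YAML file -> dict
    set_seed(config_dict["seed"])                      # 'seed' is a documented int param
    get_logger("logs/")                                # initializes the log file in logs/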

Module contents#