src package
Subpackages
- src.core package
  - Subpackages
  - Module contents
- src.preprocess package
Submodules
src.configs module
All the Config Parameters used in the library, with their corresponding Data Types
- src.configs.configDictDType = {'alpha': <class 'float'>, 'batch_size': <class 'int'>, 'bleu_n': <class 'int'>, 'captions_file': <class 'str'>, 'clf_dim': <class 'list'>, 'context': <class 'int'>, 'd_ff': <class 'int'>, 'd_model': <class 'int'>, 'decoder_y_dim': <class 'int'>, 'device': <class 'str'>, 'dropout': <class 'float'>, 'embed_dim': <class 'int'>, 'encoder_h_dim': <class 'list'>, 'encoder_x_dim': <class 'list'>, 'epochs': <class 'int'>, 'eval_metric': <class 'str'>, 'explore_folder': <class 'bool'>, 'h_dim': <class 'list'>, 'idf_mode': <class 'str'>, 'image_backbone': <class 'str'>, 'image_dim': <class 'list'>, 'image_folder': <class 'str'>, 'input_file': <class 'str'>, 'input_folder': <class 'str'>, 'lr': <class 'float'>, 'mask': <class 'float'>, 'next': <class 'float'>, 'ngram': <class 'int'>, 'num_classes': <class 'int'>, 'num_extra_tokens': <class 'int'>, 'num_heads': <class 'int'>, 'num_layers': <class 'int'>, 'num_samples': <class 'int'>, 'num_sents_per_doc': <class 'int'>, 'num_src_vocab': <class 'int'>, 'num_tgt_vocab': <class 'int'>, 'num_topics': <class 'int'>, 'num_vocab': <class 'int'>, 'operations': <class 'list'>, 'output_folder': <class 'str'>, 'output_label': <class 'bool'>, 'predict_tokens': <class 'int'>, 'prediction': <class 'float'>, 'pretrain_weights': <class 'str'>, 'random': <class 'float'>, 'random_lines': <class 'bool'>, 'randomize': <class 'bool'>, 'rouge_n_n': <class 'int'>, 'rouge_s_n': <class 'int'>, 'seed': <class 'int'>, 'seq_len': <class 'int'>, 'test_corpus': <class 'list'>, 'test_file': <class 'str'>, 'test_folder': <class 'str'>, 'test_samples': <class 'int'>, 'test_size': <class 'float'>, 'test_split': <class 'float'>, 'tf_mode': <class 'str'>, 'train_corpus': <class 'list'>, 'train_samples': <class 'int'>, 'val_split': <class 'float'>, 'visualize': <class 'bool'>, 'x_dim': <class 'list'>, 'x_max': <class 'int'>}
Config Parameters grouped by algorithm, along with each algorithm's Parent Parameters.
- Format:
- ‘Algo’: {
}
src.main module
src.metrics module
- class src.metrics.ClassificationMetrics(config_dict)
Bases: object
Metrics for Classification Task.
- Parameters:
config_dict (dict) – Config Params Dictionary
- get_metrics(references, predictions, target_names)
Function that returns Metrics using References, Predictions and Class Labels
- Parameters:
references (numpy.ndarray) – References, 1D array (num_samples,)
predictions (numpy.ndarray) – Predictions, 2D array (num_samples, num_classes) with Probabilities
target_names (list) – Class Labels
- Returns:
Metrics Dictionary
- Return type:
dict
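A minimal sketch of what `get_metrics` plausibly computes, assuming predicted labels are the argmax of each probability row, that class index `i` corresponds to `target_names[i]`, and that the dictionary holds accuracy plus per-class precision, recall and F1. `classification_metrics_sketch` is a hypothetical name; the real method may return a different set of keys.

```python
import numpy as np


def classification_metrics_sketch(references, predictions, target_names):
    """Hypothetical stand-in for ClassificationMetrics.get_metrics (keys are assumed)."""
    references = np.asarray(references)
    # Collapse (num_samples, num_classes) probabilities into predicted class ids
    pred_labels = np.argmax(np.asarray(predictions), axis=1)
    metrics = {"accuracy": float(np.mean(pred_labels == references))}
    for idx, name in enumerate(target_names):
        tp = np.sum((pred_labels == idx) & (references == idx))
        fp = np.sum((pred_labels == idx) & (references != idx))
        fn = np.sum((pred_labels != idx) & (references == idx))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[name] = {"precision": float(precision), "recall": float(recall), "f1": float(f1)}
    return metrics


refs = np.array([0, 1, 1, 0])
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
print(classification_metrics_sketch(refs, probs, ["neg", "pos"]))
```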
- class src.metrics.TextGenerationMetrics(config_dict)
Bases: object
Metrics for Text Generation Task.
- Parameters:
config_dict (dict) – Config Params Dictionary
- bleu_score(references, predictions, n=4)
BLEU Score
- Parameters:
references (numpy.ndarray) – References, 2D array (num_samples, seq_len)
predictions (numpy.ndarray) – Predictions, 3D array (num_samples, seq_len, num_vocab) with Probabilities
n (int, optional) – Maximum N-gram order, defaults to 4
- Returns:
BLEU score
- Return type:
float
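Corpus-level BLEU over probability outputs can be sketched as below, assuming predicted tokens are the argmax over `num_vocab`, one reference per sample, and no special handling of padding tokens. `bleu_sketch` is hypothetical; the library's `bleu_score` may smooth, weight or average differently.

```python
import math
from collections import Counter

import numpy as np


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_sketch(references, predictions, n=4):
    """Hypothetical corpus BLEU: geometric mean of clipped precisions times brevity penalty."""
    refs = np.asarray(references).tolist()
    # Assumption: greedy decoding of (num_samples, seq_len, num_vocab) probabilities
    hyps = np.argmax(predictions, axis=-1).tolist()
    log_precisions = []
    for order in range(1, n + 1):
        matched, total = 0, 0
        for ref, hyp in zip(refs, hyps):
            hyp_counts, ref_counts = ngram_counts(hyp, order), ngram_counts(ref, order)
            # Clip each hypothesis n-gram count by its count in the reference
            matched += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total += sum(hyp_counts.values())
        if matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    hyp_len = sum(len(h) for h in hyps)
    ref_len = sum(len(r) for r in refs)
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(sum(log_precisions) / n)
```

With a fixed `seq_len`, hypothesis and reference lengths are equal, so the brevity penalty is always 1 in this sketch.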
- get_metrics(references, predictions)
Function that returns Metrics using References and Predictions
- Parameters:
references (numpy.ndarray) – References, 2D array (num_samples, seq_len)
predictions (numpy.ndarray) – Predictions, 3D array (num_samples, seq_len, num_vocab) with Probabilities
- Returns:
Metrics Dictionary
- Return type:
dict
- perplexity_score(predictions)
Perplexity Score
- Parameters:
predictions (numpy.ndarray) – Predictions, 3D array (num_samples, seq_len, num_vocab) with Probabilities
- Returns:
Perplexity Score
- Return type:
float
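Perplexity is the exponential of the mean negative log-probability of the target tokens. The documented signature receives only `predictions`, so which token the library scores is not stated; the sketch below takes an optional, hypothetical `targets` array and, when it is omitted, falls back to the argmax token, which is an assumption rather than the library's confirmed behaviour.

```python
import numpy as np


def perplexity_sketch(predictions, targets=None, eps=1e-12):
    """Hypothetical perplexity: exp(-mean(log p(target token)))."""
    probs = np.asarray(predictions, dtype=float)
    if targets is None:
        # Assumption: score the model's own greedy (argmax) tokens
        targets = np.argmax(probs, axis=-1)
    # Probability assigned to each target token, shape (num_samples, seq_len)
    token_probs = np.take_along_axis(probs, np.asarray(targets)[..., None], axis=-1)[..., 0]
    return float(np.exp(-np.mean(np.log(token_probs + eps))))
```

A uniform distribution over `num_vocab` tokens gives a perplexity equal to `num_vocab`, which is a quick sanity check for any implementation.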
- rouge_l_score(references, predictions)
ROUGE L Score
- Parameters:
references (numpy.ndarray) – References, 2D array (num_samples, seq_len)
predictions (numpy.ndarray) – Predictions, 3D array (num_samples, seq_len, num_vocab) with Probabilities
- Returns:
ROUGE L score
- Return type:
float
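ROUGE L is based on the longest common subsequence (LCS) between reference and prediction. A minimal sketch, assuming argmax decoding, an F1 combination of LCS precision and recall (beta = 1) and a mean over samples; `rouge_l_sketch` and `lcs_length` are hypothetical helpers, not the library's code.

```python
import numpy as np


def lcs_length(a, b):
    # Classic O(len(a) * len(b)) dynamic programme over prefixes
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_sketch(references, predictions):
    """Hypothetical ROUGE-L F1 averaged over samples."""
    refs = np.asarray(references).tolist()
    hyps = np.argmax(predictions, axis=-1).tolist()  # assumption: greedy tokens
    scores = []
    for ref, hyp in zip(refs, hyps):
        lcs = lcs_length(ref, hyp)
        if lcs == 0:
            scores.append(0.0)
            continue
        recall, precision = lcs / len(ref), lcs / len(hyp)
        scores.append(2 * precision * recall / (precision + recall))
    return float(np.mean(scores))
```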
- rouge_n_score(references, predictions, n=4)
ROUGE N Score
- Parameters:
references (numpy.ndarray) – References, 2D array (num_samples, seq_len)
predictions (numpy.ndarray) – Predictions, 3D array (num_samples, seq_len, num_vocab) with Probabilities
n (int, optional) – Maximum N-gram order, defaults to 4
- Returns:
ROUGE N score
- Return type:
float
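ROUGE N is usually the recall of reference N-grams found in the prediction. The sketch below computes it for the single order `n`, assuming argmax decoding and a mean over samples; since `n` is documented as a maximum, the library may instead combine orders 1 to `n`. `rouge_n_sketch` is hypothetical.

```python
from collections import Counter

import numpy as np


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_sketch(references, predictions, n=4):
    """Hypothetical ROUGE-N recall for a single N-gram order, averaged over samples."""
    refs = np.asarray(references).tolist()
    hyps = np.argmax(predictions, axis=-1).tolist()  # assumption: greedy tokens
    scores = []
    for ref, hyp in zip(refs, hyps):
        ref_counts, hyp_counts = ngrams(ref, n), ngrams(hyp, n)
        total = sum(ref_counts.values())
        # Counter & keeps the minimum count per N-gram, i.e. clipped matches
        overlap = sum((ref_counts & hyp_counts).values())
        scores.append(overlap / total if total else 0.0)
    return float(np.mean(scores))
```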
- rouge_s_score(references, predictions, n=4)
ROUGE S Score
- Parameters:
references (numpy.ndarray) – References, 2D array (num_samples, seq_len)
predictions (numpy.ndarray) – Predictions, 3D array (num_samples, seq_len, num_vocab) with Probabilities
n (int, optional) – Maximum N-gram order, defaults to 4
- Returns:
ROUGE S score
- Return type:
float
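ROUGE S compares skip-bigrams: ordered token pairs that may have gaps between them. A sketch under two assumptions: `n` acts as the maximum distance between the paired positions, and precision and recall are combined as F1 averaged over samples. `skip_bigrams` and `rouge_s_sketch` are hypothetical helpers.

```python
from collections import Counter

import numpy as np


def skip_bigrams(tokens, max_gap):
    # Ordered pairs (tokens[i], tokens[j]) with 0 < j - i <= max_gap
    return Counter(
        (tokens[i], tokens[j])
        for i in range(len(tokens))
        for j in range(i + 1, min(i + max_gap + 1, len(tokens)))
    )


def rouge_s_sketch(references, predictions, n=4):
    """Hypothetical skip-bigram F1, averaged over samples."""
    refs = np.asarray(references).tolist()
    hyps = np.argmax(predictions, axis=-1).tolist()  # assumption: greedy tokens
    scores = []
    for ref, hyp in zip(refs, hyps):
        ref_sb, hyp_sb = skip_bigrams(ref, n), skip_bigrams(hyp, n)
        overlap = sum((ref_sb & hyp_sb).values())
        if overlap == 0:
            scores.append(0.0)
            continue
        recall = overlap / sum(ref_sb.values())
        precision = overlap / sum(hyp_sb.values())
        scores.append(2 * precision * recall / (precision + recall))
    return float(np.mean(scores))
```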
src.plot_utils module
- src.plot_utils.pca_emission_matrix(em_matrix_df, output_folder)
TSNE of Emission Matrix. Used in HMM
- Parameters:
em_matrix_df (pandas.DataFrame) – DataFrame of Emission Matrix
output_folder (str) – Folder where the Scatter plot is saved as an HTML file
- src.plot_utils.plot_conf_matrix(y_true, y_pred, classes, output_folder)
Confusion Matrix of True Labels vs Prediction Labels. Used in GRU, RNN
- Parameters:
y_true (list) – True Labels
y_pred (list) – Prediction labels
classes (list) – List of classes
output_folder (str) – Folder where the Confusion Matrix png file is saved
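The counts behind such a plot can be sketched without any plotting library. This assumes rows are true labels and columns are predictions, ordered as in `classes` (the same convention as scikit-learn's `confusion_matrix`); whether `plot_conf_matrix` uses that orientation is not stated, and `confusion_counts` is hypothetical.

```python
import numpy as np


def confusion_counts(y_true, y_pred, classes):
    """Hypothetical helper: square count matrix, row = true label, column = prediction."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[index[t], index[p]] += 1
    return matrix


print(confusion_counts(["a", "b", "a"], ["a", "a", "b"], ["a", "b"]))
```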
- src.plot_utils.plot_embed(embeds, vocab, output_folder, fname='Word Embeddings TSNE')
3D TSNE of Word Embeddings from Embedding Matrix Layer
- Parameters:
embeds (numpy.ndarray) – Embeddings Array (num_samples, embed_dim)
vocab (list) – Vocabulary
output_folder (str) – Folder where the Scatter plot is saved as an HTML file
fname (str, optional) – Filename, defaults to “Word Embeddings TSNE”
- src.plot_utils.plot_hist_dataset(data, output_folder)
KDE Plot of Sentence Lengths and Histogram of the POS Tags of all Tokens. Used in HMM
- Parameters:
data (tuple of (numpy.ndarray [num_samples, seq_len], numpy.ndarray [num_samples, seq_len], numpy.ndarray [num_samples, ], numpy.ndarray [num_samples, ])) – Tuple of Train X, Test X, Train y, Test y
output_folder (str) – Folder where the Data Analysis png file is saved
- src.plot_utils.plot_history(history, output_folder, name='History')
Training History of the Loss and Metrics tracked during Training
- Parameters:
history (dict) – History Dictionary
output_folder (str) – Folder where the History png file is saved
name (str, optional) – Filename, defaults to “History”
- src.plot_utils.plot_ngram_pie_chart(vocab_df, n, output_folder, k=20)
Pie Chart of the Top K most frequent Ngrams. Used in NGRAM
- Parameters:
vocab_df (pandas.DataFrame) – DataFrame of Ngrams and their Frequency in Corpus
n (int) – Number of terms in a Vocab (N of Ngram)
output_folder (str) – Folder where the Pie Chart png file is saved
k (int, optional) – Number of Ngrams to plot, defaults to 20
- src.plot_utils.plot_pca_pairplot(X, y, output_folder, num_pcs=6, name='TFIDF PCA Pairplot')
Pairplot of Features Colored by Labels. Used in TFIDF
- Parameters:
X (numpy.ndarray) – Feature 2D array (num_samples, num_features)
y (numpy.ndarray) – Labels array, (num_samples, )
output_folder (str) – Folder where the pairplot png file is saved
num_pcs (int, optional) – Number of Features to Plot, defaults to 6
name (str, optional) – Filename, defaults to “TFIDF PCA Pairplot”
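The default name suggests the plotted columns are leading principal components of the TF-IDF features. Assuming so, the projection can be sketched with NumPy's SVD; `top_principal_components` is a hypothetical helper, and the library may instead use a different PCA implementation or plot raw features.

```python
import numpy as np


def top_principal_components(X, num_pcs=6):
    """Hypothetical PCA: project centred features onto the first num_pcs directions."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:num_pcs].T


rng = np.random.default_rng(0)
pcs = top_principal_components(rng.normal(size=(10, 8)), num_pcs=3)
print(pcs.shape)
```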
- src.plot_utils.plot_topk_cooccur_matrix(cooccur_mat, vocab, output_folder, k=20)
Co-occurrence Matrix of Tokens in the GloVe Model. Used in GLOVE
- Parameters:
cooccur_mat (numpy.ndarray) – Co-occurrence Matrix
vocab (list) – List of Vocabulary
output_folder (str) – Folder where the Co-occurrence matrix png file is saved
k (int, optional) – Number of Vocab to plot, defaults to 20
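For orientation, a GloVe-style co-occurrence matrix can be built with a symmetric window in which each pair is weighted by 1 / distance, as in the GloVe paper. The window size `context`, the "top k = largest total co-occurrence" rule and both helper names are assumptions for illustration, not the library's code.

```python
import numpy as np


def build_cooccurrence(corpus, vocab, context=2):
    """Hypothetical builder: corpus is a list of token lists."""
    index = {word: i for i, word in enumerate(vocab)}
    mat = np.zeros((len(vocab), len(vocab)))
    for sentence in corpus:
        # Out-of-vocabulary tokens are dropped before windowing
        ids = [index[w] for w in sentence if w in index]
        for i, center in enumerate(ids):
            for j in range(max(0, i - context), min(len(ids), i + context + 1)):
                if j != i:
                    # Closer words contribute more (weight 1 / distance)
                    mat[center, ids[j]] += 1.0 / abs(i - j)
    return mat


def topk_submatrix(mat, vocab, k=20):
    # Assumption: "top k" means the words with the largest total co-occurrence mass
    top = np.argsort(-mat.sum(axis=1))[:k]
    return mat[np.ix_(top, top)], [vocab[i] for i in top]
```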
- src.plot_utils.plot_topk_freq(vocab_freq, output_folder, k=10)
Histogram of Top K frequent Vocabulary in the corpus. Used in NGRAM, BOW
- Parameters:
vocab_freq (dict) – Vocabulary Frequency in the Corpus
output_folder (str) – Folder where the Histogram png file is saved
k (int, optional) – Number of Vocab to plot, defaults to 10
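A `vocab_freq` dictionary and its top K entries can be produced with `collections.Counter`, assuming the corpus is a list of token lists. Both helper names are hypothetical; `Counter.most_common` sorts by count in descending order and keeps first-seen order for ties.

```python
from collections import Counter


def vocab_frequency(corpus):
    """Hypothetical helper: token -> count over a tokenised corpus."""
    return dict(Counter(token for sentence in corpus for token in sentence))


def top_k_vocab(vocab_freq, k=10):
    return Counter(vocab_freq).most_common(k)


corpus = [["the", "cat"], ["the", "dog"], ["a", "cat", "the"]]
print(top_k_vocab(vocab_frequency(corpus), k=2))
```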
- src.plot_utils.plot_transition_matrix(trans_matrix_df, output_folder)
Heatmap of Transition Matrix. Used in HMM
- Parameters:
trans_matrix_df (pandas.DataFrame) – DataFrame of Transition Matrix
output_folder (str) – Folder where the Heatmap png file is saved
src.utils module
- class src.utils.ValidateConfig(config_dict, algo)
Bases: object
Validates the Config File
- Parameters:
config_dict (dict) – Config Params Dictionary
algo (str) – Name of the Algorithm
- check_float(key, val)
Checks whether the float value of the given key is valid
- Parameters:
key (str) – Param Key
val (float) – Param value
- check_int(key, val)
Checks whether the int value of the given key is valid
- Parameters:
key (str) – Param Key
val (int) – Param value
- check_list(key, val)
Checks whether the list value of the given key is valid
- Parameters:
key (str) – Param Key
val (list) – Param value
- check_paths(key, val)
Checks whether the filepath value of the given key is valid
- Parameters:
key (str) – Param Key
val (str) – Param value
- check_string(key, val)
Checks whether the str value of the given key is valid
- Parameters:
key (str) – Param Key
val (str) – Param value
- compare_dtype(key, val)
Checks whether the value of the given key has a valid dtype
- Parameters:
key (str) – Param Key
val (float/int/str/list) – Param value
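A dtype check against `configDictDType` can be sketched as follows, using a small excerpt of that mapping. `find_dtype_mismatches` is hypothetical, and whether `ValidateConfig` raises, logs or returns on a mismatch is not documented here. The sketch compares with `type(val) is expected` rather than `isinstance`, because `bool` is a subclass of `int` and `isinstance(True, int)` would otherwise let a boolean pass as an integer.

```python
# Excerpt of src.configs.configDictDType (a few keys only, for illustration)
CONFIG_DTYPES = {"batch_size": int, "lr": float, "device": str, "h_dim": list, "visualize": bool}


def find_dtype_mismatches(config_dict, dtypes=CONFIG_DTYPES):
    """Hypothetical check: return one message per key with an unknown name or wrong type."""
    errors = []
    for key, val in config_dict.items():
        expected = dtypes.get(key)
        if expected is None:
            errors.append(f"{key}: unknown config parameter")
        elif type(val) is not expected:
            errors.append(f"{key}: expected {expected.__name__}, got {type(val).__name__}")
    return errors


print(find_dtype_mismatches({"batch_size": True, "lr": 0.001}))
```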
- src.utils.get_logger(log_folder)
Initializes the Log File
- Parameters:
log_folder (str) – Path to the folder where the Log file is written
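A file logger of this kind can be sketched with the standard `logging` module. The log filename `run.log`, the logger name `src` and the message format are hypothetical choices, not the library's. The handler check keeps repeated calls from attaching a second handler to the same file, which would otherwise duplicate every line.

```python
import logging
import os


def get_logger_sketch(log_folder, filename="run.log", name="src"):
    """Hypothetical stand-in for src.utils.get_logger."""
    os.makedirs(log_folder, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    log_path = os.path.abspath(os.path.join(log_folder, filename))
    # Reuse an existing handler for this file instead of adding a duplicate
    if not any(isinstance(h, logging.FileHandler) and h.baseFilename == log_path
               for h in logger.handlers):
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```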