src.core.tfidf package
Submodules
src.core.tfidf.tfidf module
- class src.core.tfidf.tfidf.TFIDF(config_dict)
Bases: BOW
A class to run TF-IDF data preprocessing, training and inference.
- Parameters:
config_dict (dict) – Configuration parameters dictionary
- fit(text_ls=None, y=None)
Fits the TF-IDF algorithm on preprocessed text.
- Parameters:
text_ls (list, optional) – List of preprocessed strings, defaults to None
y (list, optional) – Labels, defaults to None
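As a rough illustration of what fitting could involve, the sketch below learns a sorted vocabulary and one IDF weight per word. It is a hypothetical stand-in, not the package's implementation: the class name TfidfFitSketch, whitespace tokenization and the unsmoothed formula idf = ln(N / df) are all assumptions.

```python
import math

import numpy as np


class TfidfFitSketch:
    """Hypothetical stand-in for the state TFIDF.fit might learn.

    Assumes preprocessed strings are whitespace-tokenized and that fitting
    means building a sorted vocabulary plus IDF weights, idf = ln(N / df).
    """

    def fit(self, text_ls, y=None):
        doc_sets = [set(text.split()) for text in text_ls]
        words = sorted(set().union(*doc_sets))
        self.vocab = {word: idx for idx, word in enumerate(words)}
        n_docs = len(text_ls)
        self.idf = np.zeros(len(words))
        for word, idx in self.vocab.items():
            df = sum(word in doc for doc in doc_sets)  # always >= 1 here
            self.idf[idx] = math.log(n_docs / df)
        return self  # y is unused in this sketch


model = TfidfFitSketch().fit(["a b", "a c"])
print(model.vocab)  # {'a': 0, 'b': 1, 'c': 2}
```

A word that appears in every document ("a" here) receives an IDF of 0, so it contributes nothing to the resulting vectors.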
- fit_transform(text_ls=None, y=None)
Fits on and transforms preprocessed text.
- Parameters:
text_ls (list, optional) – List of preprocessed strings, defaults to None
y (list, optional) – Labels, defaults to None
- Returns:
Word vectors and labels
- Return type:
tuple (numpy.ndarray [num_samples, num_vocab], numpy.ndarray [num_samples])
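A minimal end-to-end sketch of fit-then-transform follows, assuming whitespace tokens, tf = count / document length and idf = ln(N / df); the real method may tokenize, weight or normalize differently. It mirrors the documented shapes: TF is built as (num_vocab, num_samples), IDF as (num_vocab,), and the result is transposed to (num_samples, num_vocab).

```python
import numpy as np


def fit_transform(text_ls, y=None):
    """Fit vocabulary and IDF on text_ls, then return (X, labels).

    X has shape (num_samples, num_vocab). Assumes whitespace tokens,
    tf = count / doc length and idf = ln(N / df).
    """
    docs = [text.split() for text in text_ls]
    vocab = {w: i for i, w in enumerate(sorted({t for d in docs for t in d}))}
    n_docs = len(docs)
    tf = np.zeros((len(vocab), n_docs))  # (num_vocab, num_samples)
    for j, doc in enumerate(docs):
        for token in doc:
            tf[vocab[token], j] += 1 / len(doc)
    df = (tf > 0).sum(axis=1)  # number of documents containing each word
    idf = np.log(n_docs / df)  # (num_vocab,)
    X = (tf * idf[:, None]).T  # transpose to (num_samples, num_vocab)
    labels = None if y is None else np.asarray(y)
    return X, labels


X, labels = fit_transform(["the cat", "the dog"], y=[0, 1])
print(X.shape)  # (2, 3)
```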
- get_idf(text_ls)
Creates the inverse document frequency (IDF) array.
- Parameters:
text_ls (list) – List of preprocessed strings
- Returns:
Inverse Document Frequency array
- Return type:
numpy.ndarray (num_vocab,)
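The sketch below shows one common way to build an IDF array of shape (num_vocab,). Unlike the documented method, it takes the vocabulary as an explicit vocab argument (a hypothetical signature), and it assumes whitespace tokenization and the unsmoothed formula idf = ln(N / df); libraries such as scikit-learn use a smoothed variant by default.

```python
import math

import numpy as np


def get_idf(text_ls, vocab):
    """Return an IDF array of shape (num_vocab,).

    Assumes whitespace tokenization and idf = ln(N / df); words with
    df = 0 get weight 0 in this sketch.
    """
    n_docs = len(text_ls)
    doc_sets = [set(text.split()) for text in text_ls]
    idf = np.zeros(len(vocab))
    for word, idx in vocab.items():
        df = sum(word in tokens for tokens in doc_sets)  # document frequency
        idf[idx] = math.log(n_docs / df) if df else 0.0
    return idf


docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab = {w: i for i, w in enumerate(sorted({t for d in docs for t in d.split()}))}
idf = get_idf(docs, vocab)
```

Rare words such as "dog" receive the largest weight, ln(3), while "the", present in every document, receives 0.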
- get_tf(text_ls)
Creates the term frequency (TF) array.
- Parameters:
text_ls (list) – List of preprocessed strings
- Returns:
Term Frequency array
- Return type:
numpy.ndarray (num_vocab, num_samples)
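A sketch of a term-frequency array in the documented (num_vocab, num_samples) layout, where each column is one document. The explicit vocab argument, whitespace tokenization and the normalization tf = count / document length are assumptions; the real method might use raw counts instead.

```python
import numpy as np


def get_tf(text_ls, vocab):
    """Return a term-frequency array of shape (num_vocab, num_samples).

    Assumes whitespace tokenization and tf = count / document length;
    tokens missing from vocab are ignored.
    """
    tf = np.zeros((len(vocab), len(text_ls)))
    for j, text in enumerate(text_ls):
        tokens = text.split()
        for token in tokens:
            if token in vocab:  # skip out-of-vocabulary tokens
                tf[vocab[token], j] += 1
        if tokens:
            tf[:, j] /= len(tokens)  # normalize counts by document length
    return tf


docs = ["the cat sat", "the the dog"]
vocab = {"cat": 0, "dog": 1, "sat": 2, "the": 3}
tf = get_tf(docs, vocab)
```

With this normalization each column sums to 1 when every token is in the vocabulary.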
- save_output(X, y)
Saves training and inference results.
- Parameters:
X (numpy.ndarray (num_samples, num_vocab)) – Word vectors
y (list, optional) – Labels, defaults to None
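The storage format is not documented here. As a hedged sketch only, the function below writes word vectors and optional labels with numpy.save; the output_dir parameter and the file names X.npy and y.npy are hypothetical, and the real method may instead take its destination from config_dict.

```python
import os
import tempfile

import numpy as np


def save_output(X, y=None, output_dir="."):
    """Save word vectors (and labels, if given) as .npy files.

    output_dir and the file names are hypothetical choices for this sketch.
    """
    os.makedirs(output_dir, exist_ok=True)
    np.save(os.path.join(output_dir, "X.npy"), X)
    if y is not None:
        np.save(os.path.join(output_dir, "y.npy"), np.asarray(y))


out_dir = tempfile.mkdtemp()
save_output(np.eye(2), [0, 1], out_dir)
```

Saving to .npy keeps dtype and shape intact, so numpy.load returns arrays identical to the ones passed in.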
- transform(text_ls=None, y=None)
Transforms preprocessed text.
- Parameters:
text_ls (list, optional) – List of preprocessed strings, defaults to None
y (list, optional) – Labels, defaults to None
- Returns:
Word vectors and labels
- Return type:
tuple (numpy.ndarray [num_samples, num_vocab], numpy.ndarray [num_samples])
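At inference time, a transform step typically reuses the vocabulary and IDF weights learned during fitting rather than recomputing them. The sketch below assumes exactly that; passing vocab and idf as arguments, whitespace tokenization, tf = count / document length and dropping unseen words are all assumptions, not the package's confirmed behavior.

```python
import numpy as np


def transform(text_ls, vocab, idf, y=None):
    """Vectorize text_ls with a previously fitted vocab and IDF array.

    Returns (X, labels) with X of shape (num_samples, num_vocab).
    Tokens not in vocab are dropped.
    """
    X = np.zeros((len(text_ls), len(vocab)))
    for i, text in enumerate(text_ls):
        tokens = text.split()
        for token in tokens:
            if token in vocab:
                X[i, vocab[token]] += 1 / len(tokens)
    X *= idf  # broadcast the (num_vocab,) IDF weights across every row
    labels = None if y is None else np.asarray(y)
    return X, labels


vocab = {"cat": 0, "dog": 1, "the": 2}
idf = np.array([np.log(2), np.log(2), 0.0])
X, labels = transform(["the bird", "cat cat"], vocab, idf)
```

Here "bird" was never seen during fitting, so it is ignored, and "the" has zero IDF weight; the first document therefore maps to an all-zero vector.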