src.core.tfidf package

Submodules

src.core.tfidf.tfidf module

class src.core.tfidf.tfidf.TFIDF(config_dict)

Bases: BOW

A class to run TF-IDF data preprocessing, training, and inference.

Parameters:

config_dict (dict) – Dictionary of configuration parameters

fit(text_ls=None, y=None)

Fits the BOW algorithm on preprocessed text.

Parameters:
  • text_ls (list, optional) – List of preprocessed strings, defaults to None

  • y (list, optional) – Labels, defaults to None
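Since fit is documented as fitting the BOW algorithm, the core fitted state is presumably a vocabulary that maps each term to a column index. The sketch below shows one way to build such a mapping; it assumes each preprocessed string can be tokenized with str.split(), and the helper name build_vocab is hypothetical, not part of this package.

```python
from typing import Dict, List


def build_vocab(text_ls: List[str]) -> Dict[str, int]:
    """Map each unique token to a column index (hypothetical helper).

    Assumes each string is already preprocessed and whitespace-tokenizable.
    Tokens are sorted so the column order is deterministic.
    """
    tokens = sorted({tok for text in text_ls for tok in text.split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


vocab = build_vocab(["the cat sat", "the dog sat"])
print(vocab)  # {'cat': 0, 'dog': 1, 'sat': 2, 'the': 3}
```

The size of this mapping is what the rest of this page calls num_vocab.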

fit_transform(text_ls=None, y=None)

Fits the model on preprocessed text and transforms that text.

Parameters:
  • text_ls (list, optional) – List of preprocessed strings, defaults to None

  • y (list, optional) – Labels, defaults to None

Returns:

Word vectors, Labels

Return type:

tuple (numpy.ndarray (num_samples, num_vocab), numpy.ndarray (num_samples,))
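To make the returned shapes concrete, here is a minimal, self-contained sketch of what a fit_transform step can compute: one row of TF-IDF weights per sample and one label per sample. It is not the package's implementation. The whitespace tokenization, the length-normalized TF, and the smoothed IDF ln((1 + N) / (1 + df)) + 1 are all assumptions chosen for illustration.

```python
import numpy as np
from typing import List, Optional, Tuple


def fit_transform(text_ls: List[str], y: Optional[list] = None
                  ) -> Tuple[np.ndarray, Optional[np.ndarray]]:
    """Return (X, labels) with X of shape (num_samples, num_vocab)."""
    # Vocabulary: sorted unique whitespace tokens (assumed tokenization).
    vocab = {t: i for i, t in enumerate(sorted({t for s in text_ls for t in s.split()}))}
    counts = np.zeros((len(text_ls), len(vocab)))
    for i, text in enumerate(text_ls):
        for tok in text.split():
            counts[i, vocab[tok]] += 1
    # Term frequency: counts normalized by document length (empty docs stay 0).
    lengths = counts.sum(axis=1, keepdims=True)
    tf = np.divide(counts, lengths, out=np.zeros_like(counts), where=lengths > 0)
    # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1.
    df = (counts > 0).sum(axis=0)
    idf = np.log((1.0 + len(text_ls)) / (1.0 + df)) + 1.0
    # Broadcasting scales each vocabulary column by its IDF weight.
    X = tf * idf
    labels = None if y is None else np.asarray(y)
    return X, labels


X, labels = fit_transform(["the cat sat", "the dog sat"], [0, 1])
print(X.shape, labels.shape)  # (2, 4) (2,)
```

The labels come back as a 1-D array with one entry per sample, while X has one column per vocabulary term.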

get_idf(text_ls)

Creates the inverse document frequency (IDF) array.

Parameters:

text_ls (list) – List of preprocessed strings

Returns:

Inverse Document Frequency array

Return type:

numpy.ndarray (num_vocab,)
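An IDF array of shape (num_vocab,) holds one weight per vocabulary term that shrinks as the term appears in more documents. The sketch below assumes the smoothed variant ln((1 + N) / (1 + df)) + 1, where N is the number of documents and df is the number of documents containing the term; the package may use a different formula. Unlike the documented get_idf(text_ls), this hypothetical version takes the vocabulary as an explicit argument so that it stays self-contained.

```python
import numpy as np
from typing import Dict, List


def get_idf(text_ls: List[str], vocab: Dict[str, int]) -> np.ndarray:
    """Smoothed IDF, shape (num_vocab,): ln((1 + N) / (1 + df)) + 1.

    The smoothing variant and the explicit vocab argument are assumptions.
    """
    n_docs = len(text_ls)
    df = np.zeros(len(vocab), dtype=np.float64)
    for text in text_ls:
        # Count each term at most once per document.
        for tok in set(text.split()):
            if tok in vocab:
                df[vocab[tok]] += 1
    return np.log((1.0 + n_docs) / (1.0 + df)) + 1.0


vocab = {"cat": 0, "dog": 1, "sat": 2, "the": 3}
idf = get_idf(["the cat sat", "the dog sat"], vocab)
```

With this smoothing, a term that occurs in every document gets the minimum weight of 1.0 rather than 0, and a term that never occurs cannot cause a division by zero.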

get_tf(text_ls)

Creates the term frequency (TF) array.

Parameters:

text_ls (list) – List of preprocessed strings

Returns:

Term Frequency array

Return type:

numpy.ndarray (num_vocab, num_samples)
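Note that the documented TF shape is (num_vocab, num_samples), the transpose of the word-vector output of fit_transform and transform. The following sketch produces a matrix in that orientation, with one column per document. Normalizing raw counts by document length is an assumed convention, and passing the vocabulary explicitly is a hypothetical signature chosen to keep the example self-contained.

```python
import numpy as np
from typing import Dict, List


def get_tf(text_ls: List[str], vocab: Dict[str, int]) -> np.ndarray:
    """Term frequency matrix of shape (num_vocab, num_samples).

    Each column holds one document's term counts divided by its total token
    count (an assumed normalization). Out-of-vocabulary tokens get no row.
    """
    tf = np.zeros((len(vocab), len(text_ls)), dtype=np.float64)
    for j, text in enumerate(text_ls):
        tokens = text.split()
        for tok in tokens:
            if tok in vocab:
                tf[vocab[tok], j] += 1
        if tokens:
            tf[:, j] /= len(tokens)
    return tf


vocab = {"cat": 0, "dog": 1, "sat": 2, "the": 3}
tf = get_tf(["the cat sat", "the the dog"], vocab)
```

Combining this with an IDF vector of shape (num_vocab,) then requires a transpose, for example tf.T * idf, to obtain the (num_samples, num_vocab) layout.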

run()

Runs the TF-IDF fit and transform steps, then saves the output.

save_output(X, y)

Saves training and inference results.

Parameters:
  • X (numpy.ndarray (num_samples, num_vocab)) – Word vectors

  • y (list, optional) – Labels, defaults to None

transform(text_ls=None, y=None)

Transforms preprocessed text

Parameters:
  • text_ls (list, optional) – List of preprocessed strings, defaults to None

  • y (list, optional) – Labels, defaults to None

Returns:

Word vectors, Labels

Return type:

tuple (numpy.ndarray (num_samples, num_vocab), numpy.ndarray (num_samples,))
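Unlike fit_transform, transform is typically applied at inference time and reuses the vocabulary and IDF weights learned during fitting. The sketch below illustrates that split under stated assumptions: the fitted state is passed in explicitly (a hypothetical signature), TF is length-normalized, and tokens unseen during fitting are dropped because they have no column. The IDF values here are computed with the smoothed formula as if the model had been fitted on "the cat sat" and "the dog sat".

```python
import numpy as np
from typing import Dict, List


def transform(text_ls: List[str], vocab: Dict[str, int], idf: np.ndarray) -> np.ndarray:
    """Apply a fitted vocabulary and IDF vector to new text.

    Returns an array of shape (num_samples, num_vocab). Tokens not seen during
    fitting have no column and are ignored.
    """
    X = np.zeros((len(text_ls), len(vocab)))
    for i, text in enumerate(text_ls):
        tokens = text.split()
        for tok in tokens:
            col = vocab.get(tok)
            if col is not None:
                X[i, col] += 1
        if tokens:
            X[i] /= len(tokens)  # term frequency relative to document length
    return X * idf  # scale each column by its fitted IDF weight


vocab = {"cat": 0, "dog": 1, "sat": 2, "the": 3}
# Smoothed IDF ln((1 + N) / (1 + df)) + 1 for the assumed two-document fit.
idf = np.log(3.0 / np.array([2.0, 2.0, 3.0, 3.0])) + 1.0
X_new = transform(["the bird sat"], vocab, idf)
```

Keeping the vocabulary fixed at inference time guarantees that X_new has the same columns as the training matrix, which is what a downstream classifier expects.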

Module contents