newsreclib.models.components.encoders.news package

Submodules

newsreclib.models.components.encoders.news.aspect module

class newsreclib.models.components.encoders.news.aspect.SentimentEncoder(num_sent_classes: int, sent_embed_dim: int, sent_output_dim: int)

Bases: Module

Implements the sentiment encoder from SentiDebias.

Reference: Wu, Chuhan, Fangzhao Wu, Tao Qi, Wei-Qiang Zhang, Xing Xie, and Yongfeng Huang. “Removing AI’s sentiment manipulation of personalized news delivery.” Humanities and Social Sciences Communications 9, no. 1 (2022): 1-9.

For further details, please refer to the paper.

num_sent_classes: Number of sentiment classes.

sent_embed_dim: Number of features in the sentiment embedding.

sent_output_dim: Number of output features in the linear layer (equivalent to the final dimensionality of the sentiment vector).

forward(sentiment) → Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass is defined in this method, call the Module instance itself rather than forward() directly: the instance call runs any registered hooks, whereas a direct forward() call silently skips them.
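
The difference between calling the instance and calling forward() can be seen in a simplified stand-in for torch.nn.Module. TinyModule below is hypothetical and models only forward hooks; the real class also handles pre-hooks and lets hooks replace the output.

```python
class TinyModule:
    """Simplified stand-in for torch.nn.Module: __call__ wraps forward with hooks."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def forward(self, x):
        return x * 2

    def __call__(self, x):
        out = self.forward(x)
        # Hooks run only on this path, never when forward() is called directly.
        for hook in self._forward_hooks:
            hook(self, x, out)
        return out


calls = []
module = TinyModule()
module.register_forward_hook(lambda mod, inp, out: calls.append(out))

via_call = module(3)             # runs forward and then the hook
via_forward = module.forward(3)  # runs forward only; the hook is skipped
```

Both calls return the same value, but the hook records only the call made through the instance.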

training: bool

newsreclib.models.components.encoders.news.category module

class newsreclib.models.components.encoders.news.category.LinearEncoder(pretrained_embeddings: Optional[Tensor], from_pretrained: bool, freeze_pretrained_emb: bool, num_categories: int, embed_dim: Optional[int], use_dropout: bool, dropout_probability: Optional[float], linear_transform: bool, output_dim: Optional[int])

Bases: Module

Implements a category encoder.

pretrained_embeddings: Matrix of pretrained embeddings.

from_pretrained: If True, it initializes the category embedding layer with pretrained embeddings. If False, it initializes the category embedding layer with random weights.

freeze_pretrained_emb: If True, it freezes the pretrained embeddings during training. If False, it updates the pretrained embeddings during training.

num_categories: Number of categories.

embed_dim: Number of features in the category vector.

use_dropout: Whether to use dropout after the embedding layer.

dropout_probability: Dropout probability.

linear_transform: Whether to linearly transform the category vector.

output_dim: Number of output features in the category encoder (equivalent to the final dimensionality of the category vector).

forward(category: Tensor) → Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass is defined in this method, call the Module instance itself rather than forward() directly: the instance call runs any registered hooks, whereas a direct forward() call silently skips them.

training: bool

newsreclib.models.components.encoders.news.news module

class newsreclib.models.components.encoders.news.news.KCNN(pretrained_text_embeddings: Tensor, pretrained_entity_embeddings: Tensor, pretrained_context_embeddings: Optional[Tensor], use_context: bool, text_embed_dim: int, entity_embed_dim: int, num_filters: int, window_sizes: List[int])

Bases: Module

Implements the knowledge-aware CNN from DKN.

Reference: Wang, Hongwei, Fuzheng Zhang, Xing Xie, and Minyi Guo. “DKN: Deep knowledge-aware network for news recommendation.” In Proceedings of the 2018 world wide web conference, pp. 1835-1844. 2018.

For further details, please refer to the paper.

pretrained_text_embeddings: Matrix of pretrained text embeddings.

pretrained_entity_embeddings: Matrix of pretrained entity embeddings.

pretrained_context_embeddings: Matrix of pretrained context embeddings.

use_context: Whether to use context embeddings.

text_embed_dim: The number of features in the text vector.

entity_embed_dim: The number of features in the entity vector.

num_filters: The number of filters in the CNN.

window_sizes: List of window sizes for the CNN.

forward(news: Dict[str, Tensor]) → Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass is defined in this method, call the Module instance itself rather than forward() directly: the instance call runs any registered hooks, whereas a direct forward() call silently skips them.

training: bool

class newsreclib.models.components.encoders.news.news.NewsEncoder(dataset_attributes: List[str], attributes2encode: List[str], concatenate_inputs: bool, text_encoder: Optional[Module], category_encoder: Optional[Module], entity_encoder: Optional[Module], combine_vectors: bool, combine_type: Optional[str], input_dim: Optional[int], query_dim: Optional[int], output_dim: Optional[int])

Bases: Module

Implements a news encoder.

dataset_attributes: List of news features available in the used dataset.

attributes2encode: List of news features used as input to the news encoder.

concatenate_inputs: Whether the inputs (e.g., title and abstract) were concatenated into a single sequence.

text_encoder: The text encoder module.

category_encoder: The category encoder module.

entity_encoder: The entity encoder module.

combine_vectors: Whether to aggregate the representations of multiple news features.

combine_type: The type of aggregation to use for combining multiple news features representations. Choose between add_att (additive attention), linear, and concat (concatenate).

input_dim: The number of input features in the aggregation layer.

query_dim: The number of features in the query vector.

output_dim: The number of features in the final news vector.

forward(news: Dict[str, Tensor]) → Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass is defined in this method, call the Module instance itself rather than forward() directly: the instance call runs any registered hooks, whereas a direct forward() call silently skips them.

training: bool

newsreclib.models.components.encoders.news.text module

class newsreclib.models.components.encoders.news.text.CNNAddAtt(pretrained_embeddings: Tensor, embed_dim: int, num_filters: int, window_size: int, query_dim: int, dropout_probability: float)

Bases: Module

Implements a text encoder based on CNN and additive attention.

Reference: Wu, Chuhan, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang, and Xing Xie. “Neural news recommendation with attentive multi-view learning.” arXiv preprint arXiv:1907.05576 (2019).

For further details, please refer to the paper.

pretrained_embeddings: Matrix of pretrained embeddings.

embed_dim: The number of features in the text vector.

num_filters: The number of filters in the CNN.

window_size: The window size in the CNN.

query_dim: The number of features in the query vector.

dropout_probability: Dropout probability.

forward(text: Tensor) → Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass is defined in this method, call the Module instance itself rather than forward() directly: the instance call runs any registered hooks, whereas a direct forward() call silently skips them.

training: bool

class newsreclib.models.components.encoders.news.text.CNNMHSAAddAtt(pretrained_embeddings: Tensor, embed_dim: int, num_filters: int, window_size: int, num_heads: int, query_dim: int, dropout_probability: float)

Bases: Module

Implements a text encoder based on CNN, multi-head self-attention, and additive attention.

Reference: Qi, Tao, Fangzhao Wu, Chuhan Wu, Yongfeng Huang, and Xing Xie. “Privacy-Preserving News Recommendation Model Learning.” In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 1423-1432. 2020.

For further details, please refer to the paper.

pretrained_embeddings: Matrix of pretrained embeddings.

num_filters: The number of filters in the CNN.

window_size: The window size in the CNN.

num_heads: The number of heads in the MultiheadAttention.

query_dim: The number of features in the query vector.

dropout_probability: Dropout probability.

forward(text: Tensor) → Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass is defined in this method, call the Module instance itself rather than forward() directly: the instance call runs any registered hooks, whereas a direct forward() call silently skips them.

training: bool

class newsreclib.models.components.encoders.news.text.CNNPersAtt(pretrained_embeddings: Tensor, text_embed_dim: int, user_embed_dim: int, num_filters: int, window_size: int, query_dim: int, dropout_probability: float)

Bases: Module

Implements a text encoder based on CNN and Personalized Attention.

Reference: Wu, Chuhan, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang, and Xing Xie. “NPA: neural news recommendation with personalized attention.” In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp. 2576-2584. 2019.

For further details, please refer to the paper.

pretrained_embeddings: Matrix of pretrained embeddings.

text_embed_dim: The number of features in the text vector.

user_embed_dim: The number of features in the user vector.

num_filters: The number of filters in the CNN.

window_size: The window size in the CNN.

query_dim: The number of features in the query vector.

dropout_probability: Dropout probability.

forward(text: Tensor, lengths: Tensor, projected_users: Tensor) → Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass is defined in this method, call the Module instance itself rather than forward() directly: the instance call runs any registered hooks, whereas a direct forward() call silently skips them.

training: bool

class newsreclib.models.components.encoders.news.text.MHSAAddAtt(pretrained_embeddings: Tensor, embed_dim: int, num_heads: int, query_dim: int, dropout_probability: float)

Bases: Module

Implements a text encoder based on multi-head self-attention and additive attention.

Reference: Wu, Chuhan, Fangzhao Wu, Suyu Ge, Tao Qi, Yongfeng Huang, and Xing Xie. “Neural news recommendation with multi-head self-attention.” In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pp. 6389-6394. 2019.

For further details, please refer to the paper.

pretrained_embeddings: Matrix of pretrained embeddings.

embed_dim: The number of features in the text vector.

num_heads: The number of heads in the MultiheadAttention.

query_dim: The number of features in the query vector.

dropout_probability: Dropout probability.

forward(text: Tensor) → Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass is defined in this method, call the Module instance itself rather than forward() directly: the instance call runs any registered hooks, whereas a direct forward() call silently skips them.

training: bool

class newsreclib.models.components.encoders.news.text.PLM(plm_model, frozen_layers: Optional[List[int]], embed_dim: int, use_mhsa: bool, apply_reduce_dim: bool, reduced_embed_dim: Optional[int], num_heads: Optional[int], query_dim: Optional[int], dropout_probability: float)

Bases: Module

Implements a text encoder based on a pretrained language model.

plm_model: Name of the pretrained language model.

frozen_layers: List of layers to freeze during training.

embed_dim: Number of features in the text vector.

use_mhsa: If True, it aggregates the token embeddings with a multi-head self-attention network into a final text representation. If False, it uses the CLS embedding as the final text representation.

apply_reduce_dim: Whether to linearly reduce the dimensionality of the news vector.

reduced_embed_dim: The number of features in the reduced news vector.

num_heads: The number of heads in the MultiheadAttention.

query_dim: The number of features in the query vector.

dropout_probability: Dropout probability.

forward(text: Tensor) → Tensor

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass is defined in this method, call the Module instance itself rather than forward() directly: the instance call runs any registered hooks, whereas a direct forward() call silently skips them.

training: bool
