Quick Start

NewsRecLib’s entry point is the function train, which accepts a configuration file that drives the entire experiment.
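
In practice, experiments are launched by pointing the training script at an experiment configuration. The general form is shown below, where <experiment_name> is a placeholder for any of the shipped experiment configuration files:

python newsreclib/train.py experiment=<experiment_name>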

Basic Configuration

The following example shows how to train an NRMS model on the MINDsmall dataset with its original settings (i.e., a news encoder that contextualizes pretrained word embeddings, and training by optimizing cross-entropy loss), using an existing configuration file.

python newsreclib/train.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent

In the basic experiment, the experiment configuration specifies only the required hyperparameter values that are not already set in the configurations of the corresponding modules.

defaults:
    - override /data: mind_rec_bert_sent.yaml
    - override /model: nrms.yaml
    - override /callbacks: default.yaml
    - override /logger: many_loggers.yaml
    - override /trainer: gpu.yaml
data:
    dataset_size: "small"
model:
    use_plm: False
    pretrained_embeddings_path: ${paths.data_dir}MINDsmall_train/transformed_word_embeddings.npy
    embed_dim: 300
    num_heads: 15
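
Any of these values can also be overridden directly on the command line using Hydra's dot notation. For instance, to experiment with a different number of attention heads (20 is an illustrative value, not a recommended setting):

python newsreclib/train.py experiment=nrms_mindsmall_pretrainedemb_celoss_bertsent model.num_heads=20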

Advanced Configuration

The advanced scenario depicts a more complex experimental setting. Users can overwrite any of the predefined module configurations from the main experiment configuration file. The following command shows how to train an NRMS model with a PLM-based news encoder and a supervised contrastive loss objective, instead of the default settings.

python newsreclib/train.py experiment=nrms_mindsmall_plm_supconloss_bertsent

This is achieved by creating an experiment configuration file with the following specifications:

defaults:
    - override /data: mind_rec_bert_sent.yaml
    - override /model: nrms.yaml
    - override /callbacks: default.yaml
    - override /logger: many_loggers.yaml
    - override /trainer: gpu.yaml
data:
    dataset_size: "small"
    use_plm: True
    tokenizer_name: "roberta-base"
    tokenizer_use_fast: True
    tokenizer_max_len: 96
model:
    loss: "sup_con_loss"
    temperature: 0.1
    use_plm: True
    plm_model: "roberta-base"
    frozen_layers: [0, 1, 2, 3, 4, 5, 6, 7]
    pretrained_embeddings_path: null
    embed_dim: 768
    num_heads: 16
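
To adapt this recipe, you can copy it into a new experiment file, e.g. configs/experiment/my_experiment.yaml (a hypothetical name, assuming the standard Hydra config layout), adjust the values, and launch it by name:

python newsreclib/train.py experiment=my_experiment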

Alternatively, configurations can be overridden from the command line, as follows:

python newsreclib/train.py experiment=nrms_mindsmall_plm_supconloss_bertsent data.batch_size=128
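
Multiple overrides can be combined, and Hydra's multirun mode (the standard -m flag) sweeps comma-separated values in a single invocation; the temperature values below are illustrative:

python newsreclib/train.py -m experiment=nrms_mindsmall_plm_supconloss_bertsent model.temperature=0.05,0.1,0.2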