Summary of the Datasets

NewsRecLib currently integrates two benchmark datasets: MIND and Adressa. Each is supported in two variants that differ in size.

MIND Dataset

NewsRecLib provides downloading, parsing, annotation, and loading functionalities for two variants of MIND: MINDsmall and MINDlarge.

Reference: Wu, Fangzhao, Ying Qiao, Jiun-Hung Chen, Chuhan Wu, Tao Qi, Jianxun Lian, Danyang Liu et al. “MIND: A large-scale dataset for news recommendation.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 3597-3606. 2020.

For further details, please refer to the paper.
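To give a feel for what the parsing step deals with, here is a minimal sketch that reads one record of a MIND `behaviors.tsv` file. It assumes the publicly released layout: five tab-separated fields (impression ID, user ID, timestamp, click history, impressions), with each impression written as `<news_id>-<label>`, or as a bare news ID where labels are withheld. The names `Impression` and `parse_behavior_line` are hypothetical and are not part of the NewsRecLib API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Impression:
    """One row of behaviors.tsv (hypothetical container, for illustration only)."""
    impression_id: str
    user_id: str
    time: str
    history: List[str]                          # previously clicked news IDs
    candidates: List[Tuple[str, Optional[int]]]  # (news ID, click label or None)


def parse_behavior_line(line: str) -> Impression:
    """Split one tab-separated behaviors.tsv line into its five assumed fields."""
    impression_id, user_id, time, history, impressions = line.rstrip("\n").split("\t")

    candidates: List[Tuple[str, Optional[int]]] = []
    for item in impressions.split():
        if "-" in item:
            # Labeled candidate, e.g. "N55689-1" (1 = clicked, 0 = not clicked).
            news_id, label = item.rsplit("-", 1)
            candidates.append((news_id, int(label)))
        else:
            # Unlabeled candidate (assumed to occur when labels are withheld).
            candidates.append((item, None))

    # An empty history field yields an empty list.
    return Impression(impression_id, user_id, time, history.split(), candidates)


sample = "1\tU13740\t11/11/2019 9:05:58 AM\tN55189 N42782\tN55689-1 N35729-0"
record = parse_behavior_line(sample)
```

The library's own loaders may represent these records differently, for example as data frames; the sketch only illustrates the raw record structure under the stated assumptions.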

Adressa Dataset

NewsRecLib provides downloading, parsing, annotation, and loading functionalities for two variants of Adressa: 1-week and 3-month.

Reference: Gulla, Jon Atle, Lemei Zhang, Peng Liu, Özlem Özgöbek, and Xiaomeng Su. “The Adressa dataset for news recommendation.” In Proceedings of the International Conference on Web Intelligence, pp. 1042-1048. 2017.

For further details, please refer to the paper.