Content-Based Collaborative Generation for Recommender Systems
TL;DR Summary
This paper introduces ColaRec, a content-based collaborative generation model for recommenders, which uses a sequence-to-sequence framework to generate item identifiers directly, integrating user interaction data and collaborative signals effectively.
Abstract
Generative models have emerged as a promising utility to enhance recommender systems. It is essential to model both item content and user-item collaborative interactions in a unified generative framework for better recommendation. Although some existing large language model (LLM)-based methods contribute to fusing content information and collaborative signals, they fundamentally rely on textual language generation, which is not fully aligned with the recommendation task. How to integrate content knowledge and collaborative interaction signals in a generative framework tailored for item recommendation is still an open research challenge. In this paper, we propose content-based collaborative generation for recommender systems, namely ColaRec. ColaRec is a sequence-to-sequence framework which is tailored for directly generating the recommended item identifier. Precisely, the input sequence comprises data pertaining to the user's interacted items, and the output sequence represents the generative identifier (GID) for the suggested item. To model collaborative signals, the GIDs are constructed from a pretrained collaborative filtering model, and the user is represented as the content aggregation of interacted items. To this end, ColaRec captures both collaborative signals and content information in a unified framework. Then an item indexing task is proposed to conduct the alignment between the content-based semantic space and the interaction-based collaborative space. Besides, a contrastive loss is further introduced to ensure that items with similar collaborative GIDs have similar content representations. To verify the effectiveness of ColaRec, we conduct experiments on four benchmark datasets. Empirical results demonstrate the superior performance of ColaRec.
In-depth Reading
1. Bibliographic Information
1.1. Title
The title of the paper is Content-Based Collaborative Generation for Recommender Systems. It clearly indicates that the paper focuses on combining item content information and user-item collaborative interactions within a generative framework for recommender systems.
1.2. Authors
The paper lists twelve authors: Yidan Wang, Zhaochun Ren, Zhixiang Liang, Xin Chen, Xu Zhang, Pengjie Ren, Jiyuan Yang, Weiwei Sun, Ruobing Xie, Su Yan, Zhumin Chen, and Xin Xin. Most authors are affiliated with Shandong University, with others from Leiden University, Zhejiang University, and Tencent (WeChat). This indicates a collaborative effort between academic institutions and an industry research lab.
1.3. Journal/Conference
The paper is published at the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24). CIKM is a highly reputable and influential conference in the fields of information retrieval, knowledge management, and database systems, suggesting that the work has undergone rigorous peer review and is considered significant within these communities.
1.4. Publication Year
The preprint was posted to arXiv on 2024-03-27 (UTC). The ACM reference format indicates the paper is scheduled for CIKM 2024 in October 2024.
1.5. Abstract
Generative models are a promising approach for enhancing recommender systems. The paper highlights the importance of integrating both item content and user-item collaborative interactions within a unified generative framework for improved recommendations. While existing large language model (LLM)-based methods combine content and collaborative signals, they primarily rely on textual language generation, which is not fully aligned with the direct item recommendation task.
To address this, the authors propose ColaRec, a sequence-to-sequence framework specifically designed for generating recommended item identifiers (GID). The input sequence consists of data related to a user's interacted items, and the output sequence is the GID of the suggested item. ColaRec models collaborative signals by constructing GIDs from a pretrained collaborative filtering model (e.g., LightGCN) and representing users as content aggregations of their interacted items. This unified approach captures both collaborative signals and content information.
To further align these two types of information, an item indexing task is introduced, which maps item side information (content and interacting users) into the item's GID. Additionally, a contrastive loss is incorporated to ensure that items with similar collaborative GIDs have similar content representations. Experiments on four benchmark datasets demonstrate ColaRec's superior performance compared to existing methods.
1.6. Original Source Link
The original source link is https://arxiv.org/abs/2403.18480v2. This is a preprint on arXiv.
The PDF link is https://arxiv.org/pdf/2403.18480v2.pdf.
2. Executive Summary
2.1. Background & Motivation
The core problem the paper aims to solve is the ineffective integration and alignment of item content information and user-item collaborative signals within generative recommender systems.
This problem is important because recommender systems are widely deployed to personalize information services, and their effectiveness heavily relies on accurately understanding user preferences and item characteristics. Traditional collaborative filtering (CF) methods primarily leverage user-item interaction data, while content-based methods focus on item attributes. Recent advancements in generative models, particularly large language models (LLMs), have shown promise in recommendation by transforming the task into language generation. However, these LLM-based approaches face inherent misalignments with the direct item recommendation task. Specifically, they often require a complex grounding stage to map generated text back to concrete items and struggle with directly generating target item IDs from a large candidate pool.
The paper identifies a gap in prior research where existing generative recommendation methods (which directly generate item identifiers, or GIDs) either prioritize item content (e.g., TIGER) or collaborative signals (e.g., Si et al.), but fail to effectively model both in a unified framework or to properly align them. Simple concatenation of content and collaborative IDs has been shown to be suboptimal, indicating a need for an explicit learning process for alignment.
The paper's innovative idea or entry point is to propose a sequence-to-sequence generative framework, ColaRec, tailored for directly generating item identifiers. This framework aims to unify collaborative signals (derived from user-item interactions) and item content information (textual descriptions) in an end-to-end manner, overcoming the limitations of LLM-based methods and existing generative recommendation approaches that only consider one aspect or lack proper alignment.
2.2. Main Contributions / Findings
The paper's primary contributions are:
- Proposed ColaRec Framework: Introduction of a novel generative recommendation framework, ColaRec, which utilizes an encoder-decoder model to jointly capture item content information and user-item collaborative signals for recommendation. This unified approach is specifically tailored for the recommendation task, addressing the misalignment issues of LLM-based methods.
- Auxiliary Tasks for Alignment: Development of an auxiliary item indexing task and a contrastive loss to facilitate better alignment between item content information and user-item collaborative signals. This explicit learning process for alignment is crucial for enhancing the performance of generative recommendation.
- Empirical Validation: Extensive experiments conducted on four benchmark datasets demonstrate the superior recommendation performance of ColaRec compared to state-of-the-art baselines. The results show ColaRec's effectiveness and generalization across different domains, particularly its significant improvement for long-tail users with sparse interactions.

The key conclusions and findings of the paper are:

- Unifying content information and collaborative signals within a tailored generative framework significantly improves recommendation performance.
- Explicit alignment mechanisms, such as the item indexing task and contrastive loss, are essential for effectively integrating these two types of signals.
- The proposed GID construction strategy, which leverages a pretrained collaborative filtering model, is highly effective, outperforming GIDs based solely on random strings or item content.
- ColaRec is particularly beneficial for long-tail users, indicating its robustness in sparse data scenarios.

These findings collectively address the challenge of integrating diverse information sources in generative recommender systems, leading to more accurate and robust recommendations.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
- Recommender Systems: A class of information filtering systems that seek to predict the "rating" or "preference" a user would give to an item. They are used to suggest items (e.g., movies, products, news articles) that users might like, with the goal of enhancing user experience and engagement through personalized suggestions.
- Collaborative Filtering (CF): A widely used technique in recommender systems that makes recommendations based on the preferences of similar users or the characteristics of similar items.
  - User-based CF: Recommends items to a user that similar users have liked.
  - Item-based CF: Recommends items that are similar to items the user has liked in the past.
  - The core idea is that if user A and user B have similar tastes, and user A liked item X, then user B is likely to like item X too. Collaborative signals refer to the patterns and knowledge derived from these user-item interactions.
- Content Information: The descriptive attributes of items, such as textual descriptions (e.g., titles, genres, brands, tags), images, videos, or other metadata. Content-based recommendation systems recommend items similar to those a user has liked in the past based on their attributes, rather than relying solely on interactions from other users.
- Generative Models: A type of artificial intelligence model that learns the patterns and structures of input data and can then generate new, similar data. Examples include Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), Diffusion Models, and Large Language Models (LLMs). In the context of recommendation, generative models can be used to generate item attributes, explanations, or even the item identifiers themselves.
- Sequence-to-Sequence (Seq2Seq) Models: A neural network architecture that transforms an input sequence into an output sequence. It typically consists of an encoder that processes the input sequence and a decoder that generates the output sequence. Widely used in machine translation, text summarization, and, in this paper, for generating item identifiers.
- Transformer Architecture: A neural network architecture introduced in "Attention Is All You Need" (Vaswani et al., 2017). It relies heavily on self-attention mechanisms to weigh the importance of different parts of the input sequence when processing each element. Transformers have become the backbone of many LLMs and Seq2Seq models due to their ability to handle long-range dependencies and to parallelize computation efficiently.
- Graph Neural Networks (GNNs): Neural networks designed to operate on graph-structured data. They learn representations (embeddings) of nodes and edges by aggregating information from their local neighborhoods. In recommender systems, user-item interactions can be naturally represented as a bipartite graph, making GNNs suitable for learning user and item embeddings.
- K-means Clustering: An unsupervised machine learning algorithm that partitions observations into clusters, where each observation belongs to the cluster with the nearest mean (centroid). Hierarchical K-means applies K-means iteratively to create a tree-like structure of clusters. In this paper, it is used to construct Generative Identifiers (GIDs).
- Contrastive Learning: A machine learning paradigm in which the model learns representations by pushing "similar" (positive) samples closer together in the embedding space and "dissimilar" (negative) samples further apart. It is used to learn robust and discriminative representations without explicit labels.
- Bayesian Personalized Ranking (BPR) Loss: A pairwise ranking loss function commonly used in recommender systems. It optimizes the model to rank observed (positive) items higher than unobserved (negative) items for a given user:
  $
  \mathcal{L}_{\mathrm{BPR}} = -\sum_{(u, i, j) \in D_S} \log \sigma(\hat{x}_{ui} - \hat{x}_{uj})
  $
  where $D_S$ is the training set of triplets (u, i, j) in which user $u$ prefers item $i$ over item $j$, $\hat{x}_{ui}$ and $\hat{x}_{uj}$ are the predicted scores of items $i$ and $j$ for user $u$, and $\sigma$ is the sigmoid function. The goal is to maximize the difference between the positive and negative item scores.
- Generative Identifier (GID): A unique sequence of tokens assigned to each item, designed to be generated by a generative recommendation model. Unlike a single item ID, a GID is a structured sequence that can encode more information and correlations.
3.2. Previous Works
- Matrix Factorization (MF) [26, 42]: Early CF approaches that decompose the sparse user-item interaction matrix into lower-dimensional latent user and item factor matrices. NeuMF [18] is a deep learning extension.
- Graph Convolutional Matrix Completion [2]: Early work using GNNs for CF.
- Neural Graph Collaborative Filtering (NGCF) [51]: A prominent GNN-based CF model that explicitly encodes high-order connectivity in the user-item interaction graph.
- LightGCN [16]: A simplified GNN-based CF model that removes non-linear activation functions and feature transformations from NGCF, focusing solely on neighborhood aggregation for learning user and item embeddings. It is known for its effectiveness and simplicity.
- SimpleX [36]: A simple yet strong CF baseline using a cosine-based contrastive loss and negative sampling.
- NCL [34]: Improves LightGCN by incorporating contrastive learning to further enhance item and user representations.
- Variational Autoencoders for CF (MultiVAE) [31, 43]: Apply VAEs to model user-item interactions, treating user preferences as latent variables and reconstructing observed interactions.
- Diffusion Models for Recommendation (DiffRec) [30, 50]: Newer approaches that leverage diffusion models to learn user-item interaction knowledge through reconstruction and denoising processes.
- LLM-based Recommendation [10, 29, 32, 33, 35, 60, 61, 63, 65]: Reformulates recommendation as a language generation task, using LLMs to generate natural language responses (e.g., item descriptions, explanations) based on user prompts and historical interactions.
  - P5 [12]: A unified "Pretrain, Personalized Prompt & Predict" paradigm that treats recommendation tasks as language processing.
  - LC-Rec [63]: Performs recommendation through various language generation tasks using LLMs.
  - Core challenge: These methods often suffer from task misalignment, requiring grounding stages to map generated text to actual items and struggling to generate specific item IDs from large candidate pools.
- Generative Retrieval [3, 45, 47, 52]: A paradigm in which generative models directly generate identifiers (e.g., document IDs) for retrieval.
  - DSI [47]: Differentiable Search Index, a pioneering work in generative retrieval that uses a Seq2Seq model to directly generate document identifiers.
- Generative Recommendation (tailored for item IDs) [22, 28, 37, 40, 44, 46, 49, 53, 58]: A newer paradigm inspired by generative retrieval but tailored for recommendation. Items are assigned Generative Identifiers (GIDs), which are sequences of tokens, and a Seq2Seq model directly generates the GID of the recommended item.
  - TIGER [40]: Uses an RQ-VAE to construct GIDs from item textual content embeddings and then a Transformer to generate sequential recommendations.
  - Si et al. [44]: Constructs GIDs from item embeddings of a pretrained SASRec model (a sequential recommender).
  - Hua et al. [22]: Investigated item identifier construction and combined content-based semantic strings with collaborative IDs (from a co-occurrence matrix) via naive concatenation.
3.3. Technological Evolution
Recommender systems have evolved from traditional collaborative filtering (e.g., Matrix Factorization) to deep learning-based approaches (e.g., NeuMF), then to Graph Neural Networks (GNNs) that better capture complex interaction patterns (NGCF, LightGCN, NCL). Concurrently, the rise of generative models has spurred new directions. Initial attempts involved VAEs and GANs for implicit feedback. More recently, Large Language Models (LLMs) have been adapted, treating recommendation as a language generation task. This LLM-based approach, while powerful, often faces task misalignment.
A new wave, generative recommendation, inspired by generative retrieval, aims to overcome this misalignment by directly generating item identifiers (GIDs) rather than natural language. Early generative recommendation methods, like TIGER and Si et al., focused on either content or collaborative signals in GID construction. However, a key challenge remained: how to jointly model and effectively align both item content information and user-item collaborative signals within this end-to-end generative framework. This paper (ColaRec) fits into this evolution by proposing a comprehensive solution to this specific alignment challenge within the generative recommendation paradigm.
3.4. Differentiation Analysis
Compared to the main methods in related work, ColaRec offers several core differences and innovations:
- LLM-based Recommendation:
  - Differentiation: ColaRec does not rely on textual language generation for recommendation. Instead, it directly generates Generative Identifiers (GIDs) that map to specific items. This fundamentally avoids the task misalignment and grounding stage issues inherent in LLM-based methods (e.g., P5, LC-Rec), which struggle to generate concrete item IDs from a large candidate pool.
  - Innovation: ColaRec is specifically tailored for item recommendation, offering a more direct and efficient approach than repurposing LLMs for a potentially ill-fitting task.
- Existing Generative Recommendation Methods (e.g., TIGER, Si et al., Hua et al.):
  - Differentiation: ColaRec explicitly and jointly models both item content information and user-item collaborative signals in a unified sequence-to-sequence framework. TIGER primarily constructs GIDs from item textual content. Si et al. construct GIDs from collaborative filtering embeddings (e.g., SASRec), but their framework does not explicitly model content. Hua et al. naively concatenate content and collaborative IDs without a proper learning process for alignment.
  - Innovation: ColaRec introduces a novel GID construction strategy based on a pretrained collaborative filtering model (LightGCN) to embed collaborative signals directly into the GIDs. Crucially, it then proposes an auxiliary item indexing task and a contrastive loss to explicitly align the content-based semantic space and the interaction-based collaborative space. This dual focus on unified modeling and explicit alignment is a key innovation.
- Conventional CF-based Methods (e.g., LightGCN, NCL):
  - Differentiation: ColaRec is a generative model that predicts items by generating their GIDs, whereas CF-based methods typically learn item embeddings and perform ranking by calculating similarity scores. While CF models can be strengthened with additional objectives (e.g., NCL with contrastive learning), they are not generative in the sense of producing structured identifiers.
  - Innovation: By using a generative approach, ColaRec offers an end-to-end paradigm that can potentially capture more complex relationships and facilitate better interpretability if GIDs are designed meaningfully, while still leveraging the strengths of CF.

In summary, ColaRec innovates by creating a generative recommendation framework that effectively unifies content and collaborative signals through a tailored GID construction, and critically, introduces explicit alignment mechanisms via multi-task learning (the indexing task and contrastive loss), which is largely missing or insufficient in previous generative approaches.
4. Methodology
This section details the ColaRec framework, which aims to integrate item content information and user-item collaborative signals into a sequence-to-sequence generation model for direct item recommendation.
4.1. Notations
- $u$: A specific user.
- $i$: A specific item.
- $\mathcal{I}_u^+$: The set of items that user $u$ has interacted with.
- $\mathcal{U}_i^+$: The set of users who have interacted with item $i$.
- $c_i$: The content description of item $i$.
- $\mathrm{uad}_u$: The user's atomic identifier, a randomly assigned single token for user $u$.
- $\mathrm{iad}_i$: The item's atomic identifier, a randomly assigned single token for item $i$.
- $z_i = [z_i^1, z_i^2, \ldots, z_i^l]$: The generative identifier (GID) for item $i$, a sequence of tokens.
- $l$: The length of the GID.
4.2. Generative Recommendation Task Formulation
The goal of generative recommendation is to predict a list of items for a user by generating their Generative Identifiers (GIDs), given information about the user's previously interacted items . This generation process is auto-regressive, meaning each token in the GID is generated based on the previously generated tokens and the input information.
The probability of recommending item for user is estimated as the product of the probabilities of generating each token in its GID, sequentially:
$
p(u, i) = \prod_{t=1}^{l} p(z_i^t \mid \mathcal{I}_u^+, z_i^1, z_i^2, \ldots, z_i^{t-1})
$
where $z_i^t$ is the $t$-th token of $z_i$, and $z_i^1, \ldots, z_i^{t-1}$ are the tokens generated before step $t$. The recommender then selects the items with the highest $p(u, i)$ scores to form the top-$n$ recommendation list for user $u$.
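To make this factorization concrete, here is a minimal Python sketch (toy probabilities, not the paper's model) that computes $p(u, i)$ as the product of per-step GID token probabilities:

```python
import numpy as np

def gid_probability(step_probs, gid):
    """step_probs[t] is the softmax distribution over the codebook at step t,
    already conditioned on the user's interacted items and the tokens gid[:t]."""
    p = 1.0
    for t, token in enumerate(gid):
        p *= step_probs[t][token]
    return p

# Toy example with GID length l = 3 and a codebook of 4 tokens per position.
step_probs = [
    np.array([0.1, 0.6, 0.2, 0.1]),    # p(z^1 | I_u^+)
    np.array([0.3, 0.3, 0.3, 0.1]),    # p(z^2 | I_u^+, z^1)
    np.array([0.05, 0.05, 0.7, 0.2]),  # p(z^3 | I_u^+, z^1, z^2)
]
print(gid_probability(step_probs, gid=[1, 0, 2]))  # 0.6 * 0.3 * 0.7 = 0.126
```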
4.3. Overview of ColaRec
The overall architecture of ColaRec is illustrated in Figure 3. It's built around a sequence-to-sequence encoder-decoder Transformer model, typically a pretrained T5 model.

The image is a schematic diagram of the ColaRec framework. The left side shows the user-item recommendation process, including the user's interacted items and their textual content; the right side shows the item indexing pipeline, which uses a graph neural network (GNN) for collaborative filtering. Several loss functions are marked, including the recommendation loss and the contrastive loss, and the construction of the generative item identifier (GID) is illustrated.
Figure 3: Overview of ColaRec. ColaRec assigns each item a GID obtained from a GNN-based CF model and consists of two tasks: User-Item Recommendation maps the user's interacted items with their textual content to the GID of the recommended item, while Item-Item Indexing targets the mapping from item side information into the item's GID. Besides, a ranking loss and a contrastive loss are also introduced.
ColaRec consists of two main training tasks:
- User-Item Recommendation Task: The primary task, in which the model takes the content information of a user's historically interacted items as input and generates the GID of a recommended item. This task is optimized with a recommendation loss.
- Item-Item Indexing Task: An auxiliary task designed to align content information and collaborative signals. It maps item side information (including its textual content and interacting users) into the item's GID. This task is optimized with an indexing loss.

Additionally, ColaRec incorporates two other loss functions:

- A ranking loss (Bayesian Personalized Ranking) to enhance the model's ability to discriminate between positive and negative items.
- A contrastive loss to further ensure that items with similar collaborative GIDs have similar content representations, strengthening the alignment.

All these tasks are learned jointly and share the same encoder-decoder model. The GIDs themselves are constructed with a graph-based collaborative filtering (CF) model (LightGCN) to embed collaborative signals.
4.4. Generative Identifier (GID) Construction
The construction of GIDs is critical. Ideal GIDs should:
- Contain both collaborative signals and content information.
- Reflect correlations: similar items (content-wise or user-wise) should have correlated GIDs.
- Be unique for each item and unambiguously map back to that item.

ColaRec constructs GIDs using a hierarchical clustering approach based on item representations from a pretrained LightGCN model (a minimal sketch of this procedure is given after this list):

- Item Representation Extraction: Item embeddings are first obtained from a LightGCN model pretrained on the user-item interaction graph. Since LightGCN learns representations by aggregating information over the interaction graph, these embeddings naturally encode user-item collaborative signals.
- Hierarchical Clustering: A constrained K-means algorithm is applied hierarchically to the LightGCN item representations.
  - This process forms a K-ary tree in which each item corresponds to a leaf node.
  - The path from the root node to an item's leaf node constitutes the item's GID.
  - For a GID of length l, the clustering is performed for l-1 levels, with cluster sizes constrained so that every item can be uniquely encoded.
  - At the final (leaf) level, a random value from 1 to K is allocated to each item within its respective leaf cluster.
- Codebook Embeddings: For each position in the GID (i.e., each level of the hierarchy), there is a corresponding codebook embedding matrix. These codebook embeddings are learned during the training of ColaRec and help incorporate content information alongside the collaborative signals already present in the GID structure.

This design ensures that GIDs inherently capture collaborative signals through the LightGCN embeddings and tree structure, while content information is integrated during ColaRec training via the codebook embeddings and the item indexing task.
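A minimal Python sketch of this construction is shown below, using random stand-in embeddings and plain scikit-learn KMeans instead of the constrained K-means in the paper (so cluster sizes are not capped here):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_gids(item_embeddings, K=32, seed=0):
    """Assign each item a 3-token GID via two levels of clustering plus a leaf index."""
    rng = np.random.default_rng(seed)
    n_items = item_embeddings.shape[0]
    gids = np.zeros((n_items, 3), dtype=int)

    # Level 1: cluster all items into K groups on their (e.g., LightGCN) embeddings.
    level1 = KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(item_embeddings)
    gids[:, 0] = level1

    # Level 2: cluster again inside each level-1 group.
    for c in range(K):
        idx = np.where(level1 == c)[0]
        if len(idx) == 0:
            continue
        k2 = min(K, len(idx))  # guard against groups smaller than K
        level2 = KMeans(n_clusters=k2, n_init=10, random_state=seed).fit_predict(item_embeddings[idx])
        gids[idx, 1] = level2

        # Leaf level: give items in the same leaf cluster values from 0..K-1
        # (random permutation; wraps around if a leaf holds more than K items).
        for c2 in range(k2):
            leaf = idx[level2 == c2]
            gids[leaf, 2] = rng.permutation(len(leaf)) % K
    return gids

gids = build_gids(np.random.randn(1000, 128).astype(np.float32), K=32)
print(gids[:3])  # each row is one item's 3-token GID
```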
4.5. User-Item Recommendation
This task models user preferences by generating the GID of a recommended item based on the user's historical interactions.
4.5.1. Model Inputs
The input sequence $X_u$ for user $u$ is an unordered aggregation of content tuples from the items $u$ has interacted with. This design directly reflects the CF principle that user preferences can be inferred from their interacted items (a small sketch of the input construction follows this list).

- Item Content Tuple $c_i$: For each item $i$, its textual description is flattened into a sequence of key-value attribute pairs (e.g., [k1:v1, k2:v2, ...]). To further improve fidelity, the item's atomic identifier $\mathrm{iad}_i$ is also included in this tuple.
  $
  c_i = [\mathrm{iad}_i, k_1{:}v_1, k_2{:}v_2, \ldots]
  $
- User Input Sequence $X_u$: The aggregation of these content tuples for all items in $\mathcal{I}_u^+$. A special task token $\mathrm{task}_u$ is prepended to inform the model that it is performing the recommendation task.
  $
  X_u = [\mathrm{task}_u, \{c_i \mid i \in \mathcal{I}_u^+\}]
  $
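A minimal sketch of the input serialization, with hypothetical token formats and field names (the paper does not specify the exact string layout):

```python
def item_content_tuple(item):
    """Flatten one item into [iad_i, k1:v1, k2:v2, ...]."""
    pairs = [f"{k}:{v}" for k, v in item["attrs"].items()]
    return f"<iad_{item['id']}> " + " ".join(pairs)

def user_input_sequence(interacted_items):
    """Prepend the recommendation task token and aggregate the content tuples."""
    return "<task_rec> " + " ".join(item_content_tuple(it) for it in interacted_items)

items = [
    {"id": 17, "attrs": {"title": "Aloe Vera Gel", "brand": "NaturePure", "categories": "Beauty"}},
    {"id": 42, "attrs": {"title": "Vitamin C Serum", "brand": "GlowLab", "categories": "Beauty"}},
]
print(user_input_sequence(items))
# <task_rec> <iad_17> title:Aloe Vera Gel brand:NaturePure categories:Beauty <iad_42> ...
```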
4.5.2. Item Generation
A Transformer encoder-decoder model (a pretrained T5 model) is used for generation.
- Encoding: The encoder processes the input sequence $X_u$ to capture its semantic information, producing hidden states that summarize the input.
- Decoding: The decoder then auto-regressively generates the GID tokens. At step $t$, given the encoder's output and the previously generated tokens $z^{<t}$, the decoder produces a latent representation for the current token:
  $
  \mathbf{d}_t = \mathrm{Decoder}(\mathrm{Encoder}(X_u), z^{<t})
  $
  Here, $\mathbf{d}_t \in \mathbb{R}^{d}$, where $d$ is the dimension of the latent representation.
- Token Probability: The probability of generating the $t$-th GID token is calculated by comparing $\mathbf{d}_t$ with the codebook embedding matrix specific to that GID position (or level in the hierarchy):
  $
  p(z^t \mid z^{<t}, X_u) = \mathrm{softmax}(\mathbf{d}_t \cdot \mathbf{E}_t^{\top})
  $
  $\mathbf{E}_t$ is the codebook embedding matrix for the $t$-th GID position, containing embeddings for all possible tokens at that position.
4.5.3. Recommendation Loss
The model is optimized using a cross-entropy loss to minimize the negative log-likelihood of generating the correct GID tokens for observed positive user-item pairs (u, i):
$
\mathcal{L}_{\mathrm{rec}} = -\sum_{t=1}^{l} \log \hat{p}(z_i^t \mid X_u, z_i^1, z_i^2, \ldots, z_i^{t-1})
$
where $\hat{p}$ denotes the predicted probability from the model. The parameters of the pretrained T5 model are fine-tuned during this process.
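A minimal PyTorch sketch of the per-position codebook scoring and the summed cross-entropy over GID tokens (random stand-ins for the decoder states; not the authors' code):

```python
import torch
import torch.nn.functional as F

l, K, d, batch = 3, 32, 512, 4                               # GID length, codebook size, dims
codebooks = [torch.nn.Embedding(K, d) for _ in range(l)]      # E_1 ... E_l, one per GID position
decoder_states = torch.randn(batch, l, d)                     # stand-ins for d_t from the T5 decoder
target_gid = torch.randint(0, K, (batch, l))                  # ground-truth GID tokens z_i^t

loss_rec = torch.tensor(0.0)
for t in range(l):
    logits = decoder_states[:, t] @ codebooks[t].weight.T     # scores behind softmax(d_t . E_t^T)
    loss_rec = loss_rec + F.cross_entropy(logits, target_gid[:, t])
print(loss_rec)  # summed negative log-likelihood over the l GID positions
```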
4.6. Item-Item Indexing
This auxiliary task is crucial for aligning collaborative signals (embedded in the GIDs) with item content information (from textual descriptions). It trains the model to map an item's comprehensive side information to its GID.
4.6.1. Model Inputs
The input sequence $X_i$ for item $i$ combines its textual content and the atomic identifiers of the users who have interacted with it.

- Item Input Sequence $X_i$: A special task token $\mathrm{task}_i$ is prepended to indicate the indexing task, followed by the item's content tuple $c_i$ and the atomic identifiers of the users in $\mathcal{U}_i^+$:
  $
  X_i = [\mathrm{task}_i, c_i, \{\mathrm{uad}_u \mid u \in \mathcal{U}_i^+\}]
  $
  This input explicitly includes both item content ($c_i$) and collaborative signals ($\mathrm{uad}_u$ for interacting users).
4.6.2. Item Indexing Loss
The indexing task uses the same encoder-decoder model and codebook embeddings as the recommendation task. The generation probabilities are calculated identically to Eq. (4) and Eq. (5), but with as the input instead of . The loss for item indexing is also a cross-entropy loss:
$
\mathcal{L}_{\mathrm{index}} = -\sum_{t=1}^{l} \log p(z_i^t \mid X_i, z_i^1, z_i^2, \ldots, z_i^{t-1})
$
This loss encourages the model to generate the correct GID for an item when provided with its content and interacting-user information, thereby aligning these diverse signals with the collaborative GID.
4.7. Multi-Task Training
ColaRec is trained with a combined objective that includes the recommendation loss, indexing loss, and two additional losses for ranking and contrastive learning.
4.7.1. Item Ranking
To improve the model's ranking capability, a Bayesian Personalized Ranking (BPR) loss [42] is applied. For each positive user-item pair (u, i) in the training set, a negative item $i_-$ (an item user $u$ has not interacted with) is randomly sampled. The BPR loss aims to maximize the score of the positive item over the negative item:
$
\mathcal{L}_{\mathrm{bpr}} = -\ln \sigma(\mathbf{h}(X_u) \cdot (\mathbf{h}(X_i) - \mathbf{h}(X_{i_-})))
$
where $\mathbf{h}(\cdot)$ denotes the last hidden state of the encoder when processing its input. Specifically, $\mathbf{h}(X_u)$ is the user's aggregated content representation, $\mathbf{h}(X_i)$ the content-based representation of item $i$, and $\mathbf{h}(X_{i_-})$ the content-based representation of the negative item $i_-$; $\sigma$ is the sigmoid function. This loss pushes the representation of the positive item closer to the user's representation and further from the negative item's representation in the latent space.
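A minimal PyTorch sketch of this pairwise term, with random stand-ins for the encoder states:

```python
import torch
import torch.nn.functional as F

def bpr_loss(h_user, h_pos, h_neg):
    """-ln sigma( h(X_u) . (h(X_i) - h(X_{i-})) ), averaged over the batch."""
    scores = (h_user * (h_pos - h_neg)).sum(dim=-1)
    return -F.logsigmoid(scores).mean()

h_user, h_pos, h_neg = (torch.randn(4, 512) for _ in range(3))  # stand-in encoder states
print(bpr_loss(h_user, h_pos, h_neg))
```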
4.7.2. Contrastive Learning
A contrastive loss is introduced to ensure better alignment between collaborative signals (embedded in GIDs) and content-based semantic representations. The core idea is that items with similar GIDs should also have similar content representations.
For an item $i$, a positive sample $i_+$ is randomly chosen such that it shares an overlapping prefix of GID tokens with item $i$. The negative sample $i_-$ from the BPR loss is reused here, ensuring that $i_-$ shares no GID tokens with item $i$. The contrastive loss is defined as:
$
\mathcal{L}_{\mathrm{c}} = -\ln \sigma(\mathbf{h}(X_i) \cdot (\mathbf{h}(X_{i_+}) - \mathbf{h}(X_{i_-})))
$
Here, $\mathbf{h}(X_i)$ is the content-based representation of item $i$, $\mathbf{h}(X_{i_+})$ that of the positive sample, and $\mathbf{h}(X_{i_-})$ that of the negative sample. This loss encourages the model to pull the content representation of item $i$ closer to that of $i_+$ (which has a similar GID) and push it away from $i_-$ (which has a dissimilar GID), helping the model learn more discriminative item input representations that reflect both content and collaborative similarity.
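A minimal sketch of the positive selection by shared GID prefix and the resulting loss (toy GIDs and random stand-in encoder states; the prefix length used to define positives is an assumption here):

```python
import torch
import torch.nn.functional as F

def sample_positive(anchor_gid, all_gids, prefix_len=1):
    """Return the index of an item whose GID shares a prefix with the anchor."""
    candidates = [j for j, g in enumerate(all_gids)
                  if g != anchor_gid and g[:prefix_len] == anchor_gid[:prefix_len]]
    return candidates[0] if candidates else None   # real code would sample randomly

def contrastive_loss(h_anchor, h_pos, h_neg):
    return -F.logsigmoid((h_anchor * (h_pos - h_neg)).sum(-1)).mean()

gids = [(3, 7, 1), (3, 2, 5), (9, 0, 4)]      # items 0 and 1 share the first GID token
pos = sample_positive(gids[0], gids)           # -> 1
h = torch.randn(3, 512)                        # stand-in encoder states h(X_i)
print(pos, contrastive_loss(h[0:1], h[pos:pos + 1], h[2:3]))
```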
4.7.3. Joint Optimization
All the described losses are combined for the final joint optimization of ColaRec:
$
\mathcal{L} = \mathcal{L}_{\mathrm{rec}} + \mathcal{L}_{\mathrm{index}} + \mathcal{L}_{\mathrm{bpr}} + \alpha \mathcal{L}_{\mathrm{c}}
$
where $\alpha$ is a hyperparameter controlling the weight of the contrastive loss.
4.7.4. Inference
During inference, to prevent the generation of invalid GIDs (sequences that do not correspond to any actual item), constrained beam search [6] is employed. This technique limits the possible tokens that can be generated at each step based on the previously generated prefix tokens, ensuring that only valid GID paths are explored.
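A minimal sketch of the idea using a prefix trie over valid GIDs; for brevity it performs greedy decoding (beam width 1) rather than full beam search, and uses toy per-step probabilities:

```python
def build_trie(valid_gids):
    """Nested dict: each level maps a token to the sub-trie of valid continuations."""
    trie = {}
    for gid in valid_gids:
        node = trie
        for tok in gid:
            node = node.setdefault(tok, {})
    return trie

def constrained_greedy_decode(step_probs, trie):
    prefix, node = [], trie
    for probs in step_probs:                 # probs: dict token -> probability at this step
        allowed = node.keys()                # only tokens that extend a valid GID prefix
        tok = max(allowed, key=lambda t: probs.get(t, 0.0))
        prefix.append(tok)
        node = node[tok]
    return tuple(prefix)

valid = [(3, 7, 1), (3, 2, 5), (9, 0, 4)]
steps = [{3: 0.6, 9: 0.4}, {7: 0.1, 2: 0.8, 0: 0.1}, {5: 0.9, 1: 0.1}]
print(constrained_greedy_decode(steps, build_trie(valid)))  # (3, 2, 5), always a real item
```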
5. Experimental Setup
5.1. Datasets
The experiments were conducted on four real-world public datasets:
- Amazon Product Reviews: Three subcategories were used: Beauty, Sports and Outdoors (Sports), and Cell Phones and Accessories (Phone). For these Amazon datasets, item content information includes "title", "brand", and "categories" from the item metadata.
- Food.com: The Recipe dataset. For the Recipe dataset, item content information includes "name", "description", and "tag".
Preprocessing:
- Users and items with fewer than five interactions were filtered out from all datasets.
The following are the results from Table 1 of the original paper:
| Datasets | #Users | #Items | #Interactions |
|---|---|---|---|
| Beauty | 22,363 | 12,101 | 198,502 |
| Sports | 35,598 | 18,357 | 296,337 |
| Phone | 27,879 | 10,429 | 194,439 |
| Recipe | 17,813 | 41,240 | 555,618 |
These datasets were chosen because they represent diverse domains (e-commerce, recipes) and are commonly used benchmarks in recommender systems research, making them effective for validating the model's performance and generalization. The inclusion of textual content alongside interaction data aligns with the paper's focus on fusing both information types.
5.2. Evaluation Metrics
Two widely used metrics for evaluating recommendation performance are employed: Recall@n and Normalized Discounted Cumulative Gain (NDCG@n). The candidate item set for evaluation is the entire item set, not a small subset. Each experiment was run three times, and the average score is reported.

- Recall@n
  - Conceptual Definition: Recall@n measures the proportion of relevant (ground-truth) items that are successfully retrieved within the top n recommended items. It indicates the model's ability to find as many relevant items as possible within a given recommendation list length.
  - Mathematical Formula:
    $
    \mathrm{Recall@n} = \frac{\sum_{u \in U} |\{\text{recommended items for } u\}_{@n} \cap \{\text{ground-truth items for } u\}|}{\sum_{u \in U} |\{\text{ground-truth items for } u\}|}
    $
  - Symbol Explanation:
    - $U$: The set of all users in the test set.
    - $\{\text{recommended items for } u\}_{@n}$: The set of top-$n$ items recommended to user $u$.
    - $\{\text{ground-truth items for } u\}$: The set of items user $u$ actually interacted with in the test set.
    - $|\cdot|$: The cardinality (number of elements) of a set.
- Normalized Discounted Cumulative Gain (NDCG@n)
  - Conceptual Definition: NDCG@n is a measure of ranking quality that takes the position of relevant items in the recommendation list into account, assigning higher scores to relevant items that appear earlier. It is "normalized" by dividing the computed DCG by the ideal DCG (where all relevant items are perfectly ranked at the top).
  - Mathematical Formula:
    $
    \mathrm{NDCG@n} = \frac{\mathrm{DCG@n}}{\mathrm{IDCG@n}}
    $
    where DCG@n (Discounted Cumulative Gain at rank n) is
    $
    \mathrm{DCG@n} = \sum_{k=1}^{n} \frac{\mathrm{rel}_k}{\log_2(k+1)}
    $
    and IDCG@n (Ideal Discounted Cumulative Gain at rank n) is
    $
    \mathrm{IDCG@n} = \sum_{k=1}^{|\mathrm{REL}|} \frac{\mathrm{rel}_k}{\log_2(k+1)}
    $
  - Symbol Explanation:
    - $\mathrm{rel}_k$: The relevance score of the item at position $k$ in the recommendation list. With implicit feedback, $\mathrm{rel}_k$ is 1 if the item at position $k$ is a ground-truth item, and 0 otherwise.
    - $n$: The number of items in the recommendation list being considered (e.g., 5, 10, 20).
    - $|\mathrm{REL}|$: The number of relevant items in the ground truth for the current user, up to rank $n$. IDCG@n sorts the relevant items by their true relevance (1 for all ground-truth items under implicit feedback) to achieve the maximum possible DCG score.

A sketch computing both metrics appears after this list.
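A minimal sketch of both metrics for a single user under implicit feedback:

```python
import numpy as np

def recall_at_n(ranked, ground_truth, n):
    hits = len(set(ranked[:n]) & set(ground_truth))
    return hits / len(ground_truth)

def ndcg_at_n(ranked, ground_truth, n):
    gt = set(ground_truth)
    dcg = sum(1.0 / np.log2(k + 2) for k, item in enumerate(ranked[:n]) if item in gt)
    idcg = sum(1.0 / np.log2(k + 2) for k in range(min(len(gt), n)))
    return dcg / idcg if idcg > 0 else 0.0

ranked = [5, 2, 9, 7, 1]         # model's top-5 for one user
ground_truth = [2, 7, 11]        # that user's test-set interactions
print(recall_at_n(ranked, ground_truth, 5))   # 2/3
print(ndcg_at_n(ranked, ground_truth, 5))     # hits at ranks 2 and 4
```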
5.3. Baselines
ColaRec is compared against several representative baselines, categorized into CF-based methods and Generative models for recommendation:
CF-based Methods:
- NeuMF [18]: Neural Matrix Factorization, which enhances traditional Matrix Factorization with deep neural networks to learn non-linear patterns from user-item interactions.
- LightGCN [16]: A simplified Graph Neural Network model for Collaborative Filtering that learns user and item representations through linear neighborhood aggregation on the user-item interaction graph.
- SimpleX [36]: A straightforward CF model that employs a cosine-based contrastive loss and negative sampling to learn user and item embeddings.
- NCL [34]: An improvement over LightGCN that integrates contrastive learning to further enhance the quality of graph-based CF representations.
Generative Models for Recommendation:
- MultiVAE [31]: An autoencoder-based method that uses Variational Autoencoders (VAEs) to model user-item interaction signals through a reconstruction objective.
- DiffRec [50]: A recent recommendation model based on diffusion models, which learns user-item interaction knowledge through a reconstruction and denoising process.
- DSI [47]: Differentiable Search Index, a generative document retrieval method. Two versions are adapted for recommendation:
  - DSI-R: A DSI model whose GIDs are random strings.
  - DSI-S: A DSI model whose GIDs are constructed using hierarchical K-means on item textual content embeddings from a pretrained BERT model.
- TIGER [40]: A generative recommendation method that uses a pretrained Sentence-T5 encoder for item textual content embeddings, quantizes them with an RQ-VAE to build GIDs, and then uses an encoder-decoder Transformer for sequential recommendation. It is adapted for general recommendation by removing sequential order.
- LC-Rec [63]: Adapts Large Language Models by integrating collaborative semantics for recommendation. It follows a similar approach to TIGER for item index learning and uses various language generation tasks under different prompts.

P5-based baselines [12, 22] were not included because they require candidate items in the input prompt, which limits their ability to rank across the whole item set in general recommendation tasks.
5.4. Implementation Details
- GID Length ($l$): Set to 3 for all datasets.
- K-means Cluster Number ($K$): Set to 32 for the Beauty, Sports, and Phone datasets; set to 48 for the Recipe dataset.
- User Representation: Each user's input is formed by aggregating randomly sampled interacted item tuples.
- Item Representation (Indexing Task): Each item's input for the indexing task includes one randomly sampled user who interacted with it.
- Negative Sampling: A uniform distribution is used to sample negative instances for both $\mathcal{L}_{\mathrm{bpr}}$ and $\mathcal{L}_{\mathrm{c}}$, to ensure fair comparison and avoid biases from different negative sampling strategies.
- Embedding Dimensions: The uad, iad, and codebook embeddings are set to 512 to match the word embeddings of the pretrained T5-small model used as the Transformer.
- Contrastive Loss Coefficient ($\alpha$): Fine-tuned per dataset: 0.02 (Beauty), 0.08 (Sports), 0.1 (Phone), and 0.05 (Recipe).
- Optimizer: AdamW with a learning rate of 5e-4.
- Batch Size: 128.
- Baseline Hyper-parameters: Hyper-parameters for the baselines were carefully tuned, with user and item embedding sizes set to 512 for a fair comparison with ColaRec.
6. Results & Analysis
6.1. Core Results Analysis
The experimental results demonstrate ColaRec's superior performance across various metrics and datasets, particularly highlighting its effectiveness for long-tail users.
6.1.1. Comparison on Whole Users (RQ1)
The following are the results from Table 2 of the original paper:
| Datasets | Metric | NeuMF | LightGCN | SimpleX | NCL | MultiVAE | DiffRec | DSI-R | DSI-S | TIGER | LC-Rec | ColaRec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Beauty | R@5 | 0.0447 | 0.0649 | 0.0551 | 0.0650 | 0.0530 | 0.0524 | 0.0128 | 0.0451 | 0.0519 | 0.0492 | 0.0667* |
| | R@10 | 0.0653 | 0.0952 | 0.0831 | 0.0940 | 0.0776 | 0.0741 | 0.0228 | 0.0705 | 0.0799 | 0.0770 | 0.0993* |
| | R@20 | 0.0889 | 0.1314 | 0.1193 | 0.1320 | 0.1093 | 0.1016 | 0.0360 | 0.1018 | 0.1154 | 0.1104 | 0.1371* |
| | N@5 | 0.0315 | 0.0450 | 0.0377 | 0.0452 | 0.0362 | 0.0378 | 0.0084 | 0.0305 | 0.0350 | 0.0326 | 0.0449 |
| | N@10 | 0.0383 | 0.0549 | 0.0469 | 0.0547 | 0.0443 | 0.0450 | 0.0117 | 0.0385 | 0.0443 | 0.0415 | 0.0556* |
| | N@20 | 0.0445 | 0.0643 | 0.0563 | 0.0646 | 0.0526 | 0.0521 | 0.0151 | 0.0470 | 0.0534 | 0.0499 | 0.0654* |
| Sports | R@5 | 0.0206 | 0.0418 | 0.0355 | 0.0427 | 0.0314 | 0.0273 | 0.0117 | 0.0320 | 0.0374 | 0.0397 | 0.0442* |
| | R@10 | 0.0321 | 0.0623 | 0.0557 | 0.0631 | 0.0476 | 0.0403 | 0.0178 | 0.0497 | 0.0572 | 0.0617 | 0.0660* |
| | R@20 | 0.0471 | 0.0901 | 0.0836 | 0.0908 | 0.0713 | 0.0569 | 0.0284 | 0.0766 | 0.0881 | 0.0931 | 0.0964* |
| | N@5 | 0.0140 | 0.0288 | 0.0240 | 0.0294 | 0.0208 | 0.0193 | 0.0079 | 0.0225 | 0.0249 | 0.0264 | 0.0294 |
| | N@10 | 0.0177 | 0.0355 | 0.0306 | 0.0359 | 0.0261 | 0.0235 | 0.0099 | 0.0284 | 0.0313 | 0.0335 | 0.0364* |
| | N@20 | 0.0215 | 0.0426 | 0.0377 | 0.0431 | 0.0321 | 0.0278 | 0.0126 | 0.0350 | 0.0392 | 0.0413 | 0.0442* |
| Phone | R@5 | 0.0410 | 0.0713 | 0.0643 | 0.0717 | 0.0569 | 0.0470 | 0.0187 | 0.0412 | 0.0601 | 0.0615 | 0.0745* |
| | R@10 | 0.0603 | 0.1052 | 0.0976 | 0.1043 | 0.0855 | 0.0668 | 0.0341 | 0.0625 | 0.0895 | 0.0919 | 0.1121* |
| | R@20 | 0.0871 | 0.1487 | 0.1420 | 0.1481 | 0.1233 | 0.0928 | 0.0564 | 0.0966 | 0.1299 | 0.1354 | 0.1587* |
| | N@5 | 0.0282 | 0.0481 | 0.0423 | 0.0486 | 0.0378 | 0.0315 | 0.0121 | 0.0282 | 0.0403 | 0.0408 | 0.0490* |
| | N@10 | 0.0344 | 0.0590 | 0.0530 | 0.0593 | 0.0470 | 0.0379 | 0.0170 | 0.0347 | 0.0498 | 0.0506 | 0.0611* |
| | N@20 | 0.0412 | 0.0700 | 0.0643 | 0.0704 | 0.0566 | 0.0445 | 0.0225 | 0.0431 | 0.0600 | 0.0615 | 0.0729* |
| Recipe | R@5 | 0.0118 | 0.0188 | 0.0114 | 0.0192 | 0.0167 | 0.0142 | 0.0142 | 0.0157 | 0.0168 | 0.0174 | 0.0198* |
| | R@10 | 0.0210 | 0.0296 | 0.0202 | 0.0298 | 0.0285 | 0.0235 | 0.0248 | 0.0270 | 0.0292 | 0.0289 | 0.0306* |
| | R@20 | 0.0339 | 0.0454 | 0.0328 | 0.0459 | 0.0462 | 0.0343 | 0.0403 | 0.0436 | 0.0464 | 0.0454 | 0.0482* |
| | N@5 | 0.0088 | 0.0149 | 0.0093 | 0.0149 | 0.0128 | 0.0105 | 0.0107 | 0.0122 | 0.0137 | 0.0138 | 0.0151* |
| | N@10 | 0.0119 | 0.0182 | 0.0122 | 0.0182 | 0.0167 | 0.0135 | 0.0141 | 0.0158 | 0.0176 | 0.0175 | 0.0185* |
| | N@20 | 0.0154 | 0.0223 | 0.0156 | 0.0224 | 0.0214 | 0.0165 | 0.0182 | 0.0202 | 0.0221 | 0.0218 | 0.0232* |
(Note: * denotes a paired t-test with significance p-value < 0.1)
Observations from Table 2:
- Overall Superiority: ColaRec achieves the best recommendation performance across almost all metrics (Recall@n and NDCG@n) on all four datasets. The only exception is NDCG@5 on Beauty and Sports, where it achieves scores comparable to the NCL baseline (0.0449 vs 0.0452 on Beauty, 0.0294 vs 0.0294 on Sports).
- Significant Improvement in Recall@20: ColaRec consistently outperforms previous CF-based and generative models in Recall@20 on all four datasets, indicating a strong ability to retrieve relevant items within a longer recommendation list.
- Outperforming Generative Retrieval Adaptations: ColaRec significantly outperforms DSI-R (random GIDs) and DSI-S (content-based GIDs), with notable relative improvements in Recall@5 over DSI-S on all four datasets. This suggests that simply adapting generative retrieval methods for recommendation is insufficient, and that ColaRec's tailored approach to GID construction and multi-task learning is effective.
- Superiority over Other Generative Recommenders: ColaRec consistently outperforms TIGER and LC-Rec. This is attributed to TIGER overlooking collaborative signals and to LC-Rec's reliance on language generation for recommendation, which ColaRec argues is not fully aligned with the task.
- Competitiveness with Strong CF Baselines: While existing generative methods like DiffRec and TIGER often underperform strong CF-based methods such as NCL, ColaRec achieves competitive or superior results against NCL and other CF models on all datasets. This indicates that ColaRec successfully infuses content information into collaborative generation without sacrificing the core strengths of CF.
6.1.2. Comparison on Long-Tail Users (RQ1)
The paper also evaluated ColaRec's performance on long-tail users (users with sparse interactions), splitting the user set into head users and long-tail users at a fixed ratio.
The following are the results from Table 3 of the original paper:
| Datasets | Metric | NeuMF | LightGCN | SimpleX | NCL | MultiVAE | DiffRec | DSI-R | DSI-S | TIGER | LC-Rec | ColaRec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Beauty | R@5 | 0.0416 | 0.0636 | 0.0555 | 0.0639 | 0.0510 | 0.0464 | 0.0131 | 0.0415 | 0.0487 | 0.0492 | 0.0660** |
| | R@10 | 0.0604 | 0.0922 | 0.0825 | 0.0907 | 0.0742 | 0.0662 | 0.0228 | 0.0653 | 0.0745 | 0.0772 | 0.0975** |
| | R@20 | 0.0817 | 0.1253 | 0.1160 | 0.1264 | 0.1039 | 0.0917 | 0.0354 | 0.0940 | 0.1084 | 0.1107 | 0.1327** |
| Sports | R@5 | 0.0209 | 0.0433 | 0.0355 | 0.0440 | 0.0329 | 0.0267 | 0.0116 | 0.0307 | 0.0380 | 0.0397 | 0.0456** |
| | R@10 | 0.0317 | 0.0639 | 0.0562 | 0.0645 | 0.0495 | 0.0394 | 0.0170 | 0.0472 | 0.0581 | 0.0617 | 0.0674** |
| | R@20 | 0.0468 | 0.0904 | 0.0836 | 0.0908 | 0.0725 | 0.0553 | 0.0273 | 0.0728 | 0.0882 | 0.0929 | 0.0976** |
| Phone | R@5 | 0.0405 | 0.0723 | 0.0660 | 0.0727 | 0.0571 | 0.0451 | 0.0206 | 0.0404 | 0.0602 | 0.0612 | 0.0756** |
| | R@10 | 0.0590 | 0.1054 | 0.0986 | 0.1043 | 0.0861 | 0.0641 | 0.0371 | 0.0623 | 0.0898 | 0.0918 | 0.1126** |
| | R@20 | 0.0855 | 0.1482 | 0.1418 | 0.1473 | 0.1228 | 0.0899 | 0.0600 | 0.0939 | 0.1293 | 0.1353 | 0.1590** |
| Recipe | R@5 | 0.0128 | 0.0204 | 0.0121 | 0.0210 | 0.0182 | 0.0172 | 0.0157 | 0.0171 | 0.0181 | 0.0189 | 0.0219** |
| | R@10 | 0.0229 | 0.0320 | 0.0212 | 0.0322 | 0.0309 | 0.0269 | 0.0274 | 0.0295 | 0.0316 | 0.0313 | 0.0334** |
| | R@20 | 0.0371 | 0.0487 | 0.0343 | 0.0490 | 0.0499 | 0.0412 | 0.0443 | 0.0475 | 0.0504 | 0.0493 | 0.0528** |
(Note: ** denotes improvements are significant with p-value < 0.05. Only Recall metrics are shown, as NDCG shows similar trends.)
Observations from Table 3:
- Superiority for Long-Tail Users: ColaRec significantly outperforms all baselines for long-tail users across all Recall@n metrics on all datasets. This is a crucial advantage, as long-tail users often suffer from poor recommendations due to sparse interaction data, a common problem in real-world systems.
- Reason for Improvement: The authors attribute this improvement to ColaRec's ability to model both user-item interactions (collaborative signals) and item content information. For long-tail users with limited interaction history, the rich content information provides crucial supplementary knowledge that CF-only methods lack, enabling better recommendations.

In summary, ColaRec yields superior recommendation performance across a wide range of users and is particularly beneficial for long-tail users, demonstrating the value of its unified modeling and alignment strategy.
6.2. Ablation Study (RQ2)
To understand the contribution of each component of ColaRec, an ablation study was conducted. Four variants were created by removing specific components:
- (1) w/o textual content: Removes all textual content information from the model input.
- (2) w/o indexing: Removes the item-item indexing task.
- (3) w/o Lbpr: Removes the Bayesian Personalized Ranking loss $\mathcal{L}_{\mathrm{bpr}}$.
- (4) w/o Lc: Removes the contrastive loss $\mathcal{L}_{\mathrm{c}}$.

The following are the results from the upper part of Table 4 of the original paper:
| Variant | Beauty R@5 | Beauty N@5 | Sports R@5 | Sports N@5 | Phone R@5 | Phone N@5 | Recipe R@5 | Recipe N@5 |
|---|---|---|---|---|---|---|---|---|
| ColaRec | 0.0667 | 0.0449 | 0.0442 | 0.0294 | 0.0745 | 0.0490 | 0.0198 | 0.0151 |
| (1) w/o textual content | 0.0527 | 0.0346 | 0.0364 | 0.0239 | 0.0636 | 0.0426 | 0.0181 | 0.0141 |
| (2) w/o indexing | 0.0637 | 0.0428 | 0.0422 | 0.0278 | 0.0728 | 0.0487 | 0.0179 | 0.0142 |
| (3) w/o Lbpr | 0.0612 | 0.0412 | 0.0424 | 0.0282 | 0.0719 | 0.0486 | 0.0184 | 0.0140 |
| (4) w/o Lc | 0.0657 | 0.0434 | 0.0422 | 0.0279 | 0.0731 | 0.0485 | 0.0188 | 0.0145 |
(Note: The best scores are marked in bold.)
Observations from Table 4 (upper part):
- Importance of Textual Content: Removing textual content ((1) w/o textual content) leads to the largest performance drop across all datasets (e.g., Recall@5 falls from 0.0667 to 0.0527 on Beauty, from 0.0442 to 0.0364 on Sports, from 0.0745 to 0.0636 on Phone, and from 0.0198 to 0.0181 on Recipe). This confirms that item content information is crucial for the model's understanding of items and for recommendation quality.
- Effectiveness of BPR Loss: The absence of the BPR loss ((3) w/o Lbpr) also results in a notable performance decrease. This highlights the importance of the pairwise ranking objective in teaching the model to prioritize relevant items within the generative recommendation framework.
- Value of Alignment Components: Both the item-item indexing task ((2) w/o indexing) and the contrastive loss ((4) w/o Lc) contribute positively to the overall performance; removing either results in a consistent reduction across all datasets. This validates the effectiveness of these explicit alignment mechanisms in facilitating mutual reinforcement between item content information and user-item collaborative signals, leading to more comprehensive and effective item representations.

In conclusion, each component of ColaRec (textual content, item indexing task, BPR loss, and contrastive loss) is essential and contributes to the model's superior recommendation performance.
6.3. GID Investigation (RQ3)
This section investigates the impact of GID design choices on recommendation performance.
6.3.1. Effect of Different GID Types
To evaluate the proposed GID construction strategy (based on collaborative signals), it was compared against three alternative GID types:
- iad-based GID: Each item is represented by a single, unique atomic identifier (iad).
- Random GID: Each item is assigned a random string as its GID, without any underlying knowledge.
- Content GID: GIDs are constructed using hierarchical K-means clustering on item textual content embeddings derived from a pretrained BERT model.

For a fair comparison, the GID length and codebook size for Random GIDs and Content GIDs were kept identical to those in ColaRec.
The following are the results from the bottom part of Table 4 of the original paper:
| GID type | Beauty R@5 | Beauty N@5 | Sports R@5 | Sports N@5 | Phone R@5 | Phone N@5 | Recipe R@5 | Recipe N@5 |
|---|---|---|---|---|---|---|---|---|
| ColaRec | 0.0667 | 0.0449 | 0.0442 | 0.0294 | 0.0745 | 0.0490 | 0.0198 | 0.0151 |
| iad | 0.0658 | 0.0437 | 0.0428 | 0.0285 | 0.0719 | 0.0474 | 0.0189 | 0.0145 |
| Random | 0.0600 | 0.0401 | 0.0411 | 0.0272 | 0.0667 | 0.0443 | 0.0190 | 0.0149 |
| Content | 0.0662 | 0.0440 | 0.0423 | 0.0278 | 0.0716 | 0.0477 | 0.0183 | 0.0141 |
(Note: The best scores are marked in bold.)
Observations from Table 4 (bottom part):
- Superiority of Collaborative GIDs: ColaRec's GID construction method, based on collaborative signals from LightGCN embeddings, achieves the best performance across all metrics and datasets. This underscores the effectiveness and importance of encoding collaborative signals directly into the GIDs.
- Collaborative vs. Content GIDs: ColaRec outperforms the Content GID variant, indicating that while content is important, collaborative signals are more effective for the underlying structure of GIDs in generative recommendation.
- GIDs vs. Single Item IDs: ColaRec also outperforms the iad-based GID (single token). This demonstrates the benefit of sequential GIDs that explicitly introduce item correlations and structured information, compared to arbitrary single item IDs.
- Random GIDs: The Random GID variant yields the lowest performance, as expected. Random strings introduce noise and lack meaningful correlations, making the learning process more difficult and less effective.

These results strongly emphasize the importance of constructing effective GIDs that embed meaningful information, particularly collaborative signals, for robust generative recommender systems.
6.3.2. Impact of GID Hyper-parameters
The paper investigated the impact of two key GID hyper-parameters: GID length (l) and the number of clusters (K) in the hierarchical clustering.
Impact of GID Length (l)
The following figure (Figure 4 from the original paper) shows the impact of GID length on performance:

The image is a chart showing Recall@10 and NDCG@10 as the GID length varies; the left panel is for the Beauty dataset and the right panel for Sports. The data illustrate the impact of different GID lengths on recommendation performance.
Figure 4: Impact of the length of GIDs.
Observations from Figure 4:
- The Recall@10 and NDCG@10 performance of ColaRec fluctuates as the GID length varies from 1 to 4.
- From l = 1 to l = 2: A performance drop is observed on Beauty and Sports. This is attributed to the extra decoding step while the search space at each step remains large, which makes generation more difficult than with a single-token GID (l = 1).
- Optimal Length (l = 3): ColaRec achieves the best performance in most cases when l = 3, suggesting that this length strikes a good balance between the number of decoding steps and the size of the search space at each GID position.
- Longer GIDs (l = 4): When l = 4, performance generally decreases. A longer GID implies more auto-regressive decoding steps, which increases generation difficulty and inference latency.
- Conclusion: l = 3 was chosen as the default setting, a practical trade-off for effective and efficient GID generation.
Impact of Number of Clusters (K)
The following figure (Figure 5 from the original paper) shows the impact of the number of clusters on performance:

The image is a chart showing the impact of different numbers of clusters on recommendation performance. The left panel is for the Beauty dataset and the right panel for Sports; blue and orange curves show how Recall@10 and NDCG@10 change with the number of clusters.
Figure 5: Impact of the number of clusters.
Observations from Figure 5:
- With the GID length fixed at l = 3, the number of clusters K was varied over 32, 64, 96, and 128.
- Higher K and Performance: Generally, a higher value of K tends to result in a slight decrease in overall performance, with the drop being more noticeable on Beauty.
- Reason for Decrease: A higher K means a larger search space for the decoder at each GID position, which increases generation difficulty and computational complexity.
- Principle for Choosing K: It is important to select a suitable K based on the total number of items. K needs to be large enough to uniquely encode the entire item set (given the GID length l), but beyond that it should be kept small to limit the search space and maintain generation efficiency. A worked capacity check is given after this list.

These investigations highlight that careful design and tuning of GID construction parameters are essential for maximizing the performance of generative recommendation models.
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper introduced ColaRec, a novel content-based collaborative generation framework for recommender systems. ColaRec stands out by effectively integrating both item content information and user-item collaborative signals within a unified end-to-end sequence-to-sequence generative model tailored for direct item identifier generation. The core innovation lies in its GID construction (derived from a pretrained LightGCN model to embed collaborative signals) and the explicit alignment mechanisms: an auxiliary item indexing task and a contrastive loss. Extensive experiments on four real-world datasets demonstrated that ColaRec significantly outperforms state-of-the-art baselines, especially for long-tail users who typically suffer from data sparsity. This work validates the potential of a generative paradigm that carefully fuses multi-faceted information for robust and accurate recommendations.
7.2. Limitations & Future Work
The authors acknowledge several areas for future research:
- GID Construction: Investigating more advanced methods for constructing Generative Identifiers (GIDs) that might better encode item properties and relationships.
- Alignment Approaches: Adopting more effective strategies for aligning content information and collaborative signals beyond the current indexing task and contrastive loss, potentially involving more sophisticated cross-modal learning techniques.
- Negative Sampling: Exploring improved negative sampling techniques for generative recommendation, such as sampling more informative negative samples (e.g., hard negatives) using GIDs or even generating synthetic negative instances with generative models.
- Model Efficiency: Improving the efficiency of generative recommendation during both training and inference, as auto-regressive generation can be computationally intensive, especially for long GIDs or large item sets.
7.3. Personal Insights & Critique
Personal Insights
This paper presents a strong step forward in generative recommendation. The most significant insight is the explicit focus on alignment between content and collaborative signals, not just their co-existence. The use of a multi-task learning approach with an item indexing task and a contrastive loss to bridge the semantic gaps between different information modalities is a well-reasoned and effective design. This approach of generating GIDs directly, rather than relying on LLM-based text generation and subsequent grounding, seems more aligned with the core task of item recommendation and avoids many of the complexities and inefficiencies associated with LLM integration. The superior performance on long-tail users is particularly noteworthy, highlighting the practical value of integrating content information to alleviate cold-start or sparsity issues.
The method of constructing GIDs from LightGCN embeddings is clever, embedding a strong collaborative signal directly into the identifier structure. This provides a solid foundation upon which the content-based aspects can be built and aligned. The idea of generative retrieval is clearly influential here, and ColaRec successfully adapts it to the unique challenges of recommendation by incorporating richer user-item interactions and content.
Potential Issues, Unverified Assumptions, or Areas for Improvement
- Interpretability of GIDs: While GIDs are structured sequences, their direct interpretability for humans is not explicitly discussed. If the GID tokens could be designed to correspond to human-understandable attributes or clusters, this would further enhance the model's explainability, a growing demand in recommender systems.
- Scalability of GID Construction: The hierarchical K-means approach for GID construction might become computationally expensive for very large item sets. While LightGCN is efficient, the clustering itself could become a bottleneck; exploring GID construction methods that are inherently more scalable or dynamic could be beneficial.
- Dependency on a Pretrained CF Model: The quality of ColaRec's GIDs relies heavily on the performance of the pretrained LightGCN model. If the LightGCN model is suboptimal or biased, it could affect the entire ColaRec system. An end-to-end learning approach for GID construction that does not rely on a separate pretrained model could be an interesting future direction, though potentially more complex.
- Implicit Feedback Only: The paper focuses on implicit feedback. Extending ColaRec to handle explicit feedback (e.g., star ratings) or multimodal content (e.g., images, videos) could further broaden its applicability and performance.
- Efficiency and Latency: As acknowledged in the future work, auto-regressive generation can be slower than direct embedding similarity search. While constrained beam search helps, further optimizations for inference latency, especially in real-time recommendation scenarios, would be critical for practical deployment.
- Negative Sampling Strategies: The use of uniform negative sampling for $\mathcal{L}_{\mathrm{bpr}}$ and $\mathcal{L}_{\mathrm{c}}$ is a simplification. Investigating more sophisticated hard negative mining strategies or in-batch negative sampling could potentially yield further performance gains.
- Task Token Generalization: The use of special task tokens ($\mathrm{task}_u$, $\mathrm{task}_i$) is common Transformer practice. However, the robustness of this mechanism, and its potential impact on performance if more tasks are introduced or the task distribution is imbalanced, could be analyzed more deeply.

Overall, ColaRec provides a robust and well-validated framework for generative recommendation, effectively addressing key challenges in integrating content and collaborative signals. Its approach is likely transferable to other domains where structured identifiers and rich content information are available, serving as a strong foundation for future research in generative recommender systems.