FICLRec: Frequency enhanced intent contrastive learning for sequential recommendation
TL;DR Summary
FICLRec uses frequency-enhanced intent contrastive learning to address the difficulty existing sequential recommendation models have in capturing high-frequency intents. It significantly improves performance across five real-world datasets.
Abstract
User purchasing behavior is mainly driven by their intentions. However, existing methods typically favor low-frequency intents, leading to insufficient capability in capturing more expressive high-frequency intents. Moreover, like typical sequence recommendations, data sparsity remains a primary factor influencing recommendation performance. To address this issue, we propose a Frequency Enhanced Intent Contrastive Learning Recommendation model (FICLRec), which innovatively utilizes frequency information from users’ latent intentions to improve the recognition of high-frequency intents. Additionally, we introduce frequency contrastive learning to reduce the negative impact of data sparsity on model performance. To validate the effectiveness of the proposed method, extensive experiments were conducted on five real-world datasets: Beauty (0.19M interactions), Sports (0.29M interactions), Toys (0.16M interactions), Yelp (0.31M interactions), and LastFM (0.05M interactions). The experimental results indicate that, in comparison with baseline models, our method improves by 2.03%, 4.87%, 2.50%, 13.85%, and 16.93% on five datasets, proving the effectiveness of our method. Our implemented model is available via https://github.com/syf1844803351/FICLRec .
Mind Map
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
FICLRec: Frequency enhanced intent contrastive learning for sequential recommendation
1.2. Authors
- Yifeng Su
- Xiaodong Cai
- Ting Li
1.3. Journal/Conference
The paper does not explicitly state the journal or conference name in the provided text. The abstract mentions "Published at (UTC): 2025-06-11T00:00:00.000Z", suggesting it is a forthcoming publication. Given the academic rigor and content, it is likely intended for a reputable conference or journal in artificial intelligence, machine learning, or recommender systems.
1.4. Publication Year
2025 (as indicated by the publication UTC timestamp 2025-06-11T00:00:00.000Z)
1.5. Abstract
User purchasing behavior is primarily driven by their intentions. Existing sequential recommendation methods often struggle to effectively capture high-frequency intents (e.g., immediate, short-term interests) and tend to favor low-frequency intents (e.g., stable, long-term preferences). Additionally, data sparsity, a common challenge in sequential recommendation, significantly impacts performance. To address these issues, the authors propose the Frequency Enhanced Intent Contrastive Learning Recommendation model (FICLRec). This model innovatively utilizes frequency information extracted from users' latent intentions to improve the recognition of high-frequency intents. Furthermore, it introduces frequency contrastive learning to mitigate the negative effects of data sparsity. Extensive experiments were conducted on five real-world datasets (Beauty, Sports, Toys, Yelp, and LastFM), demonstrating that FICLRec consistently outperforms baseline models, achieving improvements of 2.03%, 4.87%, 2.50%, 13.85%, and 16.93% on these datasets. The implemented model is publicly available.
1.6. Original Source Link
/files/papers/692bf9cf4114e99a4cde8763/paper.pdf This is a local file link, suggesting the paper is either a preprint or an internal document. Its publication status is likely pending or a preprint as of the provided date.
2. Executive Summary
2.1. Background & Motivation
The core problem addressed by this paper lies in the limitations of existing sequential recommendation (SR) methods, which aim to predict the next item a user will interact with based on their historical sequence of interactions. User purchasing behavior is fundamentally driven by their intentions, which can be short-term (high-frequency, rapidly changing) or long-term (low-frequency, stable). The paper identifies two critical challenges:
- Bias towards low-frequency intents: Current SR models often struggle to effectively capture and represent high-frequency intents from user interaction sequences. This leads to insufficient capability in recognizing dynamic, momentary user interests, which are crucial for timely and relevant recommendations.
- Data sparsity: Like many recommendation systems, sequential recommendation suffers from data sparsity, meaning that most users interact with only a small fraction of available items. This scarcity of interaction data makes it difficult for models to learn robust and generalizable user preferences and item representations, thereby hindering recommendation performance.

The importance of solving these problems stems from the need for more accurate and personalized recommendations. If a system can better understand both the fleeting and enduring intentions of users, it can provide more relevant suggestions, improving user satisfaction and engagement. The paper's innovative idea is to leverage frequency information to disentangle and enhance the learning of different types of user intentions, and to apply contrastive learning in the frequency domain to combat data sparsity.
2.2. Main Contributions / Findings
The primary contributions and key findings of the FICLRec paper are:
- Novel Model for Frequency-Enhanced Intent Learning: The paper proposes FICLRec, a novel model that innovatively uses frequency information derived from users' latent intentions. This approach specifically aims to improve the recognition of high-frequency intents, addressing the bias towards low-frequency intents in existing models.
- Frequency Contrastive Learning for Data Sparsity: FICLRec introduces frequency contrastive learning, a mechanism designed to reduce the negative impact of data sparsity. By contrasting representations in the frequency domain, the model can learn more robust and informative user and item embeddings, even with limited interaction data.
- Comprehensive Intent Modeling: The model incorporates a Frequency Redistribution Encoder (FRE) to decompose user behaviors into low-frequency and high-frequency components, capturing both stable long-term preferences and dynamic short-term interests. It then applies a distinct contrastive learning objective to each frequency band: high-frequency intent contrastive learning and low-frequency intent contrastive learning.
- State-of-the-Art Performance: Extensive experiments on five real-world datasets (Beauty, Sports, Toys, Yelp, LastFM) demonstrate that FICLRec significantly outperforms various state-of-the-art (SOTA) sequential recommendation models, including traditional methods, frequency-domain methods, self-supervised methods, and intent learning methods, achieving average improvements ranging from 2.03% to 16.93% across datasets.
- Robustness and Efficiency: Ablation studies confirm the effectiveness of each proposed component. FICLRec is also robust to noisy and sparse data, and its training efficiency is comparable to other self-attention-based models, making the performance gains worthwhile despite a slight increase in computational complexity.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To understand FICLRec, a reader should be familiar with several fundamental concepts in recommender systems and deep learning.
- Sequential Recommendation (SR):
  - Conceptual Definition: Sequential recommendation is a subfield of recommender systems that focuses on predicting the next item a user will interact with, given their historical sequence of interactions. Unlike traditional collaborative filtering, which recommends items based on overall preferences, SR emphasizes the temporal order and dependencies between items in a user's behavior history. For example, if a user watched a specific movie, SR might recommend the next logical movie in a series or a related genre, considering the immediate context.
  - Importance: SR is crucial in dynamic environments such as e-commerce, content streaming, or news feeds, where user interests evolve rapidly and the order of interactions carries significant information about current intent.
- Intent Learning:
  - Conceptual Definition: Intent learning in recommender systems aims to identify and model the underlying purposes or goals that drive a user's interactions. These intentions can be explicit (e.g., a user searching for "running shoes") or implicit (e.g., a sequence of interactions suggesting that a user is preparing for a trip).
  - Role in FICLRec: FICLRec distinguishes between high-frequency intents (short-term and dynamic, such as buying a specific accessory after a main purchase) and low-frequency intents (long-term and stable, such as a consistent preference for a certain brand or genre). The paper argues that understanding both types of intents leads to better recommendations.
- Contrastive Learning (CL):
  - Conceptual Definition: Contrastive learning is a self-supervised learning paradigm in which a model learns representations by pulling "similar" (positive) samples closer together in an embedding space while pushing "dissimilar" (negative) samples farther apart. It does not require explicit human labels; instead, it generates pseudo-labels from the data itself.
  - How it Works: Typically, given an anchor data point, a positive sample is created (e.g., through data augmentation or by identifying semantically related data points), and multiple negative samples are chosen (e.g., random samples from the batch). A contrastive loss function (such as the InfoNCE loss) then minimizes the distance between the anchor and the positive sample and maximizes the distance between the anchor and the negative samples.
  - Role in FICLRec: FICLRec uses contrastive learning to strengthen the learned intent representations and mitigate data sparsity. It applies distinct contrastive learning objectives to high-frequency and low-frequency intents.
- Frequency Domain Analysis (Fourier Transform):
  - Conceptual Definition: The Fourier Transform (FT) is a mathematical operation that decomposes a function or signal into its constituent frequencies. It transforms a signal from its original domain (often the time or spatial domain) into the frequency domain. In simple terms, it tells us which frequencies are present in the signal and what their magnitudes are.
  - Discrete Fourier Transform (DFT) and Inverse DFT (IDFT): For discrete sequences (such as user interaction sequences), the Discrete Fourier Transform (DFT) is used, often implemented efficiently as the Fast Fourier Transform (FFT). The Inverse Discrete Fourier Transform (IDFT), or Inverse Fast Fourier Transform (IFFT), converts the signal back from the frequency domain to the time domain.
    - DFT Formula: Given a sequence $x_0, x_1, \ldots, x_{N-1}$, its DFT is defined as $X_k = \sum_{n=0}^{N-1} x_n \, e^{-j \frac{2\pi}{N} k n}$, where $X_k$ are the frequency components, $x_n$ are the time-domain samples, $N$ is the total number of samples, $k$ is the frequency index, and $j$ is the imaginary unit.
    - IDFT Formula: The IDFT is given by $x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \, e^{j \frac{2\pi}{N} k n}$, where $x_n$ are the reconstructed time-domain samples.
  - Role in FICLRec: FICLRec applies the Fourier Transform to user interaction sequences to analyze their frequency components. Low frequencies typically capture global, stable, or long-term patterns (e.g., a general preference for a genre), while high frequencies capture local, rapid, or short-term changes (e.g., a sudden interest in specific items). By separating and analyzing these components, the model can better capture different types of user intentions. A small numerical sketch of this decomposition is given after this list.
- Self-Attention Mechanism (from Transformers):
  - Conceptual Definition: Self-attention is a mechanism that allows a model to weigh the importance of different parts of an input sequence when processing a specific element. It helps capture dependencies between items in a sequence regardless of their distance.
  - How it Works (Simplified): For each item in a sequence, self-attention computes three vectors: a Query (Q), a Key (K), and a Value (V). The Query of an item is compared against the Keys of all other items (including itself) to compute attention scores. These scores are scaled, passed through a softmax function to obtain attention weights, and used to weigh the Values of all items. The sum of these weighted Values forms the output representation for the current item, reflecting its contextualized meaning within the sequence.
  - Scaled Dot-Product Attention Formula: $ \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V $, where $Q$ is the matrix of queries, $K$ is the matrix of keys, $V$ is the matrix of values, and $d_k$ is the dimension of the keys (used for scaling so that large dot products do not push the softmax into regions with negligible gradients).
  - Multi-Head Self-Attention: This extends self-attention by running the attention mechanism multiple times in parallel (multiple heads), allowing the model to focus on different aspects of the input sequence simultaneously. The outputs from these heads are concatenated and linearly transformed.
  - Role in FICLRec: FICLRec uses multi-head self-attention within its Frequency Redistribution Encoder to capture relationships between items in the sequence, similar to its use in SASRec and other Transformer-based models.
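To make the low-/high-frequency decomposition concrete, here is a minimal NumPy sketch (not from the paper; the toy signal and the boundary index `alpha` are invented for illustration) that splits a 1-D sequence into a low-frequency and a high-frequency part and checks that the two parts sum back to the original:

```python
import numpy as np

# A toy "interaction signal": a slow trend (low frequency) plus rapid fluctuations (high frequency).
n = 50                                   # sequence length (the paper also caps sequences at 50)
t = np.arange(n)
signal = np.sin(2 * np.pi * t / n) + 0.3 * np.sin(2 * np.pi * 10 * t / n)

spectrum = np.fft.rfft(signal)           # one-sided FFT of a real-valued sequence

alpha = 3                                # hypothetical boundary between "low" and "high" frequency bins
low_spec, high_spec = spectrum.copy(), spectrum.copy()
low_spec[alpha:] = 0                     # keep only the lowest alpha bins
high_spec[:alpha] = 0                    # keep only the remaining (higher) bins

low_part = np.fft.irfft(low_spec, n)     # back to the time domain: smooth, long-term trend
high_part = np.fft.irfft(high_spec, n)   # back to the time domain: rapid, short-term variation

# The two parts sum back to the original signal (up to floating-point error).
print(np.allclose(low_part + high_part, signal))   # True
```

FICLRec applies the same idea along the sequence dimension of the item-embedding tensor rather than to a scalar signal.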
3.2. Previous Works
The paper compares FICLRec against a range of existing sequential recommendation models, which can be broadly categorized.
- Traditional Sequential Recommendation Methods:
  - GRU4Rec (Hidasi et al., 2015): One of the pioneering deep learning methods for sequential recommendation, using Gated Recurrent Units (GRUs) to model user sequences. GRUs are a type of recurrent neural network (RNN) that can capture temporal dependencies.
  - Caser (Tang & Wang, 2018): Employs convolutional neural networks (CNNs) with horizontal and vertical convolution to capture patterns in sequential data.
  - SASRec (Kang & McAuley, 2018): A highly influential Transformer-based model that uses self-attention to capture long-range dependencies in user sequences, significantly improving SR performance compared to RNN-based methods. It serves as a strong backbone for many subsequent SR models.
- Frequency Domain Methods:
  - FMLPRec (Zhou et al., 2022): A sequential recommendation model that processes sequences in the frequency domain using the Fast Fourier Transform (FFT) and Inverse FFT (IFFT) to capture global and local patterns.
  - BSARec (Li et al., 2024b): Utilizes both time-domain and frequency-domain analysis. It integrates Fourier transforms to identify and manage periodic patterns and adaptive filtering to reduce noise.
  - FEARec (Ni et al., 2023): Combines frequency-domain analysis with attention mechanisms to enhance sequential recommendation, focusing on capturing both global and local patterns effectively.
- Self-supervised Learning Methods:
  - DuoRec (Chen et al., 2022): A self-supervised model that uses dual contrastive learning tasks to learn robust representations, addressing data sparsity and item embedding degradation.
  - ICLRec (Li et al., 2024d): Integrates self-supervised learning by introducing contrastive learning objectives to enhance intent learning in sequential recommendation.
  - ELCRec (Li et al., 2024c): Focuses on enhancing latent sequence representations through self-supervised contrastive learning, particularly in sparse data contexts.
- Intent Learning Methods:
  - ICSRec (Li et al., 2024e): An intent-contrastive sequential recommendation model that employs contrastive self-supervised learning objectives to learn user intentions more effectively. It aims to make item embeddings more discriminative by contrasting different intention aspects.
  - IDCLRec (Chen et al., 2025): Another intent-driven contrastive learning model for sequential recommendation.
3.3. Technological Evolution
The field of sequential recommendation has evolved significantly:
- Early models (e.g., Markov Chains, matrix factorization): Focused on simple sequence transitions or static preferences.
- Recurrent Neural Networks (RNNs) (e.g., GRU4Rec): Introduced the ability to model longer-term dependencies in sequences, but suffered from issues such as vanishing gradients and difficulty capturing very long-range connections.
- Convolutional Neural Networks (CNNs) (e.g., Caser): Applied CNNs to sequences, offering efficiency and local pattern capture.
- Attention Mechanisms and Transformers (e.g., SASRec): Revolutionized SR by allowing models to weigh the importance of all previous items when predicting the next one, effectively capturing long-range dependencies and making models highly parallelizable.
- Self-supervised Learning (SSL) (e.g., DuoRec, ICLRec): Addressed data sparsity by generating supervisory signals from the data itself, often using contrastive learning to learn more robust item and user representations.
- Intent-guided Models (e.g., ICSRec, ELCRec): Began to explicitly model user intentions, recognizing that different interactions may stem from different underlying goals, leading to more nuanced recommendations.
- Frequency Domain Models (e.g., FMLPRec, BSARec, FEARec): Introduced the Fourier Transform to analyze sequences in the frequency domain, aiming to disentangle different types of patterns (e.g., global trends vs. local fluctuations) that may correspond to long-term vs. short-term interests.

FICLRec fits into this timeline by combining the advancements of intent-guided models, self-supervised learning (specifically contrastive learning), and frequency-domain analysis. It bridges these lines of work by using frequency-domain techniques to enhance the learning of diverse user intentions within a self-supervised contrastive learning framework.
3.4. Differentiation Analysis
Compared to the main methods in related work, FICLRec offers several core differences and innovations:
- Novel Combination of Frequency Domain and Intent Contrastive Learning: While models such as FMLPRec, BSARec, and FEARec utilize the frequency domain for sequential recommendation, and models such as ICLRec, ELCRec, ICSRec, and IDCLRec employ intent learning with contrastive learning, FICLRec innovatively integrates both. It specifically uses frequency information to enhance the recognition of high-frequency intents and then applies frequency contrastive learning to mitigate data sparsity.
- Explicit Disentanglement of High- and Low-Frequency Intents: FICLRec's Frequency Redistribution Encoder (FRE) explicitly decomposes user interaction sequences into low-frequency and high-frequency components. This allows dedicated processing and contrastive learning objectives for each type of intent, whereas many prior intent-guided models treat intents more holistically, without such a distinct frequency-based separation.
- Targeted Contrastive Learning Objectives: The model introduces two specific contrastive learning objectives:
  - High-frequency intent contrastive learning: Aligns the high-frequency features of positive pairs while pushing away negative ones, coupled with a high-frequency alignment loss that explicitly brings high-frequency components closer. This is crucial for capturing dynamic, short-term interests.
  - Low-frequency intent contrastive learning: Leverages intent prototypes (cluster centers) as low-frequency representations and includes a cluster-level center alignment loss to ensure that general intent representations are close to their corresponding prototypes. This helps capture stable, long-term preferences.
- Addressing Data Sparsity through Frequency-Aware CL: By introducing frequency contrastive learning, FICLRec learns more robust embeddings even with limited data. The frequency domain can highlight underlying patterns that are less apparent in raw, sparse time-domain data, making contrastive learning more effective.
- Improved High-Frequency Intent Capture: The paper claims that existing methods typically favor low-frequency intents. FICLRec directly confronts this by enhancing the processing of high-frequency components, leading to a better balance in capturing diverse user intentions.

In essence, FICLRec stands out by providing a structured way to separate and learn short-term and long-term user intentions using frequency-domain techniques, and by solidifying these representations through tailored contrastive learning objectives, particularly addressing the persistent problem of data sparsity.
4. Methodology
4.1. Principles
The core idea behind FICLRec is to enhance sequential recommendation by explicitly modeling both high-frequency (short-term, dynamic) and low-frequency (long-term, stable) user intentions. The theoretical basis is that user behavior sequences, when analyzed in the frequency domain using Fourier Transform, reveal distinct patterns corresponding to these different types of intentions. Low-frequency components represent stable, overarching preferences, while high-frequency components represent transient, specific interests. By disentangling and processing these components, FICLRec aims to overcome the limitation of existing models that often favor low-frequency intents. Furthermore, to address data sparsity and learn more robust representations for these intents, the model employs frequency-enhanced contrastive learning which pulls positive samples closer and pushes negative samples apart in the embedding space, adapted for both high-frequency and low-frequency intent representations.
4.2. Overall Framework
The overall architecture of FICLRec is designed to process user interaction sequences, extract frequency-enhanced intent representations, and use these representations for next-item prediction augmented by multi-task learning with intent contrastive learning. The model consists of four main components:
- Embedding Layer: Converts discrete item IDs into dense vector representations.
- Frequency Redistribution Encoder (FRE): Processes the embedded sequence, disentangling and re-weighting low-frequency and high-frequency components.
- Intent Contrastive Learning: A self-supervised task comprising high-frequency intent contrastive learning and low-frequency intent contrastive learning, designed to capture short-term and long-term user preferences, respectively.
- Prediction Layer: Uses the learned user representation to predict the next item.

The following figure (Figure 2 from the original paper) illustrates the overall framework:

The figure illustrates the data flow, starting with the input user sequence. The sequence first passes through the Embedding Layer. The embedded sequence then enters the Frequency Redistribution Encoder, which includes a Frequency Redistribution Structure, Multi-Head Self-Attention, and a Point-wise Feed-Forward Network. The output of this encoder, representing the user's comprehensive behavior, is then used for two tasks: Next-item Prediction and Intent Contrastive Learning. The Intent Contrastive Learning branch consists of High-Frequency Intent Contrastive Learning and Low-Frequency Intent Contrastive Learning modules, which are optimized to refine the intent representations. Finally, the overall loss combines the prediction loss and the intent contrastive loss.
4.3. Embedding layer
In the embedding layer, each unique item in the global item set $\mathcal{V}$ is mapped to a continuous vector space. This creates an item embedding matrix $\mathbf{E} \in \mathbb{R}^{|\mathcal{V}| \times d}$, where $|\mathcal{V}|$ is the total number of items and $d$ is the embedding dimension.

For a given user $u$, their historical interaction sequence is $S^u = [v_1, v_2, \ldots, v_n]$, where $v_t$ is the item interacted with at position $t$ and $n$ is the maximum sequence length. Each item $v_t$ is converted into its embedding $\mathbf{e}_{v_t}$, so the sequence of item embeddings for user $u$ is $\mathbf{E}^u = [\mathbf{e}_{v_1}, \mathbf{e}_{v_2}, \ldots, \mathbf{e}_{v_n}]$.

Since mechanisms like self-attention (used later in the encoder) do not inherently preserve the order of items, positional information must be explicitly added to the embeddings. This is done by adding a positional embedding to the item embeddings; the combined embeddings are then processed by Layer Normalization and Dropout for stability and regularization:

$ \mathbf{H}^0 = \mathrm{Dropout}\big(\mathrm{LayerNorm}(\mathbf{E}^u + \mathbf{P})\big) $

Here, $\mathbf{P} \in \mathbb{R}^{n \times d}$ represents the positional embedding matrix, LayerNorm normalizes the activations across features for each individual position, and Dropout randomly sets a fraction of input units to zero during training to prevent overfitting. The output $\mathbf{H}^0$ serves as the initial input to the Frequency Redistribution Encoder.
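As a concrete illustration, the following PyTorch sketch (hypothetical class and argument names; the paper fixes the embedding dimension to 64 and the maximum sequence length to 50) implements an embedding layer of this form:

```python
import torch
import torch.nn as nn

class SequenceEmbedding(nn.Module):
    def __init__(self, num_items: int, max_len: int = 50, d: int = 64, dropout: float = 0.5):
        super().__init__()
        self.item_emb = nn.Embedding(num_items + 1, d, padding_idx=0)  # index 0 reserved for padding
        self.pos_emb = nn.Embedding(max_len, d)                        # learnable positional embeddings
        self.layer_norm = nn.LayerNorm(d)
        self.dropout = nn.Dropout(dropout)

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        # item_ids: (batch, max_len) integer item indices
        positions = torch.arange(item_ids.size(1), device=item_ids.device)
        h0 = self.item_emb(item_ids) + self.pos_emb(positions)         # E^u + P
        return self.dropout(self.layer_norm(h0))                       # Dropout(LayerNorm(E^u + P))

# Usage: h0 = SequenceEmbedding(num_items=12101)(torch.randint(1, 12101, (8, 50)))
```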
4.4. Frequency Redistribution Encoder
The Frequency Redistribution Encoder (FRE) is the core component responsible for processing the sequence embeddings by leveraging frequency domain information. It aims to capture both low-frequency and high-frequency patterns from user behaviors. The FRE consists of multiple stacked blocks.
Let the input sequence representation for the $l$-th encoder block be $\mathbf{H}^{l-1}$. The operation of the $l$-th FRE block can be written as $\mathbf{H}^{l} = \mathrm{FRE\_Block}(\mathbf{H}^{l-1})$.

For the first block, the input is the output of the embedding layer, i.e., $\mathbf{H}^{0}$. Each FRE block contains three sub-components: a Frequency Redistribution Structure, a Multi-head Self-Attention mechanism, and a Point-wise Feed-Forward Network.
4.4.1. Frequency Redistribution Structure
This structure is where the frequency domain analysis takes place. It applies the Fast Fourier Transform (FFT) to the input sequence, separates it into low-frequency and high-frequency components, and then recombines them adaptively.
Given the input $\mathbf{H}^{l-1}$, the FFT is applied along the sequence dimension, converting the time-domain sequence into a frequency-domain representation $\mathbf{X} = \mathcal{F}(\mathbf{H}^{l-1})$. The low-frequency component is obtained by keeping only the first $\alpha$ frequency components (the lowest frequencies) and converting them back to the time domain with the Inverse Fast Fourier Transform (IFFT):

$ \mathbf{H}_{low} = \mathcal{F}^{-1}\big(\mathbf{X}[0{:}\alpha]\big) $

Here, $\mathcal{F}(\cdot)$ denotes the FFT operation, $\mathcal{F}^{-1}(\cdot)$ denotes the IFFT operation, and $\mathbf{X}[0{:}\alpha]$ signifies taking the frequency components from index 0 up to $\alpha$. The value $\alpha$ is a hyperparameter determining the cutoff (boundary) frequency. This effectively filters out higher frequencies, leaving only the low-frequency content.

Conversely, the high-frequency component is obtained by taking the remaining frequency components (from $\alpha$ onwards) and converting them back to the time domain with the IFFT:

$ \mathbf{H}_{high} = \mathcal{F}^{-1}\big(\mathbf{X}[\alpha{:}]\big) $

The paper notes that reconstructing the sequence from just these two components is generally sufficient and reduces computational complexity.
After obtaining both the low-frequency ($\mathbf{H}_{low}$) and high-frequency ($\mathbf{H}_{high}$) representations, the model combines them using a gating mechanism that adaptively re-weights the contribution of each component based on its importance:

$ \mathbf{g} = \mathrm{sigmoid}\big(\mathbf{H}_{low}\mathbf{W}_{1} + \mathbf{H}_{high}\mathbf{W}_{2}\big), \qquad \tilde{\mathbf{H}} = \mathbf{g} \odot \mathbf{H}_{low} + (1 - \mathbf{g}) \odot \mathbf{H}_{high} $

In this formulation:

- $\mathbf{W}_{1}$ and $\mathbf{W}_{2}$ are learnable dimensionality reduction parameter matrices, transforming $\mathbf{H}_{low}$ and $\mathbf{H}_{high}$ before combination.
- $\mathrm{sigmoid}$ is the sigmoid activation function, which squashes values between 0 and 1, acting as a gate.
- $\mathbf{g}$ is the gate vector, determining the weight given to the low-frequency component.
- $(1 - \mathbf{g})$ determines the weight given to the high-frequency component.
- $\odot$ denotes element-wise multiplication.
- $\tilde{\mathbf{H}}$ is the adaptively recombined representation, which is then passed to the multi-head self-attention layer. This process allows the model to selectively emphasize or de-emphasize low-frequency (stable patterns) or high-frequency (dynamic changes) information as needed.
4.4.2. Multi-head self-attention
Following the Frequency Redistribution Structure, the recombined representation $\tilde{\mathbf{H}}$ is passed through a multi-head self-attention layer. This layer, inspired by SASRec, allows the model to capture dependencies between items in the sequence, considering their contextual relevance:

$ \mathbf{A} = \mathrm{softmax}\left(\frac{(\tilde{\mathbf{H}}\mathbf{W}^{Q})(\tilde{\mathbf{H}}\mathbf{W}^{K})^{T}}{\sqrt{d}}\right)(\tilde{\mathbf{H}}\mathbf{W}^{V}) $

Here:

- $\tilde{\mathbf{H}}$ is the input to the self-attention layer.
- $\mathbf{W}^{Q}$, $\mathbf{W}^{K}$, $\mathbf{W}^{V}$ are learnable projection matrices for queries, keys, and values, respectively, in the $l$-th block.
- $d$ is the embedding dimension, and $\sqrt{d}$ is a scaling factor that prevents large dot-product values from dominating the softmax function.
- The softmax function normalizes the attention scores into attention weights.
- $\mathbf{A}$ is the output of the multi-head self-attention layer, representing the context-aware sequence.
4.4.3. Point-wise feed-forward network
After the self-attention layer, a point-wise feed-forward network (PFFN) is applied. This network processes each position independently and identically, capturing non-linear features within the sequence:

$ \mathbf{F} = \mathrm{GELU}(\mathbf{A}\mathbf{W}^{(1)} + \mathbf{b}^{(1)})\mathbf{W}^{(2)} + \mathbf{b}^{(2)} $

In this formula:

- $\mathbf{A}$ is the output from the self-attention layer.
- $\mathbf{W}^{(1)}, \mathbf{W}^{(2)}$ are learnable weight matrices, and $\mathbf{b}^{(1)}, \mathbf{b}^{(2)}$ are learnable bias vectors for the $l$-th block.
- GELU (Gaussian Error Linear Unit) is an activation function that introduces non-linearity.
- $\mathbf{F}$ is the output of the point-wise feed-forward network for the $l$-th FRE block.
4.4.4. Stacking blocks
The entire Frequency Redistribution Encoder consists of multiple such blocks stacked together. Each block incorporates residual connections, Layer Normalization, and Dropout for stable training and improved performance. The stacking of the three sub-components can be summarized as:

$ \tilde{\mathbf{H}}^{l} = \mathrm{LayerNorm}\big(\mathbf{H}^{l-1} + \mathrm{Dropout}(\mathrm{FRS}(\mathbf{H}^{l-1}))\big) $
$ \mathbf{A}^{l} = \mathrm{LayerNorm}\big(\tilde{\mathbf{H}}^{l} + \mathrm{Dropout}(\mathrm{MHSA}(\tilde{\mathbf{H}}^{l}))\big) $
$ \mathbf{H}^{l} = \mathrm{LayerNorm}\big(\mathbf{A}^{l} + \mathrm{Dropout}(\mathrm{PFFN}(\mathbf{A}^{l}))\big) $

Here:

- $\mathrm{FRS}(\cdot)$ represents the Frequency Redistribution Structure described in Section 4.4.1.
- $\mathrm{MHSA}(\cdot)$ represents the Multi-Head Self-Attention mechanism described in Section 4.4.2.
- $\mathrm{PFFN}(\cdot)$ represents the Point-wise Feed-Forward Network described in Section 4.4.3.

Each LayerNorm is applied after adding the residual connection and Dropout, ensuring that the inputs to subsequent layers are normalized. The output of the last FRE block, usually the representation of the last item in the sequence, is used as the user's intent representation for next-item prediction and contrastive learning.
4.5. Prediction layer
The prediction layer takes the final user intent representation $\mathbf{h}$ (typically the embedding of the last position from the final encoder block) and computes a probability distribution over all items in $\mathcal{V}$. This distribution indicates the likelihood of the user interacting with each item next.

The prediction is calculated by taking the dot product of the user's intent representation with the transpose of the item embedding matrix $\mathbf{E}$, which measures the similarity between the user's intent and each item. The result is passed through a softmax function to obtain the probability distribution $\hat{\mathbf{y}}$:

$ \hat{\mathbf{y}} = \mathrm{softmax}(\mathbf{h}\mathbf{E}^{T}) $

Here, $\mathbf{E}^{T}$ is the transpose of the item embedding matrix, where each column corresponds to an item embedding.

The model is trained to minimize a recommendation loss (cross-entropy) between the predicted probability distribution and the true next item:

$ \mathcal{L}_{rec} = -\sum_{i=1}^{N} \mathbf{y}_{i} \log(\hat{\mathbf{y}}_{i}) $

In this formula:

- $N$ represents the number of prediction targets (e.g., positions in the sequence, or samples in a batch).
- $\mathbf{y}_{i}$ is a one-hot encoded vector representing the ground-truth next item for the $i$-th target: it is 1 for the actual next item and 0 for all others.
- $\hat{\mathbf{y}}_{i}$ is the predicted probability distribution over items. The loss penalizes incorrect predictions and encourages the model to assign high probabilities to the actual next items.
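The prediction step and recommendation loss can be sketched as follows (shapes and names are illustrative assumptions consistent with the description above):

```python
import torch
import torch.nn.functional as F

batch, d, num_items = 8, 64, 12101
h = torch.randn(batch, d)                        # final intent representation per user
item_emb = torch.randn(num_items, d)             # item embedding matrix E
targets = torch.randint(0, num_items, (batch,))  # ground-truth next items

logits = h @ item_emb.T                          # similarity of the intent with every item
loss_rec = F.cross_entropy(logits, targets)      # softmax + cross-entropy in one call

top_k = logits.topk(20, dim=-1).indices          # top-20 recommendations used for HR@20 / NDCG@20
```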
4.6. Multi-task learning
The multi-task learning component in FICLRec is implemented through Intent Contrastive Learning (ICL). This self-supervised learning task is crucial for addressing data sparsity and for learning more robust representations of both high-frequency (short-term) and low-frequency (long-term) user intentions. The goal is to optimize these objectives alongside the primary recommendation loss.
The following figure (Figure 3 from the original paper) details the intent contrastive learning mechanism:
Fig. 3. The details of intent contrastive learning.
The figure illustrates how intent contrastive learning works. From a user's historical sequence, two positive samples (sub-sequences sharing the same target item) are derived, and their intent representations are extracted. For high-frequency intent contrastive learning, these representations are directly contrasted, and their FFT (frequency-domain) representations are aligned. For low-frequency intent contrastive learning, intent prototypes (cluster centers) are queried with the intent representations, and these prototypes are then contrasted with the original intent representations. The overall intent contrastive loss is a weighted sum of the high- and low-frequency components.
4.6.1. High-frequency intent contrastive learning
This part of the ICL focuses on capturing and distinguishing high-frequency intents, which correspond to users' immediate and dynamic short-term interests. It operates on two positive samples derived from the user's interaction sequence.
- $s_{i}$: a sequence containing the target item.
- $s_{i,s}$: another sequence from the user's historical interactions that shares the same target item. These two sequences are treated as a positive pair.

Their comprehensive feature representations are obtained from the Frequency Redistribution Encoder, and their final intent representations, denoted $h_{i}$ and $h_{i,s}$, are extracted (typically the last hidden state of each sequence).

The primary contrastive loss used is a variant of the InfoNCE loss, which pulls the representations of positive pairs closer while pushing negative samples farther apart:

$ \mathcal{L}_{NCE}(h_{i}, h_{i,s}) = -\log \frac{\exp\big(\mathrm{sim}(h_{i}, h_{i,s}) / \tau\big)}{\sum_{k} \mathbb{1}_{[k \neq i]} \exp\big(\mathrm{sim}(h_{i}, h_{k}) / \tau\big)} $

Here:

- $B$ represents the batch size.
- $h_{i}$ is the intent representation of an anchor sequence.
- $h_{i,s}$ is the intent representation of the positive sample for $h_{i}$.
- $h_{k}$ represents the intent representations of other sequences in the batch, which serve as negative samples.
- $\mathrm{sim}(\cdot, \cdot)$ denotes a similarity function (e.g., cosine similarity).
- $\tau$ is a temperature parameter that controls the sharpness of the softmax distribution. A smaller $\tau$ makes the distribution sharper, emphasizing larger similarities more.
- $\mathbb{1}_{[k \neq i]}$ is a masking function that excludes positive samples from the negative sample set (it is 0 if $h_{k}$ is a positive sample for $h_{i}$, and 1 otherwise), preventing them from being mistakenly treated as negatives.

To explicitly enhance high-frequency features, an additional high-frequency alignment loss $\mathcal{L}_{HF}$ is introduced. This loss directly minimizes the distance between the high-frequency components of the positive sample pairs in the frequency domain:

$ \mathcal{L}_{HF} = \frac{1}{B} \sum_{i=1}^{B} \big\| X_{i}[\alpha{:}] - X_{i,s}[\alpha{:}] \big\|_{2} $

In this formula:

- $B$ represents the batch size.
- $X_{i}$ and $X_{i,s}$ are the frequency-domain representations obtained by applying the FFT operation to $h_{i}$ and $h_{i,s}$, respectively.
- $X_{i}[\alpha{:}]$ refers to the high-frequency part of the FFT result (from the boundary frequency $\alpha$ onwards) for sequence $i$.
- $\|\cdot\|_{2}$ denotes the L2 norm, measuring the Euclidean distance between the high-frequency features of the positive pair. Minimizing this loss encourages their high-frequency characteristics to be similar.

Finally, the total high-frequency intent contrastive loss $\mathcal{L}_{H}$ combines the InfoNCE loss (applied bidirectionally) and the high-frequency alignment loss:

$ \mathcal{L}_{H} = \mathcal{L}_{NCE}(h_{i}, h_{i,s}) + \mathcal{L}_{NCE}(h_{i,s}, h_{i}) + \mathcal{L}_{HF} $

The bidirectional InfoNCE terms ensure that $h_{i}$ serves as an anchor for $h_{i,s}$ and vice versa, making the contrastive learning more robust.
4.6.2. Low-frequency intent contrastive learning
This component focuses on capturing users' long-term preferences and stable interests, which are considered low-frequency representations. Instead of directly contrasting individual sequence embeddings, it introduces intent prototypes as representations of these stable interests.
The intent prototypes are represented by a set of cluster centers $C = \{c_{1}, c_{2}, \ldots, c_{K}\}$, where $c_{k}$ is the $k$-th cluster center. These prototypes are obtained by applying K-Means clustering to all learned intent representations across the entire dataset. For the two positive samples ($s_{i}$ and $s_{i,s}$), their corresponding intent prototypes ($c_{i}$ and $c_{i,s}$) are retrieved by querying the nearest cluster center:

$ c_{i} = \mathrm{Query}(h_{i}, C), \qquad c_{i,s} = \mathrm{Query}(h_{i,s}, C) $

The query function assigns an intent representation to its closest cluster center. This effectively treats the cluster centers as low-frequency, or general, intent representations.

A cluster-level center alignment loss $\mathcal{L}_{CA}$ is introduced to ensure that the intent representations of the positive samples are aligned with their respective intent prototypes:

$ \mathcal{L}_{CA} = \frac{1}{B} \sum_{i=1}^{B} \big( \| h_{i} - c_{i} \|_{2} + \| h_{i,s} - c_{i,s} \|_{2} \big) $

This loss minimizes the Euclidean distance between each intent representation and its assigned cluster center, encouraging the intent representations to conform to the learned intent prototypes.

Similar to the high-frequency part, an InfoNCE loss is applied to contrast the intent representations with their intent prototypes, pulling intent representations toward their correct low-frequency prototypes while pushing them away from incorrect ones.

The combined low-frequency loss $\mathcal{L}_{L}$ (the prototype-level InfoNCE terms plus $\mathcal{L}_{CA}$) ensures that intent representations are not only aligned with their prototypes but also discriminative against other prototypes, thereby better capturing users' long-term preferences.

Finally, the complete intent contrastive learning loss is a weighted sum of the high-frequency and low-frequency intent contrastive losses:

$ \mathcal{L}_{ICL} = \lambda_{1} \mathcal{L}_{H} + \lambda_{2} \mathcal{L}_{L} $

Here, $\lambda_{1}$ and $\lambda_{2}$ are hyperparameters that control the relative importance of the high-frequency (short-term) and low-frequency (long-term) intent contrastive learning tasks in the overall self-supervised objective.
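The prototype query and cluster-level alignment can be sketched as follows (illustrative; the K-Means step itself, which produces the cluster centers once per epoch, is assumed to be run separately, e.g. with scikit-learn or faiss):

```python
import torch

def query_prototypes(h: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    # h: (B, d) intent representations; centers: (K, d) cluster centers from K-Means.
    dists = torch.cdist(h, centers)           # (B, K) Euclidean distances
    assign = dists.argmin(dim=-1)             # index of the nearest prototype per sample
    return centers[assign]                    # (B, d) assigned low-frequency prototypes

def center_alignment_loss(h: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    # Pull each intent representation toward its assigned prototype.
    return (h - c).norm(dim=-1).mean()

centers = torch.randn(256, 64)                # e.g. 256 intent prototypes (a hypothetical K)
h_i = torch.randn(8, 64)
c_i = query_prototypes(h_i, centers)
loss_ca = center_alignment_loss(h_i, c_i)
```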
4.6.3. Train and inference
The overall loss function for FICLRec combines the primary recommendation loss $\mathcal{L}_{rec}$ and the intent contrastive learning loss $\mathcal{L}_{ICL}$, and the model is trained to minimize the combined objective:

$ \mathcal{L} = \mathcal{L}_{rec} + \mathcal{L}_{ICL} $

This multi-task learning approach allows the model to simultaneously learn to predict the next item accurately and to form robust, frequency-aware intent representations, thereby benefiting from the self-supervised signals. During inference, only the prediction layer is used with the learned encoder to generate recommendations.
The training procedure (Algorithm 1 in the paper) can be summarized as follows:

Algorithm 1: FICLRec Training Algorithm
Input: a sequential recommendation dataset, the sequence encoder, hyperparameters (e.g., the loss weights and boundary frequency), number of epochs, and batch size.
Output: the trained encoder.

1. Partition the dataset into training, validation, and testing subsets.
2. Initialize the encoder.
3. For each epoch:
   - Obtain the intent prototypes by running K-Means clustering over the intent representations of all training sequences.
   - For each mini-batch, and for each sequence in the batch:
     - Sample a subsequence from the user's history that shares the same target item (the positive sample).
     - Encode both sequences with the encoder to obtain their intent representations.
     - Query the intent prototype for each representation.
     - Compute the multi-task loss, combining the recommendation loss with the weighted high- and low-frequency intent contrastive losses.
   - Update the encoder parameters to minimize the combined loss.
4. Return the trained encoder.
The algorithm outlines the iterative training process. In each epoch, K-Means clustering is performed on the current intent representations from all users to update the intent prototypes. Then, for each mini-batch, positive sequence pairs are sampled, their intent representations and corresponding prototypes are obtained, and the multi-task loss (combining the recommendation loss and the weighted intent contrastive loss) is computed and used to update the encoder parameters.
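To show how the pieces fit together, here is a heavily simplified, hypothetical version of the training loop: a GRU stands in for the FRE encoder, random vectors stand in for the K-Means prototypes, and a cropped subsequence stands in for the sampled positive view, so the control flow of Algorithm 1 can run end to end; none of this is the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_items, d, num_protos, lam1, lam2 = 1000, 64, 16, 0.1, 0.1
item_emb = nn.Embedding(num_items, d)
gru = nn.GRU(d, d, batch_first=True)
opt = torch.optim.Adam(list(item_emb.parameters()) + list(gru.parameters()), lr=1e-3)

def encode(seq):
    out, _ = gru(item_emb(seq))            # (batch, seq_len, d)
    return out[:, -1, :]                   # last hidden state as the intent representation

for epoch in range(2):
    prototypes = torch.randn(num_protos, d)                  # stand-in for per-epoch K-Means centers
    for _ in range(5):                                       # a few toy mini-batches
        seq = torch.randint(0, num_items, (8, 20))
        target = torch.randint(0, num_items, (8,))
        h_i, h_is = encode(seq), encode(seq[:, 2:])          # anchor and (cropped) positive view
        c_i = prototypes[torch.cdist(h_i, prototypes).argmin(dim=-1)]   # nearest prototype
        logits = h_i @ item_emb.weight.T                     # next-item scores
        loss = (F.cross_entropy(logits, target)                          # recommendation loss
                + lam1 * (1 - F.cosine_similarity(h_i, h_is)).mean()     # simplified high-frequency term
                + lam2 * (h_i - c_i).norm(dim=-1).mean())                # simplified low-frequency term
        opt.zero_grad()
        loss.backward()
        opt.step()
```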
4.7. Complexity analysis
The computational complexity of FICLRec is analyzed by considering its main components.
- Frequency Redistribution Structure: This involves a Fast Fourier Transform (FFT) and an Inverse Fast Fourier Transform (IFFT), with complexity $O(n \log n)$, where $n$ is the sequence length. Additionally, the gating mechanism uses linear layers with complexity $O(n d^{2})$, where $d$ is the embedding dimension.
- Self-Attention Mechanism: This component has a complexity of $O(n^{2} d)$ due to the dot-product computations between queries and keys for all pairs of items in the sequence.
- Point-wise Feed-Forward Layer: This layer has a complexity of $O(n d^{2})$, as it applies a feed-forward network independently to each position, involving matrix multiplications of dimension $d$.

Combining these, the total per-layer complexity of FICLRec is $O(n \log n + n^{2} d + n d^{2})$. The following are the results from Table 3 of the original paper, comparing the time complexity of FICLRec with several baseline models:
| Model | Backbone network | Time complexity |
| SASRec | Self-Attention + point-wise feed-forward layer | $O(n^{2}d + nd^{2})$ |
| DuoRec | Self-Attention + point-wise feed-forward layer | $O(n^{2}d + nd^{2})$ |
| FEARec | Hybrid Attention + point-wise feed-forward layer | $O(n^{2}d + nd^{2} + n \log n)$ |
| ICLRec | Self-Attention + point-wise feed-forward layer | $O(n^{2}d + nd^{2})$ |
| ICSRec | Self-Attention + point-wise feed-forward layer | $O(n^{2}d + nd^{2})$ |
| FICLRec (ours) | FRE + Self-Attention + point-wise feed-forward layer | $O(n \log n + n^{2}d + nd^{2})$ |
As shown in Table 3, FICLRec has a slightly higher time complexity than pure self-attention models such as SASRec or ICSRec, due to the additional $O(n \log n)$ term introduced by the FFT/IFFT operations in its Frequency Redistribution Encoder. However, this additional cost is often acceptable given the performance gains and the efficiency of FFT algorithms for typical sequence lengths. FEARec likewise includes an $O(n \log n)$ term due to its use of Fourier Transforms.
5. Experimental Setup
5.1. Datasets
The authors conducted extensive experiments on five publicly available real-world datasets to validate the effectiveness of FICLRec. These datasets are commonly used in sequential recommendation research and originate from various domains, providing a comprehensive evaluation.
The following are the results from Table 4 of the original paper:
| Dataset | Beauty | Sports | Toys | Yelp | LastFM |
| #Users | 22,363 | 35,598 | 19,412 | 30,431 | 1,090 |
| #Items | 12,101 | 18,357 | 11,924 | 20,033 | 3,646 |
| #Actions | 198,502 | 296,337 | 167,597 | 316,354 | 52,551 |
| # Avg. Actions/User | 8.8 | 8.3 | 8.6 | 10.4 | 48.2 |
| # Avg. Actions/Item | 16.4 | 16.1 | 14 | 15.8 | 14.4 |
| Sparsity | 99.93% | 99.95% | 99.93% | 99.95% | 98.68% |
A summary of each dataset:
- Beauty: This dataset contains user reviews for beauty products from Amazon. It has 22,363 users, 12,101 items, and 198,502 interactions. The average number of actions per user is 8.8, and the sparsity is high at 99.93%. This represents a domain with many diverse products and relatively short user interaction histories.
- Sports: Also from Amazon, this dataset focuses on sports and outdoor items. It includes 35,598 users, 18,357 items, and 296,337 interactions. With an average of 8.3 actions per user and 99.95% sparsity, it is another sparse dataset with typical user behavior patterns.
- Toys: This Amazon dataset covers toys, board games, and outdoor toys, with 19,412 users, 11,924 items, and 167,597 interactions. It has an average of 8.6 actions per user and 99.93% sparsity, similar in characteristics to Beauty and Sports.
- Yelp: This dataset is derived from Yelp reviews, covering various businesses. It has 30,431 users, 20,033 items, and 316,354 interactions. The average actions per user is 10.4, and sparsity is 99.95%. This dataset offers a different interaction context (services/places rather than products).
- LastFM: This dataset contains user listening habits from the Last.fm music platform. It is a smaller dataset with 1,090 users, 3,646 items, and 52,551 interactions. Notably, it has a much higher average number of actions per user (48.2) and lower sparsity (98.68%) compared to the Amazon and Yelp datasets. This dataset is valuable for evaluating performance on denser, longer user sequences, potentially rich in both short-term music tastes and long-term artist preferences.

These datasets were chosen because they represent diverse domains (e-commerce, reviews, music), vary in scale, and, importantly, exhibit significant data sparsity (ranging from 98.68% to 99.95%). This makes them highly suitable for validating FICLRec's ability to handle sparse data and capture complex user intentions in real-world scenarios.
5.2. Evaluation Metrics
The paper uses two widely adopted ranking metrics to evaluate the performance of the sequential recommendation models: HR@K (Hit Rate) and NDCG@K (Normalized Discounted Cumulative Gain). K denotes the number of top recommended items considered for evaluation.
- Hit Rate (HR@K):
  - Conceptual Definition: Hit Rate @ K measures the proportion of users for whom the ground-truth next item appears in the top $K$ recommended items. It is a binary metric: if the target item is in the top-$K$ list, it is a "hit" (1); otherwise, it is a "miss" (0). It focuses on recall, indicating how often the system successfully recommends the relevant item within a given ranking cutoff.
  - Mathematical Formula: $ \mathrm{HR@K} = \frac{\text{Number of hits @ K}}{\text{Total number of interactions}} $, where a "hit" means the true next item is among the top $K$ predicted items.
  - Symbol Explanation:
    - Number of hits @ K: The count of all instances where the actual next item was found among the top $K$ recommendations.
    - Total number of interactions: The total number of prediction tasks (i.e., the total number of users' next items to be predicted).
- Normalized Discounted Cumulative Gain (NDCG@K):
  - Conceptual Definition: NDCG@K evaluates the relevance and ranking quality of the recommended items. It gives higher scores to relevant items that appear at higher (earlier) positions in the recommendation list. This metric is particularly useful when item relevance is graded (although in implicit-feedback sequential recommendation, relevance is usually binary: the true next item is relevant, all others are not). It reflects both the quality (relevance) and the position of the hits.
  - Mathematical Formula: First, the Discounted Cumulative Gain (DCG@K) is calculated: $ \mathrm{DCG@K} = \sum_{i=1}^{K} \frac{2^{\mathrm{rel}_i} - 1}{\log_2(i+1)} $. Then, NDCG@K normalizes DCG@K by the Ideal DCG (IDCG@K), which is the DCG of a perfect ranking: $ \mathrm{NDCG@K} = \frac{\mathrm{DCG@K}}{\mathrm{IDCG@K}} $.
  - Symbol Explanation:
    - $K$: The number of top recommended items considered.
    - $\mathrm{rel}_i$: The relevance score of the item at position $i$ in the recommended list. In binary relevance (common for implicit feedback), $\mathrm{rel}_i = 1$ if the item at position $i$ is the ground-truth next item, and 0 otherwise.
    - $\log_2(i+1)$: The discount factor, which reduces the contribution of items at lower ranks.
    - $\mathrm{IDCG@K}$: The maximum possible DCG value, obtained by placing all relevant items at the top of the ranking. For sequential recommendation with a single ground-truth next item, IDCG@K equals 1, so NDCG@K reduces to $\frac{1}{\log_2(r+1)}$ if the actual item appears at rank $r \le K$, and 0 otherwise.

Both HR@K and NDCG@K are commonly used to provide a comprehensive evaluation of sequential recommendation systems, with HR@K focusing on whether the item is found and NDCG@K also considering its position. The paper reports results for $K \in \{5, 10, 20\}$.
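For the single-target setting used here, HR@K and NDCG@K can be computed as in the following sketch (illustrative; the paper's exact evaluation protocol, e.g. full ranking vs. sampled candidates, is not restated here):

```python
import torch

def hr_ndcg_at_k(scores: torch.Tensor, target: torch.Tensor, k: int = 20):
    # scores: (num_users, num_items) predicted scores; target: (num_users,) true next-item ids.
    topk = scores.topk(k, dim=-1).indices                 # (num_users, k) ranked recommendations
    hits = topk == target.unsqueeze(-1)                   # boolean hit matrix
    hit_mask = hits.any(dim=-1)
    hr = hit_mask.float().mean().item()                   # fraction of users with the target in the top-k
    ranks = hits.float().argmax(dim=-1) + 1               # 1-based rank of the target (valid only for hits)
    ndcg = torch.where(hit_mask, 1.0 / torch.log2(ranks.float() + 1.0),
                       torch.zeros(ranks.shape))          # 1/log2(rank+1) on a hit, 0 otherwise
    return hr, ndcg.mean().item()

scores = torch.randn(100, 12101)                          # e.g. 100 users over a Beauty-sized item set
target = torch.randint(0, 12101, (100,))
print(hr_ndcg_at_k(scores, target, k=20))
```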
5.3. Baselines
The authors compared FICLRec against a wide range of state-of-the-art sequential recommendation models, representing different methodological categories:
Traditional recommendation methods:
- GRU4Rec (Hidasi et al., 2015): A pioneering RNN-based model for sequential recommendation using Gated Recurrent Units. It models user sessions to predict the next item.
- Caser (Tang & Wang, 2018): Employs horizontal and vertical convolutional filters to capture sequential patterns in user interactions.
- SASRec (Kang & McAuley, 2018): A Transformer-based model that applies self-attention to user sequences, effectively capturing long-term dependencies and significantly improving sequential recommendation performance.
Frequency domain methods:
- FMLPRec (Zhou et al., 2022): Leverages the frequency domain via Fourier Transforms to model sequential data, aiming to filter out noise and capture both global and local patterns.
- BSARec (Li et al., 2024b): Combines time-domain and frequency-domain analysis, using adaptive filtering to handle noise in the embedding matrix and Fourier transforms to identify periodic patterns.
- FEARec (Ni et al., 2023): Integrates hybrid attention with Fourier transforms to process user sequences, enhancing the model's capacity to identify and manage periodic patterns by separating concerns between the time and frequency domains.
Self-supervised methods:
- ContrastVAE (Zou et al., 2022): A self-supervised approach based on Variational Autoencoders with contrastive learning objectives, aiming to learn better item representations.
- DuoRec (Chen et al., 2022): Employs dual contrastive learning tasks to generate robust item representations, addressing data sparsity and item embedding degradation.
- RTRRec (Li et al., 2024f): A self-supervised sequential recommendation model that uses contrastive learning to learn latent sequence representations, alleviating data sparsity concerns.
- ICLRec (Li et al., 2024d): Introduces intent contrastive learning within a self-supervised framework, enhancing item embedding quality and addressing item embedding degradation.
- ELCRec (Li et al., 2024c): Another self-supervised model that leverages contrastive learning to improve latent sequence representations, particularly effective in sparse-data contexts and for increasing noise robustness.
Intent learning methods:
- ICSRec (Li et al., 2024e): An intent-contrastive sequential recommendation model that utilizes a novel contrastive self-supervised learning objective to learn and represent user intentions effectively.
- IDCLRec (Chen et al., 2025): An intent-driven contrastive learning model that further refines intent learning by enhancing the extraction of user intentions in sequential recommendation.

These baselines were selected because they represent the current state-of-the-art in various aspects of sequential recommendation, including traditional sequence modeling, frequency-domain processing, self-supervised learning for robustness, and explicit intent learning. This diverse set allows a comprehensive evaluation of FICLRec's performance and its distinct contributions.
5.4. Implementation details
The experiments used an embedding dimension fixed at 64 and a maximum sequence length limited to 50, which are common settings in sequential recommendation. The temperature parameter for InfoNCE loss was set to 1.0. The hyperparameters and (weights for high-frequency and low-frequency intent contrastive losses) were searched from . The boundary frequency (determining the split between low and high frequencies) was selected from . The Adam optimizer (Kingma, 2017) was used with a learning rate of , and a dropout rate of 0.5 was applied. All experiments were conducted on an NVIDIA GeForce RTX 2080 Ti GPU (11GB). The baselines (Caser, GRU4Rec, SASRec, FMLPRec, BSARec, RTRRec, ContrastVAE, DuoRec, ICLRec, ELCRec, ICSRec, IDCLRec) were either implemented by the authors or their results were taken from published papers, ensuring fair comparison.
6. Results & Analysis
6.1. Core Results Analysis
The paper conducted extensive experiments to compare FICLRec with several state-of-the-art baseline models. The primary goal was to demonstrate FICLRec's superior performance in sequential recommendation, particularly its enhanced ability to capture high-frequency intents and its robustness to data sparsity.
The following are the results from Table 5 of the original paper, showing the performance comparison across all five datasets using HR@K and NDCG@K metrics:
| Dataset Metric | Caser | GRU4Rec | SASRec | FMLPRec | BSARec | RTRRec | ContrastVAE | DuoRec | ICLRec | ELCRec | ICSRec | IDCLRec | FICLRec (ours) | Imp. vs. SOTA | ||
| Beauty | HR@5 | 0.0159 | 0.0175 | 0.0338 | 0.0355 | 0.0597 | 0.0705 | 0.0436 | 0.0322 | 0.0460 | 0.0548 | 0.0440 | 0.0498 | 0.0677 | 0.0724 | 2.70% |
| HR@10 | 0.0226 | 0.0287 | 0.0525 | 0.0561 | 0.0881 | 0.0987 | 0.0673 | 0.0548 | 0.0728 | 0.0844 | 0.0650 | 0.0742 | 0.0930 | 0.1007 | 2.03% | |
| HR@20 | 0.0427 | 0.0461 | 0.0800 | 0.0847 | 0.1248 | 0.1345 | 0.1005 | 0.0832 | 0.1090 | 0.1208 | 0.0935 | 0.1048 | 0.1271 | 0.1390 | 3.35% | |
| NDCG@5 | 0.0097 | 0.0105 | 0.0217 | 0.0222 | 0.0369 | 0.0504 | 0.0287 | 0.0195 | 0.0306 | 0.0344 | 0.0288 | 0.0341 | 0.0481 | 0.0516 | 2.38% | |
| NDCG@10 | 0.0132 | 0.0141 | 0.0277 | 0.0288 | 0.0460 | 0.0595 | 0.0364 | 0.0267 | 0.0392 | 0.0439 | 0.0355 | 0.0420 | 0.0562 | 0.0606 | 1.85% | |
| NDCG@20 | 0.0172 | 0.0185 | 0.0346 | 0.0361 | 0.0553 | 0.0685 | 0.0447 | 0.0339 | 0.0483 | 0.0531 | 0.0426 | 0.0497 | 0.0648 | 0.0703 | 2.63% | |
| Sports | HR@5 | 0.0074 | 0.0103 | 0.0185 | 0.0205 | 0.0346 | 0.0408 | 0.0245 | 0.0222 | 0.0231 | 0.0299 | 0.0262 | 0.0265 | 0.0384 | 0.0438 | 7.35% |
| HR@10 | 0.0131 | 0.0182 | 0.0303 | 0.0314 | 0.0525 | 0.0589 | 0.0407 | 0.0360 | 0.0370 | 0.0456 | 0.0400 | 0.0410 | 0.0548 | 0.0623 | 5.77% | |
| HR@20 | 0.0224 | 0.0304 | 0.0453 | 0.0483 | 0.0758 | 0.0839 | 0.0628 | 0.0557 | 0.0562 | 0.0669 | 0.0593 | 0.0634 | 0.0770 | 0.0885 | 5.48% | |
| NDCG@5 | 0.0048 | 0.0064 | 0.0119 | 0.0136 | 0.0198 | 0.0284 | 0.0156 | 0.0144 | 0.0153 | 0.0189 | 0.0178 | 0.0177 | 0.0256 | 0.0300 | 5.63% | |
| NDCG@10 | 0.0066 | 0.0090 | 0.0156 | 0.0171 | 0.0256 | 0.0342 | 0.0209 | 0.0189 | 0.0198 | 0.0240 | 0.0222 | 0.0224 | 0.0318 | 0.0359 | 4.97% | |
| NDCG@20 | 0.0089 | 0.0120 | 0.0194 | 0.0213 | 0.0314 | 0.0405 | 0.0264 | 0.0238 | 0.0246 | 0.0293 | 0.0271 | 0.0280 | 0.0374 | 0.0425 | 4.94% | |
| Toys | HR@5 | 0.0076 | 0.0112 | 0.0194 | 0.0213 | 0.0434 | 0.0472 | 0.0264 | 0.0238 | 0.0246 | 0.0293 | 0.0271 | 0.0280 | 0.0374 | 0.0425 | 4.94% |
| HR@10 | 0.0144 | 0.0202 | 0.0324 | 0.0352 | 0.0614 | 0.0685 | 0.0481 | 0.0413 | 0.0409 | 0.0489 | 0.0465 | 0.0486 | 0.0646 | 0.0679 | 3.14% | |
| HR@20 | 0.0249 | 0.0324 | 0.0434 | 0.0472 | 0.0853 | 0.0978 | 0.0726 | 0.0603 | 0.0595 | 0.0728 | 0.0774 | 0.0828 | 0.1038 | 0.1092 | 2.82% | |
| NDCG@5 | 0.0044 | 0.0068 | 0.0105 | 0.0113 | 0.0294 | 0.0319 | 0.0330 | 0.0241 | 0.0409 | 0.0387 | 0.0395 | 0.0406 | 0.0557 | 0.0591 | 2.43% | |
| NDCG@10 | 0.0066 | 0.0096 | 0.0145 | 0.0156 | 0.0352 | 0.0412 | 0.0409 | 0.0303 | 0.0489 | 0.0476 | 0.0465 | 0.0486 | 0.0646 | 0.0679 | 2.41% | |
| NDCG@20 | 0.0092 | 0.0127 | 0.0194 | 0.0213 | 0.0434 | 0.0472 | 0.0490 | 0.0444 | 0.0563 | 0.0538 | 0.0538 | 0.0565 | 0.0728 | 0.0774 | 2.50% | |
| Yelp | HR@5 | 0.0108 | 0.0140 | 0.0234 | 0.0261 | 0.0461 | 0.0567 | 0.0234 | 0.0227 | 0.0182 | 0.0270 | 0.0148 | 0.0143 | 0.0710 | 0.0826 | 13.85% |
| HR@10 | 0.0186 | 0.0240 | 0.0419 | 0.0426 | 0.0702 | 0.0707 | 0.0398 | 0.0331 | 0.0268 | 0.0445 | 0.0256 | 0.0261 | 0.0826 | 0.0934 | 13.11% | |
| HR@20 | 0.0321 | 0.0424 | 0.0702 | 0.0707 | 0.1069 | 0.1064 | 0.0669 | 0.0565 | 0.0380 | 0.0729 | 0.0391 | 0.0411 | 0.1089 | 0.1235 | 13.41% | |
| NDCG@5 | 0.0091 | 0.0118 | 0.0210 | 0.0225 | 0.0376 | 0.0460 | 0.0195 | 0.0171 | 0.0112 | 0.0227 | 0.0093 | 0.0195 | 0.0564 | 0.0645 | 14.36% | |
| NDCG@10 | 0.0125 | 0.0165 | 0.0290 | 0.0287 | 0.0487 | 0.0558 | 0.0268 | 0.0220 | 0.0170 | 0.0299 | 0.0130 | 0.0256 | 0.0673 | 0.0766 | 13.82% | |
| NDCG@20 | 0.0172 | 0.0226 | 0.0391 | 0.0387 | 0.0622 | 0.0694 | 0.0371 | 0.0319 | 0.0226 | 0.0391 | 0.0195 | 0.0319 | 0.0826 | 0.0932 | 12.83% | |
| LastFM | HR@5 | 0.0193 | 0.0239 | 0.0367 | 0.0376 | 0.0404 | 0.0450 | 0.0376 | 0.0312 | 0.0468 | 0.0394 | 0.0303 | 0.0266 | 0.0459 | 0.0569 | 23.97% |
| HR@10 | 0.0367 | 0.0358 | 0.0560 | 0.0642 | 0.0615 | 0.0615 | 0.0615 | 0.0468 | 0.0679 | 0.0587 | 0.0339 | 0.0400 | 0.0766 | 0.0881 | 14.90% | |
| HR@20 | 0.0550 | 0.0495 | 0.0917 | 0.1064 | 0.0862 | 0.1064 | 0.0862 | 0.0798 | 0.1009 | 0.0881 | 0.0514 | 0.0688 | 0.1171 | 0.1292 | 10.33% | |
| NDCG@5 | 0.0146 | 0.0155 | 0.0260 | 0.0263 | 0.0261 | 0.0338 | 0.0263 | 0.0213 | 0.0346 | 0.0287 | 0.0193 | 0.0185 | 0.0346 | 0.0413 | 19.36% | |
| NDCG@10 | 0.0200 | 0.0194 | 0.0320 | 0.0338 | 0.0339 | 0.0392 | 0.0338 | 0.0253 | 0.0413 | 0.0348 | 0.0246 | 0.0208 | 0.0415 | 0.0503 | 21.20% | |
| NDCG@20 | 0.0245 | 0.0228 | 0.0410 | 0.0402 | 0.0400 | 0.0504 | 0.0400 | 0.0343 | 0.0496 | 0.0422 | 0.0300 | 0.0252 | 0.0501 | 0.0594 | 18.56% | |
Key observations and analysis from Table 5:
- Overall superiority of FICLRec: Across all five datasets and all evaluation metrics (HR@K and NDCG@K for K = 5, 10, 20), FICLRec consistently outperforms all baseline models. The "Imp. vs. SOTA" column reports the percentage improvement over the best baseline for each metric and dataset. This consistent margin validates the overall effectiveness of the proposed frequency-enhanced intent contrastive learning framework. (A small sketch of how these two metrics are computed follows the summary below.)
- Significant gains on both sparse and dense datasets:
  - On the highly sparse Yelp dataset (99.95% sparsity), FICLRec achieves particularly large improvements, e.g., 13.85% on HR@5 and 14.36% on NDCG@5. This suggests that frequency contrastive learning is highly effective at mitigating data sparsity, a core claim of the paper.
  - Even on the relatively denser LastFM dataset (98.68% sparsity, with more actions per user on average), FICLRec shows remarkable gains, such as 23.97% on HR@5 and 19.36% on NDCG@5, indicating its ability to capture complex user dynamics across different data densities.
- Advantages over other Transformer-based and self-supervised models: FICLRec consistently outperforms SASRec (a strong Transformer baseline), highlighting the benefit of incorporating frequency-domain analysis and intent contrastive learning. Compared to other self-supervised and intent-learning models such as DuoRec, ICLRec, ELCRec, ICSRec, and IDCLRec, FICLRec still shows clear gains, suggesting that explicitly disentangling and learning high-frequency and low-frequency intents provides an advantage over contrastive strategies that do not leverage frequency information. For instance, on the Beauty dataset, FICLRec improves HR@20 by 3.35% over IDCLRec (0.1390 vs. 0.1345), another intent-driven contrastive learning model.
- Benefits of frequency-domain analysis: Models that operate in the frequency domain, such as FMLPRec and BSARec, generally outperform traditional RNN and CNN models (GRU4Rec, Caser) and sometimes even SASRec. FICLRec builds on this, showing that its Frequency Redistribution Encoder and frequency contrastive learning exploit the frequency domain more effectively.
- Robustness across cutoff values: The improvements hold across K = 5, 10, and 20, indicating that FICLRec not only increases the chance of a hit but also ranks relevant items higher in the recommendation list.

In summary, the experimental results strongly validate FICLRec's effectiveness. Its architecture, which combines frequency-domain analysis with intent contrastive learning, successfully addresses the challenges of capturing diverse user intentions and mitigating data sparsity, leading to state-of-the-art performance across real-world datasets.
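For reference, the two metrics behind every comparison in Table 5 can be computed as below. This is a minimal sketch of the standard leave-one-out formulation (one held-out target item per user); the paper's exact evaluation protocol may differ in details such as candidate sampling.

```python
import numpy as np

def hr_and_ndcg_at_k(rank_lists, ground_truth, k=20):
    """Hit Ratio@K and NDCG@K for next-item prediction with a single
    ground-truth item per user (standard leave-one-out protocol)."""
    hits, ndcgs = [], []
    for ranked_items, target in zip(rank_lists, ground_truth):
        top_k = list(ranked_items[:k])
        if target in top_k:
            rank = top_k.index(target)             # 0-based position of the hit
            hits.append(1.0)
            ndcgs.append(1.0 / np.log2(rank + 2))  # DCG of one relevant item; IDCG = 1
        else:
            hits.append(0.0)
            ndcgs.append(0.0)
    return float(np.mean(hits)), float(np.mean(ndcgs))

# toy example: two users, each with one held-out target item
ranks = [[5, 2, 9, 1], [7, 3, 8, 4]]
targets = [9, 6]
print(hr_and_ndcg_at_k(ranks, targets, k=3))   # -> (0.5, 0.25)
```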
6.2. Ablation Studies / Parameter Analysis
6.2.1. Ablation Study (RQ2)
The authors conducted an ablation study to evaluate the contribution of each key component of FICLRec. This helps to understand which parts are most critical for the model's overall performance. The study used HR@20 and NDCG@20 metrics on the five datasets.
The following are the results from Table 6 of the original paper, showing the ablation study of FICLRec:
| Model | Beauty | | Sports | | Toys | | Yelp | | LastFM | |
| | HR@20 | NDCG@20 | HR@20 | NDCG@20 | HR@20 | NDCG@20 | HR@20 | NDCG@20 | HR@20 | NDCG@20 |
| (A) FICLRec | 0.1390 | 0.0703 | 0.0885 | 0.0425 | 0.1471 | 0.0774 | 0.0826 | 0.0342 | 0.1211 | 0.0594 |
| (B) w/o FR | 0.1315 | 0.0669 | 0.0774 | 0.0380 | 0.1345 | 0.0727 | 0.0732 | 0.0302 | 0.1083 | 0.0504 |
| (C) w/o HFAL | 0.1369 | 0.0694 | 0.0846 | 0.0413 | 0.1458 | 0.0772 | 0.0760 | 0.0316 | 0.1257 | 0.0546 |
| (D) w/o CCAL | 0.1367 | 0.0689 | 0.0836 | 0.0406 | 0.1444 | 0.0761 | 0.0761 | 0.0315 | 0.1138 | 0.0528 |
| (E) ICSRec | 0.1271 | 0.0648 | 0.0770 | 0.0374 | 0.1364 | 0.0728 | 0.0710 | 0.0293 | 0.1018 | 0.0501 |
Analysis of Ablation Study Results:
- (A) FICLRec (full model): The complete proposed model, serving as the reference for comparison.
- (B) w/o FR (without the Frequency Redistribution Structure): Removing the Frequency Redistribution Structure causes a substantial performance drop on all datasets. For example, HR@20 on Beauty falls from 0.1390 to 0.1315 and on Sports from 0.0885 to 0.0774. This indicates that the structure is essential for capturing and re-weighting both low-frequency and high-frequency features; without it, the model can no longer differentiate the two kinds of intent components.
- (C) w/o HFAL (without the High-Frequency Alignment Loss): Removing the high-frequency alignment loss leads to a noticeable drop (e.g., HR@20 on Beauty falls from 0.1390 to 0.1369). Although the drop is smaller than for FR, it confirms the value of explicitly aligning the high-frequency components of positive samples: this alignment helps the model learn more distinct high-frequency intents and, as the paper notes, reduces the impact of noisy features in the high-frequency components. (A hedged sketch of such an alignment loss follows this list.)
- (D) w/o CCAL (without the Cluster-Level Center Alignment Loss): Removing the cluster-level center alignment loss also degrades performance (e.g., HR@20 on Beauty falls from 0.1390 to 0.1367). This loss pulls intent representations toward their intent prototypes (cluster centers), which encode low-frequency, general preferences; removing it weakens the model's ability to capture stable, long-term interests and its overall intent learning.
- (E) ICSRec: A strong intent-contrastive sequential recommendation baseline that does not exploit frequency-domain information in the same manner. FICLRec (A) consistently outperforms ICSRec (E), showing that the Frequency Redistribution Structure and the specialized frequency contrastive objectives provide real advantages over other intent-contrastive methods.

In summary, the ablation study confirms that all proposed components, the Frequency Redistribution Structure (FR), the High-Frequency Alignment Loss (HFAL), and the Cluster-Level Center Alignment Loss (CCAL), are essential for FICLRec's superior performance. Each contributes either to capturing frequency-aware intentions or to learning robustly from sparse data, collectively improving the quality of the sequential recommendations.
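To make the alignment idea concrete, the sketch below shows a generic InfoNCE-style alignment between paired representations, one common way a high-frequency alignment term can be realized. The function name, temperature value, and the assumption that the two inputs are already the high-frequency representations of two views of the same sequence are illustrative; the paper's exact HFAL and CCAL formulations may differ (CCAL would instead pull each representation toward its assigned cluster center).

```python
import torch
import torch.nn.functional as F

def info_nce_alignment(z_a, z_b, temperature=0.1):
    """InfoNCE alignment between two batches of representations (batch, dim);
    row i of z_a and row i of z_b come from the same user sequence."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature          # (batch, batch) cosine similarities
    labels = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, labels)        # positives sit on the diagonal

# toy call with random stand-in high-frequency representations
loss = info_nce_alignment(torch.randn(8, 64), torch.randn(8, 64))
```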
6.2.2. Hyperparameter Study (RQ3)
Impact of the ICL loss weights
Two hyperparameters control the weights of the high-frequency intent contrastive loss and of the low-frequency intent contrastive loss within the overall intent contrastive objective. The authors investigated their impact on model performance.
The following figure (Figure 4 from the original paper) shows the parameter sensitivity of the ICL loss weight:
Fig. 4. Parameter sensitivity of the ICL loss weight.
The figure shows that FICLRec is fairly robust to the choice of these two weights. The model generally performs well even when both weights are small, indicating that even a modest contribution from the high-frequency and low-frequency contrastive terms is beneficial. Performance does not change drastically as the weights vary, which suggests that the core architecture and the frequency information itself carry most of the gain. The optimal values differ slightly across datasets, but a configuration with relatively low weights for both contrastive losses already yields satisfactory performance, simplifying hyperparameter tuning.
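The prose above does not reproduce the original weight symbols, so the names below (`lambda_h`, `lambda_l`) and the illustrative values are assumptions rather than the paper's notation. A minimal sketch of how such a weighted multi-task objective is typically combined:

```python
def total_loss(rec_loss, hf_icl_loss, lf_icl_loss, lambda_h=0.1, lambda_l=0.1):
    """Weighted multi-task objective: next-item prediction plus the two
    intent contrastive terms (weight names are assumed, not the paper's)."""
    return rec_loss + lambda_h * hf_icl_loss + lambda_l * lf_icl_loss

# toy numbers only, to show the weighting
print(total_loss(1.20, 0.85, 0.60))   # 1.20 + 0.085 + 0.060 = 1.345
```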
Impact of the intent number
The intent number refers to the number of clusters used in K-Means to derive the intent prototypes for low-frequency intent contrastive learning. This parameter directly influences the granularity of low-frequency intent representation.
The following figure (Figure 5 from the original paper) shows the parameter sensitivity of the intent number:
Fig. 5. Parameter sensitivity of the intent number.
The graph indicates that model performance is sensitive to the intent number, and the optimal value differs from dataset to dataset. This suggests that the appropriate number of low-frequency intent prototypes depends on the diversity and complexity of user behaviors within each dataset. A carefully chosen intent number balances capturing distinct long-term preferences against overfitting to an excess of fine-grained prototypes.
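A minimal sketch of how intent prototypes can be derived with K-Means, assuming scikit-learn and randomly generated stand-in sequence representations; the cluster count below is arbitrary for illustration, not the paper's tuned intent number.

```python
import numpy as np
from sklearn.cluster import KMeans

# stand-in user sequence representations: (num_sequences, embedding_dim)
user_reprs = np.random.randn(1000, 64).astype(np.float32)

num_intents = 64  # illustrative intent number; tuned per dataset in the paper
kmeans = KMeans(n_clusters=num_intents, n_init=10, random_state=0).fit(user_reprs)

prototypes = kmeans.cluster_centers_   # (num_intents, dim) low-frequency intent prototypes
assignments = kmeans.labels_           # prototype index assigned to each sequence
```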
Impact of the boundary frequency
The boundary frequency is a critical hyperparameter within the Frequency Redistribution Structure. It determines the cutoff point for separating low-frequency and high-frequency components of the signal after Fourier Transform.
The following figure (Figure 6 from the original paper) shows the parameter sensitivity of the boundary frequency:
Fig. 6. Parameter sensitivity of the boundary frequency.
The results show that performance is affected by the boundary frequency and, as with the intent number, the optimal value varies by dataset. This indicates that the ideal division between short-term (high-frequency) and long-term (low-frequency) intents is dataset-dependent, so tuning the boundary frequency is important for effectively disentangling different temporal patterns in user behavior and maximizing the benefit of the Frequency Redistribution Structure.
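A minimal sketch, assuming PyTorch's torch.fft, of how a boundary index can split a sequence representation into low- and high-frequency parts; the paper's Frequency Redistribution Structure additionally re-weights the two parts with learnable matrices, which is omitted here.

```python
import torch

def split_by_boundary_frequency(x, c):
    """Split x (batch, seq_len, dim) into low- and high-frequency components
    at boundary bin index c along the time axis."""
    spec = torch.fft.rfft(x, dim=1)              # frequency bins along the sequence axis
    low_spec, high_spec = spec.clone(), spec.clone()
    low_spec[:, c:, :] = 0                       # keep only bins below the boundary
    high_spec[:, :c, :] = 0                      # keep only bins at/above the boundary
    low = torch.fft.irfft(low_spec, n=x.size(1), dim=1)
    high = torch.fft.irfft(high_spec, n=x.size(1), dim=1)
    return low, high                             # low + high reconstructs x (up to fp error)

low, high = split_by_boundary_frequency(torch.randn(4, 50, 64), c=5)
```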
6.3. Effectiveness on noisy or sparse data (RQ4)
6.3.1. Robustness to noisy data
To evaluate FICLRec's ability to handle noisy interactions, the authors introduced negative samples (simulated noise) at different rates: 5%, 10%, 15%, and 20% into the training data.
The following figure (Figure 7 from the original paper) illustrates the performance under noisy data:
Fig. 7. Performance under noisy data.
The figure displays the NDCG@20 (left axis, bar graph) and HR@20 (right axis, line graph) performance of FICLRec and baselines as the noise rate increases. While all models experience a decline in performance with increasing noise, FICLRec consistently outperforms the baselines. Notably, even when the noise rate reaches 15%, FICLRec maintains a higher performance level compared to other models. This demonstrates FICLRec's superior robustness to noisy data, which is attributed to its frequency redistribution structure and contrastive learning objectives. By discerning high-frequency noise from meaningful short-term signals and leveraging robust intent prototypes, FICLRec can learn more accurate representations despite corrupted inputs.
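A hypothetical sketch of how such noise injection could be implemented; the exact corruption protocol (where noise items are placed and how many are inserted) is an assumption, not taken from the paper.

```python
import random

def inject_noise(sequence, num_items, noise_rate=0.1, seed=0):
    """Insert randomly chosen item ids into a user sequence at the given rate,
    mimicking the noisy-data experiment."""
    rng = random.Random(seed)
    noisy = list(sequence)
    num_noise = max(1, int(len(sequence) * noise_rate))
    for _ in range(num_noise):
        pos = rng.randrange(len(noisy) + 1)
        noisy.insert(pos, rng.randrange(1, num_items + 1))   # random item id as noise
    return noisy

print(inject_noise([3, 17, 42, 8, 5], num_items=1000, noise_rate=0.2))
```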
6.3.2. Robustness to sparse data
To assess FICLRec's performance under data sparsity, the authors simulated different levels of sparsity by randomly sampling 75%, 50%, and 25% of the original training data for model training, while keeping the test data unchanged.
The following figure (Figure 8 from the original paper) illustrates the performance under sparse data:
Fig. 8. Performance under sparse data.
The graph shows NDCG@20 (left axis, bar graph) and HR@20 (right axis, line graph) for different training data percentages. As the percentage of training data decreases (indicating higher sparsity), all models experience a drop in performance. However, FICLRec consistently exhibits more stable performance and maintains a larger margin over the baselines, especially under severe sparsity (e.g., 25% training data). This indicates that the frequency contrastive learning approach effectively mitigates the negative impact of data sparsity. By forcing similar samples closer and dissimilar ones apart, even with limited interactions, the model can learn more discriminative and robust item and intent embeddings. This finding directly supports one of the paper's core claims: that FICLRec reduces the negative impact of data sparsity on model performance.
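A small sketch of the corresponding sparsity simulation: randomly keep a fraction of the training sequences while the test data stays fixed. The helper name and data layout are illustrative.

```python
import random

def subsample_training_sequences(train_seqs, keep_ratio=0.25, seed=0):
    """Randomly keep a fraction of training sequences to simulate sparser data;
    the held-out test set is left unchanged."""
    rng = random.Random(seed)
    n_keep = int(len(train_seqs) * keep_ratio)
    return rng.sample(train_seqs, n_keep)

train = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10, 11]]
print(subsample_training_sequences(train, keep_ratio=0.5))   # 2 of 4 sequences kept
```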
6.4. Analysis of item embedding quality (RQ5)
The paper evaluates item embedding quality both qualitatively and quantitatively to understand how FICLRec creates richer and more discriminative item representations.
6.4.1. Qualitative analysis
The authors used t-SNE to visualize the item embeddings learned by FICLRec and compared them to those from ICSRec, specifically on the Beauty and Yelp datasets.
The following figure (Figure 9 from the original paper) shows visualizations of item embeddings:
Fig. 9. Visualizations of item embeddings.
The figure displays t-SNE visualizations of item embeddings. The text indicates that FICLRec produces a more uniform item distribution and clusters long-tail items (items with few interactions) more effectively than ICSRec. This suggests that FICLRec's frequency-enhanced intent contrastive learning helps to enrich the representations of less frequently interacted items. By leveraging both high-frequency (dynamic) and low-frequency (stable, often captured by prototypes) signals, FICLRec can learn more robust embeddings for long-tail items, which are challenging for many recommendation systems. The improved embedding quality for these items can lead to better personalized recommendations, as long-tail items often represent niche interests.
The following figure (Figure 10 from the original paper) shows further visualization of item embeddings:
Fig. 10. Further visualization of item embeddings.
Figure 10 further reinforces the observation that FICLRec effectively improves the quality of item representation and strengthens the recommendation of long-tail items. This implies that the model's ability to capture distinct high-frequency and low-frequency intents, combined with its contrastive learning objectives, leads to a more organized and meaningful feature space where similar items (including long-tail ones) are grouped together.
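A hypothetical sketch of the kind of visualization used here, assuming scikit-learn's t-SNE and matplotlib; the embeddings and popularity counts below are random stand-ins, not the learned FICLRec embeddings.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

item_emb = np.random.randn(2000, 64)                 # stand-in item embeddings
popularity = np.random.randint(1, 100, size=2000)    # stand-in interaction counts

coords = TSNE(n_components=2, init="pca", random_state=0).fit_transform(item_emb)
long_tail = popularity < 10                          # long-tail items: few interactions

plt.scatter(coords[~long_tail, 0], coords[~long_tail, 1], s=3, label="head items")
plt.scatter(coords[long_tail, 0], coords[long_tail, 1], s=3, label="long-tail items")
plt.legend()
plt.savefig("item_embedding_tsne.png", dpi=150)
```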
6.4.2. Quantitative analysis
The paper states that FICLRec performs well in regions where user interactions are frequent and diverse. While the paper mentions quantitative analysis, it does not provide a specific table or figure for this aspect in the provided text. However, the overall performance improvements in Table 5 across various metrics indirectly serve as a quantitative validation of better item embedding quality, as higher quality embeddings generally lead to better recommendation performance. The implicit argument is that improved HR and NDCG values, particularly on diverse datasets, are a direct consequence of the model's ability to learn more effective item and intent embeddings.
6.5. Case study (RQ6)
To further illustrate how FICLRec focuses on different types of features, the authors conducted a case study by visualizing the average attention weights on the Yelp dataset and comparing it with ICSRec. This addresses RQ6: "Does FICLRec focus more effectively on high-frequency features than ICSRec?"
The following figure (Figure 11 from the original paper) shows visualizations of average attention weights (dataset: Yelp):
Fig. 11. Visualizations of average attention weights (dataset:Yelp).
The figure displays a heatmap of average attention weights from the self-attention layer. The colors indicate the strength of attention. Comparing FICLRec with ICSRec, the visualization shows that FICLRec exhibits stronger attention weights on the more recent items in the sequence (i.e., towards the right end of the sequence). This suggests that FICLRec, due to its Frequency Redistribution Structure and high-frequency intent contrastive learning, effectively learns to focus on local high-frequency dependencies in the current sequence. This enhanced focus on recent interactions (which often drive short-term intents) allows FICLRec to better capture dynamic user behaviors and make more accurate next-item predictions. ICSRec, while also using attention, might not have the same specialized mechanism to prioritize these high-frequency signals.
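A hypothetical sketch of how such an average attention heatmap can be produced, assuming the per-user, per-head attention weights have already been extracted; the array shapes and file name are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# stand-in attention weights: (users, heads, seq_len, seq_len)
attn = np.random.rand(256, 2, 50, 50)
avg_attn = attn.mean(axis=(0, 1))          # (seq_len, seq_len) average attention map

plt.imshow(avg_attn, cmap="viridis")
plt.xlabel("Key position (older -> recent)")
plt.ylabel("Query position")
plt.colorbar(label="average attention weight")
plt.savefig("avg_attention_yelp.png", dpi=150)
```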
6.6. Long-tail and short sequence analysis
The paper further analyzes FICLRec's performance in specific challenging scenarios: long-tail items and short sequences.
The following are the results from Table 7 of the original paper, showing user distributions across different interaction length intervals:
| Dataset | < 10 (prop.) | [10, 20] (prop.) | [20, 30] (prop.) | [30, 40] (prop.) | ≥ 40 (prop.) |
| Beauty | 17353(77.60%) | 3152(14.10%) | 1065(4.76%) | 367(1.64%) | 426(1.90%) |
| Sports | 28478(79.99%) | 4555(12.80%) | 1480(4.16%) | 395(1.11%) | 690(1.94%) |
| Toys | 16345(84.20%) | 2320(11.95%) | 476(2.45%) | 130(0.67%) | 141(0.73%) |
| Yelp | 26550(87.25%) | 2948(9.69%) | 692(2.27%) | 135(0.44%) | 106(0.35%) |
| LastFM | 1090(100%) | 0(0%) | 0(0%) | 0(0%) | 0(0%) |
Table 7 shows the distribution of users across different sequence length categories. For most datasets (Beauty, Sports, Toys, Yelp), a very large proportion of users (77-87%) have short sequences (length < 10). LastFM is an outlier where all users have sequences of length < 10 (though its average actions per user is higher in Table 4, implying many users have multiple short sequences, or this table's definition of sequence length cutoff is different). This highlights the prevalence of short sequences in real-world data, a challenging scenario for sequential recommendation.
The following are the results from Table 8 of the original paper, showing item popularity distributions:
| Dataset | < 10 (prop.) | [10, 20] (prop.) | [20, 30] (prop.) | [30, 40] (prop.) | ≥ 40 (prop.) |
| Beauty | 17240(77.10%) | 3875(17.33%) | 729(3.25%) | 254(1.14%) | 265(1.18%) |
| Sports | 27740(77.92%) | 6450(18.12%) | 946(2.66%) | 271(0.76%) | 191(0.54%) |
| Toys | 15224(78.43%) | 3214(16.55%) | 552(2.84%) | 217(1.12%) | 205(1.06%) |
| Yelp | 20977(68.93%) | 6587(21.65%) | 1558(5.12%) | 631(2.07%) | 678(2.23%) |
| LastFM | 302(27.71%) | 263(24.13%) | 111(10.17%) | 78(7.16%) | 336(30.83%) |
Table 8 shows the distribution of items based on their popularity (number of interactions). A large majority of items (roughly 69-78% for most datasets) fall into the long-tail category with fewer than 10 interactions, indicating the prevalence of long-tail items that are difficult to recommend due to insufficient data. LastFM again stands out with a more balanced distribution, including a substantial share (30.83%) of very popular items with at least 40 interactions.
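For reproducibility, a small sketch of the binning behind Tables 7 and 8; whether the interval endpoints are inclusive or exclusive is an assumption, since the tables do not state it.

```python
from collections import Counter

def bin_counts(values):
    """Bin sequence lengths (Table 7) or item interaction counts (Table 8)
    into the intervals used in the paper's distribution tables."""
    def bucket(n):
        if n < 10:
            return "<10"
        if n < 20:
            return "[10,20)"
        if n < 30:
            return "[20,30)"
        if n < 40:
            return "[30,40)"
        return ">=40"
    counts = Counter(bucket(n) for n in values)
    total = len(values)
    return {b: (c, f"{100 * c / total:.2f}%") for b, c in counts.items()}

print(bin_counts([3, 7, 12, 25, 41, 8, 9, 15]))
```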
The following figure (Figure 12 from the original paper) shows a subgroup sequence analysis on the Sports dataset:
Fig. 12. Subgroup sequences analysis on Sports dataset.
Figure 12 presents HR@20 performance for FICLRec and baseline methods on the Sports dataset, broken down by categories of user sequence length (derived from Table 7) and item popularity (derived from Table 8).
Analysis of Figure 12:
- Performance on short sequences: FICLRec demonstrates strong performance in the short-sequence categories (the < 10 and [10, 20] length groups). This is crucial because short sequences represent the majority of user behaviors in many datasets (as seen in Table 7). FICLRec's ability to capture high-frequency intents (short-term signals) is particularly beneficial in these scenarios, where long-term context is limited.
- Performance on long-tail items: The figure likewise shows FICLRec performing well for long-tail items (the < 10 and [10, 20] popularity groups). This aligns with the qualitative analysis of item embedding quality (Figures 9 and 10), suggesting that the frequency-enhanced contrastive learning effectively learns robust representations for items with limited interactions. This is a significant advantage, as recommending long-tail items improves content diversity and caters to niche user interests.
- Performance on popular items: The paper explicitly states that FICLRec "did not outperform other models" for popular items (i.e., the [30, 40] and ≥ 40 popularity groups). This is an important limitation: while FICLRec excels at capturing dynamic and niche preferences, its specialized frequency-enhanced approach does not offer the same comparative advantage for very popular items, where simpler models already achieve high performance thanks to abundant interaction data.

Overall, the long-tail and short-sequence analysis highlights FICLRec's strength in challenging real-world scenarios, particularly where data is sparse and user intentions are dynamic. While it performs exceptionally well for the majority of users and items, there is room for improvement, or a different optimization focus, for highly popular items.
6.7. Training efficiency
The authors also evaluated the training efficiency of FICLRec by comparing its Flops (floating point operations), number of epochs, time per epoch, total training time, and number of parameters with several baselines.
The following are the results from Table 9 of the original paper, showing the training efficiency:
| Methods | Flops | Sports | | | | Toys | | | |
| | | epoch | s/epoch | total time | # params | epoch | s/epoch | total time | # params |
| SASRec | 1.272G | 88 | 27 | 22.0 | 1,278,208 | 246 | 9 | 36.9 | 866,496 |
| DuoRec | 1.272G | 103 | 70 | 46.4 | 1,278,208 | 151 | 25 | 62.9 | 866,496 |
| FEARec | 1.258G | 114 | 323 | 613.7 | 1,278,208 | 152 | 223 | 564.9 | 866,496 |
| ICLRec | 1.258G | 300 | 35 | 175.0 | 1,278,272 | 300 | 17 | 85.0 | 866,560 |
| ICSRec | 1.258G | 143 | 164 | 107.3 | 1,278,272 | 174 | 89 | 258.1 | 866,560 |
| FICLRec (ours) | 1.468G | 123 | 188 | 116.9 | 1,295,424 | 157 | 93 | 243.4 | 883,712 |
Analysis of Training Efficiency:
- Flops (floating point operations): FICLRec has slightly higher Flops (1.468G) than most baselines (SASRec and DuoRec at 1.272G; FEARec, ICLRec, and ICSRec at 1.258G). This is expected, since FICLRec adds FFT/IFFT computations in its Frequency Redistribution Structure, contributing an additional O(n log n) term to the complexity.
- Number of parameters: FICLRec also has slightly more parameters (e.g., 1,295,424 on Sports) than baselines such as SASRec (1,278,208), owing to the additional learnable weight matrices in the Frequency Redistribution Structure.
- Time per epoch: FICLRec's time per epoch (188 s on Sports, 93 s on Toys) is higher than that of simpler Transformer-based models such as SASRec (27 s, 9 s) and ICLRec (35 s, 17 s), a direct consequence of the extra per-epoch cost of the frequency-domain operations. However, it is considerably lower than FEARec (323 s, 223 s) and not drastically higher than ICSRec (164 s, 89 s) on Toys.
- Total time: Despite the higher time per epoch, FICLRec often needs fewer epochs to converge (123 on Sports, 157 on Toys, versus 300 for ICLRec and 143/174 for ICSRec), which keeps the total training time competitive. On Sports, FICLRec's total time (116.9 minutes) is lower than ICLRec (175.0 minutes) and FEARec (613.7 minutes); on Toys, its total time (243.4 minutes) is lower than FEARec and ICSRec.
Conclusion on Efficiency:
The analysis indicates that FICLRec introduces a reasonable increase in computational complexity (both Flops and parameters) and time per epoch compared to some baselines. However, this trade-off is often justified by its significantly superior recommendation performance. The fact that it can achieve strong results in a competitive total training time, sometimes even faster than other complex contrastive learning or frequency domain models, suggests that the added complexity is worthwhile. The model's efficiency remains within an acceptable range, especially when considering the substantial performance advantages it offers in capturing nuanced user intentions and handling data sparsity.
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper introduced FICLRec, a novel Frequency Enhanced Intent Contrastive Learning Recommendation model designed to address key challenges in sequential recommendation: the insufficient capture of high-frequency intents and the pervasive issue of data sparsity. FICLRec innovatively leverages frequency information from user interaction sequences, using a Frequency Redistribution Encoder to disentangle low-frequency (long-term, stable) and high-frequency (short-term, dynamic) user intentions. It then employs a multi-task learning framework that includes both high-frequency intent contrastive learning (with high-frequency alignment loss) and low-frequency intent contrastive learning (with cluster-level center alignment loss) to learn robust, frequency-aware intent representations. Extensive experiments on five real-world datasets demonstrated that FICLRec consistently outperforms a wide array of state-of-the-art baselines, achieving significant improvements in HR@K and NDCG@K metrics. The ablation studies confirmed the effectiveness of each proposed component, and hyperparameter studies revealed the model's robustness and sensitivity to specific configurations. Furthermore, FICLRec proved to be highly robust to noisy and sparse data conditions, and its item embedding quality was shown to be superior, particularly for long-tail items and short sequences. While incurring a slight increase in computational complexity, the substantial performance gains validate the model's efficacy.
7.2. Limitations & Future Work
The authors acknowledge several limitations and propose directions for future research:
- Performance on popular items: While FICLRec excels at recommending long-tail items and handling short sequences, it "did not always achieve the desired performance" for very popular items. Its specialized focus on disentangling intentions and addressing sparsity may offer less incremental benefit for items with abundant interaction data.
- Computational overhead of K-Means: The current approach to low-frequency intent contrastive learning runs K-Means clustering to generate intent prototypes in every training epoch (Algorithm 1, line 5). This can be computationally intensive, especially for large datasets or a high number of clusters.
- Dynamic nature of intent prototypes: The intent prototypes are re-generated via K-Means over all user representations in each epoch. While this allows adaptation, a more dynamic or incremental clustering approach could be more efficient and more responsive to subtle shifts in long-term preferences.
- Optimizing computational complexity: Future work could design more efficient frequency-domain processing or contrastive learning strategies to reduce the training overhead without compromising performance.
- Adaptive frequency band separation: The boundary frequency is a static hyperparameter. Future research could explore adaptive mechanisms that dynamically determine the optimal frequency split for different users or contexts.
7.3. Personal Insights & Critique
FICLRec presents a compelling and well-executed approach to sequential recommendation, particularly for its innovative integration of frequency domain analysis with intent contrastive learning.
Insights:
- Power of the frequency domain: The paper highlights the untapped potential of frequency-domain analysis in recommender systems. Decomposing user behavior into low-frequency (stable, long-term) and high-frequency (dynamic, short-term) components provides a natural, intuitive way to model the duality of user intentions. The concept could transfer to other sequential modeling tasks beyond recommendation, such as user behavior prediction in domains like health or finance.
- Targeted contrastive learning: Applying contrastive objectives tailored to the distinct characteristics of high-frequency and low-frequency intents is highly effective. Rather than a generic contrastive loss, FICLRec shows that specialized alignment (HFAL for short-term signals, CCAL for long-term prototypes) significantly improves the model's ability to discriminate between and learn different aspects of user preference.
- Robustness to real-world challenges: The demonstrated robustness to noisy and sparse data is a critical practical advantage. Real-world recommender systems are inherently messy, and the frequency-domain view appears to provide a level of signal robustness that lets contrastive learning succeed even with limited or corrupted inputs.
- Addressing the cold-start and long-tail problems: By improving item embedding quality for long-tail items and performing well on short sequences, FICLRec takes a significant step toward alleviating the cold-start problem for new items and users, a persistent challenge in recommendation.
Critique / Areas for Improvement:
- K-Means computational cost: Running K-Means in every epoch, while effective, can become a computational bottleneck as datasets and embedding dimensions grow. Exploring online or streaming clustering methods, or maintaining a fixed set of prototypes updated via a momentum encoder (similar to MoCo in computer vision), could improve scalability.
- Generalizability of the boundary frequency: The boundary frequency is dataset-dependent, and manually tuning it for every new dataset is cumbersome. An adaptive mechanism that learns the split (for example, an attention-like weighting over frequency bins or a trainable filter) could improve the model's automation and generalizability.
- Interpretation of frequency components: The paper intuitively links low frequencies to long-term and high frequencies to short-term behavior, but a deeper theoretical or empirical analysis of which patterns or intents each frequency band actually captures would aid understanding and model design. For instance, could mid-range frequencies represent recurring seasonal interests?
- Trade-off with popular items: The observation that FICLRec does not outperform baselines for popular items is interesting. It suggests a potential trade-off in which optimizing for long-tail and short-term behavior de-emphasizes the dominant popular items. Future work could investigate multi-objective optimization or hybrid approaches that retain long-tail performance while also excelling at popular-item recommendation.
- Dynamic sequence lengths: The maximum sequence length is fixed. Since real user sequences vary greatly in length, exploring Transformer variants that handle variable-length sequences more naturally, or padding strategies that minimize information loss, could be beneficial.

In conclusion, FICLRec offers a powerful and principled way to integrate temporal dynamics with user intentions through the lens of the frequency domain. Its contributions are significant for advancing sequential recommendation and open exciting avenues for future research in understanding and modeling complex user behaviors.