
Ahmet Melih İnce1, Ayşe Elif Canbilen1, Halim Yanikomeroglu7
1Konya Technical University, Konya, 42250, Türkiye
Emails: {e228221001009, aecanbilen}@ktun.edu.tr
7Carleton University, Ottawa, ON K1S 5B6, Canada
Email: halim@sce.carleton.ca
Abstract

Sixth-generation (6G) networks are designed to meet the hyper-reliable and low-latency communication (HRLLC) requirements of safety-critical applications such as autonomous driving. Integrating non-terrestrial networks (NTN) into the 6G infrastructure brings redundancy to the network, ensuring continuity of communications even under extreme conditions. In particular, high-altitude platform stations (HAPS) stand out for their wide coverage and low latency advantages, supporting communication reliability and enhancing information freshness, especially in rural areas and regions with infrastructure constraints. In this paper, we present reinforcement learning-based approaches using deep deterministic policy gradient (DDPG) to dynamically optimize the age-of-information (AoI) in HAPS-enabled vehicle-to-everything (V2X) networks. The proposed method improves information freshness and overall network reliability by enabling independent learning without centralized coordination. The findings reveal the potential of HAPS-supported solutions, combined with DDPG-based learning, for efficient AoI-aware resource allocation in platoon-based autonomous vehicle systems.

Index Terms:
HAPS, V2X, AoI, Multi-Agent Reinforcement Learning, 6G.

I Introduction

Intelligent transportation and autonomous driving systems have emerged as key components of modern wireless communication research [1]. The increasing need for hyper-reliable, low-latency communication (HRLLC) in vehicular networks requires advanced communication frameworks that ensure efficient data exchange between vehicles, infrastructure, and cloud/edge nodes. With the advent of sixth-generation (6G) networks, these systems are expected to support massive connectivity, ultra-low latency, and high data rates [2]. However, several challenges remain, particularly in environments with limited or unreliable terrestrial infrastructure, such as remote areas, disaster-stricken zones, and ultra-dense urban environments [3]. For instance, platoon-based networks require robust intra- and inter-platoon communications to maintain synchronized movement and adapt dynamically to road conditions, yet keeping data fresh in such a system is challenging due to varying network conditions, mobility patterns, and channel uncertainties.

In this regard, a critical metric in vehicular network research is Age of Information (AoI), which quantifies the timeliness or “freshness” of data received by a target node [4]. Unlike traditional network performance metrics such as throughput and latency, AoI provides direct insight into how current the received information is, making it particularly relevant for applications such as autonomous vehicle control, collision avoidance, and real-time traffic management [5]. Ensuring low AoI is essential for maintaining situational awareness in highly dynamic vehicular environments, where even minor delays in information updates can lead to significant safety risks.

Integrating HAPS into vehicular communication networks offers a promising solution to overcome coverage limitations and challenges related to AoI [6]. Operating at altitudes of approximately 20 km, HAPS provides wide-area connectivity, strong line-of-sight (LoS) links, and low-latency communication, effectively complementing terrestrial and satellite-based networks [7, 1]. By acting as aerial base stations or relays, HAPS can enhance the reliability and freshness of data in vehicle-to-everything (V2X) networks, particularly in infrastructure-constrained areas [8].

To further improve resource allocation and minimize AoI in HAPS-assisted V2X networks, deep reinforcement learning (DRL) techniques have shown remarkable potential. DRL enables autonomous and dynamic decision-making, allowing vehicular agents to optimize their communication strategies in real time [9]. Among various DRL approaches, deep deterministic policy gradient (DDPG) and its multi-agent extension, multi-agent DDPG (MADDPG), have demonstrated effectiveness in handling complex high-dimensional control problems [10]. While DDPG provides a straightforward, independent learning mechanism, its limited adaptation to external interference reduces its effectiveness. Fully decentralized MADDPG (FD-MADDPG), on the other hand, offers a more scalable and adaptable solution, allowing multiple agents to learn in parallel without centralized dependency. The decentralized framework, combined with HAPS integration, positions FD-MADDPG as a promising approach for optimizing AoI-aware communication in next-generation vehicular networks.

This study focuses on optimizing AoI-aware resource allocation in HAPS-enabled V2X networks. In particular, we propose a DRL-based framework using DDPG to optimize AoI, improving communication efficiency and autonomous decision-making. By integrating platoon-based vehicular coordination, we develop a resource allocation model that enhances intra- and inter-platoon data exchange, ensuring low AoI and stable connectivity. Our model leverages HAPS as an aerial relay to extend network coverage, enhance reliability, and provide seamless connectivity in infrastructure-limited scenarios. The provided simulation results validate the effectiveness of our approach, demonstrating significant improvements in AoI reduction and network reliability.

The rest of this paper is structured as follows. Section II presents the system model and problem formulation, detailing the role of HAPS in the V2X framework. Section III describes the DRL-based resource allocation methods, while Section IV provides the simulation results. Finally, Section V concludes the study and outlines future research directions.

All symbols used throughout this paper are summarized in Table I, and they are also defined at their first appearance for clarity and consistency.

Figure 1: The considered HAPS-V2X system model.

II System Model and Problem Formulation

The HAPS-enabled V2X architecture considered in this study is shown in Fig. 1. This vehicular network consists of multiple autonomous vehicle platoons, each led by a platoon leader (PL) responsible for both intra- and inter-platoon communications. The system supports three primary communication modes, namely vehicle-to-infrastructure (V2I), vehicle-to-vehicle (V2V), and vehicle-to-HAPS (V2H). In V2I communication, the PL connects to roadside units (RSUs) or cellular base stations to exchange critical data [11]. V2V communication facilitates direct short-range data sharing within a platoon, ensuring coordinated movement and operational efficiency. Lastly, V2H communication leverages HAPS, which operates at approximately 20 km altitude, to extend coverage beyond terrestrial infrastructure and enhance network reliability, particularly in remote or infrastructure-limited areas, by providing real-time updates and connectivity resilience.

II-A Channel Models

The network operates over a set of $K$ orthogonal sub-channels, assuming that orthogonal frequency division multiplexing (OFDM) is utilized. The channel gains for the $j$th PL communicating with infrastructure, other vehicles in the platoon, i.e., platoon members, and HAPS through the sub-channel $k\in\mathcal{K}=\{1,\dots,K\}$ at a time slot $t$ are denoted as $h_{j,I}^{t}[k]$, $h_{j,v}^{t}[k]$, and $h_{j,H}^{t}[k]$, respectively.

Given the strong likelihood of a LoS link to HAPS, the V2H channel following a Rician fading model can be expressed as

$$h^{t}_{j,H}[k]=10^{-\mathcal{PL}/20}\left(\sqrt{p_{L}}\,a^{t}_{j}+\sqrt{p_{N}}\,h^{t}_{j}[k]\right), \tag{1}$$

where $\mathcal{PL}$ denotes the path loss, which accounts for the scintillation loss caused by rapid fluctuations of the received signal, the attenuation due to atmospheric gases, the clutter loss, the shadow fading, and the free-space path loss, all specified in 3GPP standards [12, Tables 6.6.2-1–6.6.2-3]. Additionally, $p_{L}$ and $p_{N}$ in (1) represent the LoS and non-LoS probabilities, respectively (details on the calculation of these terms can be found in [13]). The term $a^{t}_{j}$ indicates the deterministic LoS component, and $h^{t}_{j}[k]$ captures small-scale fading.
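As a rough illustration of (1), the following Python sketch draws samples of the V2H channel coefficient. It is only a sketch under stated assumptions: the path loss is passed in as a single value in dB rather than computed from the 3GPP tables, the LoS component $a^{t}_{j}$ is assumed to have unit magnitude with a known phase, and the small-scale term $h^{t}_{j}[k]$ is assumed to be a unit-variance circularly symmetric complex Gaussian. The function name `v2h_channel_gain` and all numeric values are hypothetical.

```python
import cmath
import math
import random


def v2h_channel_gain(path_loss_db, p_los, p_nlos, los_phase, rng):
    """Return one complex sample of h_{j,H}^t[k] following Eq. (1)."""
    # Deterministic LoS component a_j^t (assumed: unit magnitude, known phase).
    a_los = cmath.exp(1j * los_phase)
    # Small-scale fading h_j^t[k] (assumed: complex Gaussian with unit variance).
    sigma = math.sqrt(0.5)
    h_small = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    # Amplitude scaling 10^(-PL/20) from the overall path loss in dB.
    amplitude = 10.0 ** (-path_loss_db / 20.0)
    return amplitude * (math.sqrt(p_los) * a_los + math.sqrt(p_nlos) * h_small)


# Hypothetical values: 120 dB path loss, p_L = 0.9, p_N = 0.1.
rng = random.Random(0)
samples = [v2h_channel_gain(120.0, 0.9, 0.1, 0.0, rng) for _ in range(20000)]
mean_power = sum(abs(h) ** 2 for h in samples) / len(samples)
print(f"empirical mean |h|^2 = {mean_power:.3e}")
```

Under these assumptions and with $p_{L}+p_{N}=1$, the expected power of the coefficient is $10^{-\mathcal{PL}/10}$, so the empirical mean printed above should lie close to that value.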

On the other hand, the gain of the channel between PL $j$ and its follower $v$, $h_{j,v}$, or the infrastructure, $h_{j,I}$, in the $k$th sub-channel during the $t$th coherence time is given by [4]

$$h_{j,m}^{t}[k]=\alpha_{j,m}^{t}\,g_{j,m}^{t}[k],\;\;m\in\{v,I\}, \tag{2}$$

where $\alpha_{j,m}^{t}$ represents the large-scale fading effect based on path loss and shadowing, while $g_{j,m}^{t}[k]$ indicates the small-scale fading.
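The product form in (2) can be sketched in a few lines of Python. The text does not specify the large-scale or small-scale models here, so the sketch assumes a log-distance path loss with log-normal shadowing for $\alpha_{j,m}^{t}$ and Rayleigh fading (unit-variance complex Gaussian) for $g_{j,m}^{t}[k]$. The function name, the path loss exponent, the reference loss, and the shadowing deviation are all hypothetical placeholders, not values from the paper.

```python
import math
import random


def terrestrial_channel_gain(distance_m, rng, pl_exponent=3.0,
                             ref_loss_db=40.0, shadow_std_db=3.0):
    """Return one complex sample of h_{j,m}^t[k] = alpha_{j,m}^t * g_{j,m}^t[k]."""
    # Large-scale part alpha: log-distance path loss plus log-normal
    # shadowing, both in dB (assumed model, hypothetical parameters).
    loss_db = ref_loss_db + 10.0 * pl_exponent * math.log10(max(distance_m, 1.0))
    loss_db += rng.gauss(0.0, shadow_std_db)
    alpha = 10.0 ** (-loss_db / 20.0)
    # Small-scale part g: Rayleigh fading, i.e. unit-variance complex Gaussian.
    sigma = math.sqrt(0.5)
    g = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    return alpha * g


rng = random.Random(0)
h_v2v = terrestrial_channel_gain(25.0, rng)    # PL to a nearby follower
h_v2i = terrestrial_channel_gain(300.0, rng)   # PL to a roadside unit
print(abs(h_v2v) ** 2, abs(h_v2i) ** 2)
```

Keeping the large-scale and small-scale factors separate mirrors (2): the former changes slowly with position, while the latter is redrawn for every sub-channel and coherence interval.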

TABLE I: List of Notations
Symbol | Description
$A_{j}^{t}$ | AoI for agent $j$ at time $t$
$p_{j}^{t}[k]$ | Transmission power of agent $j$ at time $t$ on channel $k$
$C_{j,I}^{t}[k]$ | Capacity of V2I communication
$C_{j,v}^{t}[k]$ | Capacity of V2V communication
$C_{j,H}^{t}[k]$ | Capacity of V2H communication
$\theta_{j}^{t}$ | Communication mode selection (V2I, V2V, V2H)
$\beta_{j}^{t}[k]$ | Subchannel selection status
$\sigma^{2}$ | Noise power
$\Delta t$ | Time slot duration
$\zeta_{j}$ | Minimum data transmission requirement
$F(\cdot)$ | Function limiting energy consumption
$G(\cdot)$ | Step function
$I_{j}^{t}[k]$ | Interference on channel $k$ for agent $j$ at time $t$
$p_{L}$ | LoS probability
$p_{N}$ | NLoS probability
$\mathcal{PL}$ | Overall path loss
$h_{j}^{t}[k]$ | Channel gain
$P_{\text{max}}$ | Maximum transmission power allowed (in dBm)
$T$ | Number of time slots
$\kappa_{1},\kappa_{2},\dots$ | Weights used in the reward function
$N$ | Total number of vehicles/agents in the simulation
$s_{j}^{t}$ | State observed by agent $j$ at time $t$
$a_{j}^{t}$ | Action taken by agent $j$ at time $t$
$r_{j}^{t}$ | Local reward received by agent $j$
$r^{t}$ | Global reward shared among agents
$\pi_{j}$ | Policy of agent $j$
$\gamma$ | Discount factor in reinforcement learning
$D_{j}$ | Replay buffer for experience storage

II-B Problem Formulation

In the proposed system, each PL maintains an AoI metric $A_{j}^{t}$ that quantifies data freshness. The AoI of the $j$th PL is updated at time $(t+1)$ based on the selected communication mode, denoted by $\theta_{j}^{t}$, as follows:

$$A_{j}^{t+1}=\begin{cases}\Delta t,&\text{if }\theta_{j}^{t}=0\text{ and }C_{j,I}^{t}[k]\geq C_{j,I}^{\min},\\ \Delta t,&\text{if }\theta_{j}^{t}=2\text{ and }C_{j,H}^{t}[k]\geq C_{j,H}^{\min},\\ A_{j}^{t}+\Delta t,&\text{otherwise}.\end{cases} \tag{3}$$

Here, $C_{j,I}^{\min}$ and $C_{j,H}^{\min}$ denote the minimum required capacities for successful V2I and V2H transmissions, respectively. If no valid update is received, the AoI increases, reflecting data staleness.
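The update rule in (3) maps directly onto a small function. The sketch below is illustrative only: the function name `update_aoi`, the mode constants, and the numeric thresholds in the example trace are hypothetical, while the branching follows (3) with $\theta_{j}^{t}=0$ for V2I, $1$ for V2V, and $2$ for V2H.

```python
V2I, V2V, V2H = 0, 1, 2  # values of the mode variable theta_j^t


def update_aoi(aoi, mode, capacity, c_min_v2i, c_min_v2h, dt):
    """Return A_j^{t+1} given A_j^t, the selected mode, and the achieved capacity."""
    if mode == V2I and capacity >= c_min_v2i:
        return dt  # fresh update delivered over V2I
    if mode == V2H and capacity >= c_min_v2h:
        return dt  # fresh update delivered over V2H
    return aoi + dt  # no valid update: the information grows staler


# Hypothetical trace: (mode, achieved capacity) per slot, with dt = 1.0.
slots = [(V2V, 5.0), (V2I, 0.5), (V2H, 2.0), (V2I, 1.0)]
aoi, trace = 1.0, []
for mode, capacity in slots:
    aoi = update_aoi(aoi, mode, capacity, c_min_v2i=1.0, c_min_v2h=1.5, dt=1.0)
    trace.append(aoi)
print(trace)  # [2.0, 3.0, 1.0, 1.0]
```

In the trace, the V2V slot and the under-capacity V2I slot both let the AoI grow, whereas a V2H or V2I transmission that meets its minimum capacity resets it to one slot duration.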

Let $\beta_{j,k}^{t}\in\{0,1\}$ indicate whether sub-channel $k$ is assigned to the $j$th PL at time $t$, and let $\theta_{j}^{t}\in\{0,1,2\}$ denote its communication mode. Here, $\theta_{j}^{t}=0$ corresponds to V2I, $\theta_{j}^{t}=1$ corresponds to V2V, and $\theta_{j}^{t}=2$ corresponds to V2H. The transmission power allocated by PL $j$ on sub-channel $k$ is denoted as $p_{j}^{t}[k]$. Then, the achievable capacity of PL $j$ on sub-channel $k$ for V2H mode is given by

$$C_{j,H}^{t}[k]=\log_{2}\left(1+\frac{\delta(\theta_{j}^{t}-2)\,\beta_{j,k}^{t}\,p_{j}^{t}[k]\,h_{j,H}^{t}[k]}{I_{j}^{H}[k]+\sigma^{2}}\right), \tag{4}$$

where $\delta(\cdot)$ is the indicator function ensuring that the capacity formula is applied only when PL $j$ operates in V2H mode ($\theta_{j}^{t}=2$). The term $\sigma^{2}$ refers to the thermal noise power, and $I_{j}^{H}[k]$ denotes the total interference power from other PLs using the same sub-channel, which is given by

$$I_{j}^{H}[k]=\sum_{j^{\prime}\neq j}\beta_{j^{\prime},k}^{t}\,p_{j^{\prime}}^{t}[k]\,h_{j^{\prime},H}^{t}[k]. \tag{5}$$

A similar capacity formulation applies for V2I and V2V links, replacing the channel gain $h_{j,H}^{t}[k]$ with $h_{j,I}^{t}[k]$ for V2I and $h_{j,v}^{t}[k]$ for V2V links, as defined in (2), while adjusting the interference term and the $\delta(\cdot)$ function accordingly.
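A minimal Python sketch of (4) and (5) for a single time slot is given below. It assumes that the `gain` entries are the real, non-negative gains that enter (4) directly (if complex coefficients are available, their squared magnitudes would be used instead), and the function name, the list-of-lists layout, and the example numbers are hypothetical.

```python
import math

V2H = 2  # value of theta_j^t that selects the HAPS link


def v2h_capacity(j, k, modes, beta, power, gain, noise_power):
    """Capacity C_{j,H}^t[k] of PL j on sub-channel k, following Eqs. (4)-(5).

    modes[i]    : mode theta_i^t of PL i
    beta[i][k]  : 1 if sub-channel k is assigned to PL i, else 0
    power[i][k] : transmit power p_i^t[k]
    gain[i][k]  : V2H gain h_{i,H}^t[k] (assumed real and non-negative)
    """
    # The indicator delta(theta - 2) and beta switch the signal term on or off.
    if modes[j] != V2H or beta[j][k] == 0:
        return 0.0
    # Eq. (5): interference from every other PL occupying sub-channel k.
    interference = sum(
        beta[i][k] * power[i][k] * gain[i][k]
        for i in range(len(modes)) if i != j
    )
    sinr = power[j][k] * gain[j][k] / (interference + noise_power)
    return math.log2(1.0 + sinr)


# Hypothetical example: two PLs sharing one sub-channel.
modes = [2, 2]
beta = [[1], [1]]
power = [[1.0], [0.5]]
gain = [[6.0], [2.0]]
c0 = v2h_capacity(0, 0, modes, beta, power, gain, noise_power=1.0)
print(c0)  # 2.0
```

In the example, PL 0 sees an interference of $0.5\times 2 = 1$ and a noise power of $1$, so its SINR is $6/2 = 3$ and the capacity is $\log_{2}(4)=2$. Swapping in the V2I or V2V gains would give the corresponding capacities for the other modes.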

We aim to minimize the average AoI and power consumption for every platoon while respecting capacity constraints and ensuring reliable data delivery among the platoon members. Accordingly, the multi-objective optimization problem formulated for platoon j??jitalic_j can be defined as:

minβ,θ,p{\displaystyle\min_{\beta,\theta,p}\Bigg{\{}roman_min start_POSTSUBSCRIPT italic_β , italic_θ , italic_p end_POSTSUBSCRIPT { 1T?t=1TAjt?Pr?{t=1Tk??min???{Cj,vt?[k]}?Δ?tζj}1??superscriptsubscript??1??superscriptsubscript??????Prsuperscriptsubscript??1??subscript??????superscriptsubscript????????delimited-[]??Δ??subscript????\displaystyle\frac{1}{T}\sum_{t=1}^{T}A_{j}^{t}-\Pr\left\{\sum_{t=1}^{T}\sum_{% k\in\mathcal{K}}\underset{v}{\min}\left\{C_{j,v}^{t}[k]\right\}\Delta t\geq% \zeta_{j}\right\}divide start_ARG 1 end_ARG start_ARG italic_T end_ARG ∑ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT - roman_Pr { ∑ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_k ∈ caligraphic_K end_POSTSUBSCRIPT underitalic_v start_ARG roman_min end_ARG { italic_C start_POSTSUBSCRIPT italic_j , italic_v end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT [ italic_k ] } roman_Δ italic_t ≥ italic_ζ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT }
+1Tt=1Tk??pjt[k]},\displaystyle+\frac{1}{T}\sum_{t=1}^{T}\sum_{k\in\mathcal{K}}p_{j}^{t}[k]\Bigg% {\}},+ divide start_ARG 1 end_ARG start_ARG italic_T end_ARG ∑ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT ∑ start_POSTSUBSCRIPT italic_k ∈ caligraphic_K end_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT [ italic_k ] } ,
s.t.?C?1::s.t.??1absent\displaystyle\textbf{s.t.}\;\;C1:s.t. italic_C 1 : Cj,Ht?[k]Cj,Hmin,?j??,?k??,formulae-sequencesubscriptsuperscript????????delimited-[]??superscriptsubscript??????minformulae-sequencefor-all????for-all????\displaystyle\,C^{t}_{j,H}[k]\geq C_{j,H}^{\text{min}},\quad\forall j\in% \mathcal{P},\,\forall k\in\mathcal{K},\,italic_C start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_H end_POSTSUBSCRIPT [ italic_k ] ≥ italic_C start_POSTSUBSCRIPT italic_j , italic_H end_POSTSUBSCRIPT start_POSTSUPERSCRIPT min end_POSTSUPERSCRIPT , ? italic_j ∈ caligraphic_P , ? italic_k ∈ caligraphic_K ,
C?2::??2absent\displaystyle C2:italic_C 2 : Cj,It?[k]Cj,Imin,?j??,?k??,formulae-sequencesubscriptsuperscript????????delimited-[]??superscriptsubscript??????minformulae-sequencefor-all????for-all????\displaystyle\,C^{t}_{j,I}[k]\geq C_{j,I}^{\text{min}},\quad\forall j\in% \mathcal{P},\,\forall k\in\mathcal{K},\,italic_C start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_I end_POSTSUBSCRIPT [ italic_k ] ≥ italic_C start_POSTSUBSCRIPT italic_j , italic_I end_POSTSUBSCRIPT start_POSTSUPERSCRIPT min end_POSTSUPERSCRIPT , ? italic_j ∈ caligraphic_P , ? italic_k ∈ caligraphic_K ,
C?3::??3absent\displaystyle C3:italic_C 3 : βj,it{0,1},θj{0,1,2},?j??,i{I,V,H},formulae-sequencesubscriptsuperscript????????01formulae-sequencesubscript????012formulae-sequencefor-all????????????\displaystyle\,\beta^{t}_{j,i}\in\{0,1\},\;\theta_{j}\in\{0,1,2\},\;\forall j% \in\mathcal{P},\,i\in\{I,V,H\},italic_β start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_i end_POSTSUBSCRIPT ∈ { 0 , 1 } , italic_θ start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ { 0 , 1 , 2 } , ? italic_j ∈ caligraphic_P , italic_i ∈ { italic_I , italic_V , italic_H } ,
C?4::??4absent\displaystyle C4:italic_C 4 : k??βj,it1,?j??,?t?,formulae-sequencesubscript????subscriptsuperscript????????1formulae-sequencefor-all????for-all???\displaystyle\sum_{k\in\mathcal{K}}\beta^{t}_{j,i}\leq 1,\quad\forall j\in% \mathcal{P},\,\forall t\in\mathcal{\mathbb{N}},∑ start_POSTSUBSCRIPT italic_k ∈ caligraphic_K end_POSTSUBSCRIPT italic_β start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j , italic_i end_POSTSUBSCRIPT ≤ 1 , ? italic_j ∈ caligraphic_P , ? italic_t ∈ blackboard_N ,
C?5::??5absent\displaystyle C5:italic_C 5 : pjt?[k]pjmax,?j??,?k??,formulae-sequencesubscriptsuperscript??????delimited-[]??subscriptsuperscript??max??formulae-sequencefor-all????for-all????\displaystyle\,p^{t}_{j}[k]\leq p^{\text{max}}_{j},\quad\forall j\in\mathcal{P% },\,\forall k\in\mathcal{K},italic_p start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT [ italic_k ] ≤ italic_p start_POSTSUPERSCRIPT max end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , ? italic_j ∈ caligraphic_P , ? italic_k ∈ caligraphic_K , (6)

where $\zeta_{j}$ is the size of the cooperative awareness message (CAM), while $\mathcal{P}=\{1,2,\ldots,P\}$. The objective function given in (6) consists of minimizing the average AoI, represented by $\frac{1}{T}\sum_{t=1}^{T}A_{j}^{t}$, and ensuring the probability that the total communication capacity over all time slots and channels meets or exceeds the minimum data requirement, i.e., $\Pr\left\{\sum_{t=1}^{T}\sum_{k\in\mathcal{K}}\underset{v}{\min}\left\{C_{j,v}^{t}[k]\right\}\Delta t\geq\zeta_{j}\right\}$.
Furthermore, the optimization aims to minimize the total power consumption over all channels and time slots, given by $\frac{1}{T}\sum_{t=1}^{T}\sum_{k\in\mathcal{K}}p_{j}^{t}[k]$.

In (6), the first constraint ensures that the communication capacity $C^{t}_{j,i}[k]$ satisfies a minimum value $C_{j,i}^{\text{min}}$. The second and third constraints force each agent to choose a valid communication mode and use only one sub-channel at any given time. With the last constraint, the transmission power for each agent is limited to a maximum value $p^{\text{max}}_{j}$.
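
As an illustration, the constraints C2 to C5 in (6) can be checked for one candidate allocation of a single PL in a single time slot. The following is a minimal sketch, assuming that capacities and powers are available as plain Python lists and that the indicators $\beta$ are stored one per sub-channel; the function name `is_feasible` and all argument names are hypothetical and are not part of the original formulation.

```python
def is_feasible(cap_v2i, cap_min, beta, theta, power, p_max):
    """Check C2-C5 of (6) for one PL in one time slot (illustrative only).

    cap_v2i : V2I capacities C_{j,I}^t[k] for the sub-channels considered
    cap_min : minimum required V2I capacity C_{j,I}^min (same unit as cap_v2i)
    beta    : sub-channel indicators, assumed here to be one value per sub-channel
    theta   : communication-mode index
    power   : transmit powers p_j^t[k] (same unit as p_max)
    p_max   : maximum transmit power p_j^max
    """
    c2 = all(c >= cap_min for c in cap_v2i)                      # C2: capacity floor
    c3 = all(b in (0, 1) for b in beta) and theta in (0, 1, 2)   # C3: valid indicators and mode
    c4 = sum(beta) <= 1                                          # C4: at most one sub-channel
    c5 = all(p <= p_max for p in power)                          # C5: power ceiling
    return c2 and c3 and c4 and c5
```

For instance, with the Table II values of 540 kbps and 30 dBm as limits, an allocation that selects two sub-channels at once violates C4 and is rejected.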

III DRL Approaches for AoI Optimization

This section explores two reinforcement learning approaches to optimize AoI in HAPS-assisted V2X networks. The first, DDPG, follows a single-agent paradigm, where each PL independently optimizes its AoI based on local observations. While this allows decentralized decision-making, it does not inherently account for inter-agent dependencies, leading to suboptimal resource allocation in congested networks. Due to the lack of inter-agent coordination, DDPG struggles to adapt to dynamic interference patterns, resulting in performance degradation and slower convergence.

The second approach, FD-MADDPG, extends DDPG to a multi-agent reinforcement learning framework, allowing multiple PLs to learn concurrently without explicit coordination. Unlike traditional MADDPG, which employs a centralized critic, FD-MADDPG eliminates the need for centralized training. This enables real-time decisions based solely on local observations, improving scalability and robustness in large-scale V2X networks. Using independent learning strategies, FD-MADDPG is expected to achieve faster convergence and lower AoI, particularly in high-mobility and dense-vehicle scenarios. In addition, it allows transmission and resource allocation strategies to be adjusted dynamically in response to environmental variations. This makes FD-MADDPG more effective at handling network congestion and improving spectral efficiency.

In both approaches, each PL acts as an agent that interacts with the vehicular environment by observing its state and taking actions according to its policy. Therefore, at any time $t$, each PL $j$ observes the state, which is given by

$s^{t}_{j}=\left[h^{t}_{j,v}[k],\,h^{t}_{j,I}[k],\,h^{t}_{j,H}[k],\,I^{t-1}_{j}[k],\,A^{t}_{j},\,\zeta^{r}_{j},\,T^{r}_{j}\right].$ (7)

According to (7), PLs observe not only the instantaneous channel states but also the AoI and the amount of interference caused by other platoons in the previous step. In addition, $\zeta^{r}_{j}$ and $T^{r}_{j}$ in (7) represent the remaining intra-platoon message load and the remaining time budget, respectively. It should be noted here that each PL maintains an independent experience replay buffer $D_{j}=\{s_{j}^{t},a_{j}^{t},r_{j}^{t},s_{j}^{t+1}\}$, ensuring that training remains decentralized.
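
The observation in (7) and the per-agent buffer $D_{j}$ can be sketched as follows. This is only an illustrative sketch: the helper `build_state`, the class `ReplayBuffer`, and the default capacity of 10000 transitions are assumptions introduced here, since the buffer size is not specified in the text.

```python
import random
from collections import deque


def build_state(h_v, h_i, h_h, interference_prev, aoi, load_left, time_left):
    """Flatten the quantities of (7) into one observation vector.

    The first four arguments are per-sub-channel lists (channel gains toward
    the platoon members, the RSU, and the HAPS, and last-step interference);
    the remaining three are scalars (AoI, remaining load, remaining time).
    """
    return [*h_v, *h_i, *h_h, *interference_prev, aoi, load_left, time_left]


class ReplayBuffer:
    """Per-agent buffer D_j holding (s, a, r, s_next) transitions."""

    def __init__(self, capacity=10000):  # capacity is a hypothetical default
        self.data = deque(maxlen=capacity)  # oldest transitions are dropped first

    def store(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size.
        return random.sample(list(self.data), min(batch_size, len(self.data)))

    def __len__(self):
        return len(self.data)
```

Because every PL owns its own `ReplayBuffer` instance, no transition is shared between agents, which mirrors the decentralized training described above.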

Algorithm 1 Training of DDPG and FD-MADDPG
1: Initialize actor network $\pi_{\theta_{j}}$ and critic network $Q_{\phi_{j}}$ with random weights.
2: Initialize target networks $\pi_{\theta_{j}^{\prime}}$ and $Q_{\phi_{j}^{\prime}}$.
3: Initialize experience replay buffer $D_{j}$.
4: for each episode do
5:   Reset environment and receive initial state $s_{j}^{0}$.
6:   for each time step $t$ do
7:     Select action $a_{j}^{t}=\pi_{\theta_{j}}(s_{j}^{t})+\mathcal{N}$ (with exploration noise $\mathcal{N}$).
8:     Execute action $a_{j}^{t}$, observe reward $r_{j}^{t}$ and next state $s_{j}^{t+1}$.
9:     Store transition $(s_{j}^{t},a_{j}^{t},r_{j}^{t},s_{j}^{t+1})$ in $D_{j}$.
10:     Sample minibatch from $D_{j}$.
11:     Compute target value:
        $y_{j}^{t}=r_{j}^{t}+\gamma Q_{\phi_{j}^{\prime}}(s_{j}^{t+1},\pi_{\theta_{j}^{\prime}}(s_{j}^{t+1})).$
12:     Update critic network by minimizing the loss:
        $L(\phi_{j})=\frac{1}{N}\sum\left(Q_{\phi_{j}}(s_{j}^{t},a_{j}^{t})-y_{j}^{t}\right)^{2}.$
13:     Update actor network:
        $\nabla_{\theta_{j}}J_{j}=\mathbb{E}\left[\nabla_{\theta_{j}}\pi_{j}(a_{j}|s_{j})\,\nabla_{a_{j}}Q_{\phi_{j}}(s_{j},a_{j})\right].$
14:     Soft update target networks.
15:   end for
16: end for
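
Two steps of Algorithm 1 are stated without detail: the exploration noise in step 7 and the soft target update in step 14. The sketch below shows one common way to realize them, under assumptions that are not taken from the text: zero-mean Gaussian noise, actions clipped to $[-1, 1]$, and Polyak averaging with a hypothetical rate `tau`. Network weights are represented as flat lists of floats for simplicity.

```python
import random


def noisy_action(policy_output, sigma=0.3, low=-1.0, high=1.0):
    """Step 7: add Gaussian exploration noise to the actor output, then clip.

    The Gaussian form and the [low, high] action range are assumptions.
    """
    return [min(high, max(low, a + random.gauss(0.0, sigma))) for a in policy_output]


def soft_update(target, online, tau=0.005):
    """Step 14: Polyak averaging, target <- tau * online + (1 - tau) * target.

    tau is a hypothetical value; the text does not specify the update rate.
    """
    return [tau * w + (1.0 - tau) * w_t for w, w_t in zip(online, target)]
```

A small `tau` makes the target networks trail the online networks slowly, which is the usual reason for using a soft update instead of copying the weights directly.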

Next, the local reward function for both DDPG and FD-MADDPG is defined as follows

$r_{j}^{t}=-\kappa_{1}F\left(p_{j}^{t}\right)-\kappa_{2}A_{j}^{t}-\kappa_{3}G\left(C_{j,I}^{t}-C_{j,I}^{\text{min}}\right)-\kappa_{4}G\left(C_{j,H}^{t}-C_{j,H}^{\text{min}}\right),$ (8)

where $F(\cdot)$ penalizes high transmission power while restricting it to the same range as the other components, and the stepwise function $G(\cdot)$ ensures that minimum capacity constraints are met [4]. However, a key distinction between DDPG and FD-MADDPG lies in how they optimize their reward function. In particular, DDPG utilizes a centralized critic that considers the overall interference in the environment. In contrast, FD-MADDPG operates in a fully decentralized manner, where each agent optimizes its reward independently. With this in mind, Algorithm 1 provides a structured approach to training both DDPG and FD-MADDPG. While DDPG benefits from a global interference-aware reward function as outlined in Algorithm 1, FD-MADDPG relies solely on local observations and independent reward updates.
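
The structure of the local reward in (8) can be sketched as below. The exact forms of $F(\cdot)$ and $G(\cdot)$ are given in [4] and are not reproduced here, so the two helpers are stand-ins chosen only for illustration: `power_penalty` normalizes the power by its maximum, and `step_penalty` returns a unit penalty when the capacity margin is negative. The weights in `kappa` are likewise hypothetical.

```python
def power_penalty(p, p_max):
    """Stand-in for F: transmit power normalized to [0, 1] (assumed form)."""
    return min(max(p / p_max, 0.0), 1.0)


def step_penalty(margin):
    """Stand-in for G: 0 if the capacity margin is met, 1 otherwise (assumed form)."""
    return 0.0 if margin >= 0 else 1.0


def local_reward(p, aoi, cap_i, cap_i_min, cap_h, cap_h_min, p_max,
                 kappa=(1.0, 1.0, 1.0, 1.0)):
    """Local reward of (8) for one PL at one time step.

    kappa holds the weights (kappa_1, ..., kappa_4); the defaults are placeholders.
    """
    k1, k2, k3, k4 = kappa
    return (-k1 * power_penalty(p, p_max)
            - k2 * aoi
            - k3 * step_penalty(cap_i - cap_i_min)
            - k4 * step_penalty(cap_h - cap_h_min))
```

With unit weights, a PL that transmits at half its maximum power, has an AoI of 2, meets the V2I capacity floor, and misses the HAPS capacity floor receives a reward of -0.5 - 2 - 0 - 1 = -3.5.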

Both DDPG and FD-MADDPG employ a single-critic approach per agent that can be described by:

  1. Each agent maintains an independent critic network $Q_{\phi_{j}}$, updated using its own local rewards $r_{j}^{t}$.

  2. Each agent maintains an actor network $\pi_{\theta_{j}}$, which determines optimal actions based on local state observations.

Furthermore, to maximize the reward for each agent, the policy gradient is updated for both methods as

$\nabla_{\theta_{j}}J_{j}=\mathbb{E}_{s_{j},a_{j}\sim D_{j}}\left[\nabla_{\theta_{j}}\pi_{j}(a_{j}|s_{j})\,\nabla_{a_{j}}Q_{\phi_{j}}(s_{j},a_{j})\right],$ (9)

in which $\pi_{j}(a_{j}|s_{j})$ denotes the policy function for the $j$th agent, which defines the probability of selecting action $a_{j}$ given the state $s_{j}$. The term $Q_{\phi_{j}}(s_{j},a_{j})$ refers to the action-value function approximated by the critic network parameterized by $\phi_{j}$, which estimates the expected cumulative discounted reward obtained by executing action $a_{j}$ in state $s_{j}$, and subsequently following the agent’s policy $\pi_{j}$. This function serves as a baseline for the policy gradient and facilitates the evaluation of action quality within continuous action domains. Besides, $\mathbb{E}_{s,a\sim D}$ in (9) indicates the expected value over state-action pairs sampled from the experience replay buffer.
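
To make the chain-rule structure of (9) concrete, the following toy sketch evaluates the gradient by hand for a scalar problem. Everything in it is a simplifying assumption rather than the networks used in this work: the actor is the deterministic linear map $a=\theta s$, and the critic is replaced by a fixed, hypothetical function $Q(s,a)=-(a-s)^{2}$ whose best action is $a=s$.

```python
def policy_gradient(theta, states):
    """Monte Carlo estimate of (9) over a batch of sampled states."""
    grads = []
    for s in states:
        a = theta * s            # actor output for state s
        dpi_dtheta = s           # gradient of the actor w.r.t. theta
        dq_da = -2.0 * (a - s)   # gradient of the toy critic w.r.t. the action
        grads.append(dpi_dtheta * dq_da)
    return sum(grads) / len(grads)


def train(theta, states, lr=0.1, steps=200):
    """Gradient ascent on the actor parameter using the estimate above."""
    for _ in range(steps):
        theta += lr * policy_gradient(theta, states)
    return theta
```

Starting from `theta = 0.0`, repeated ascent steps drive `theta` toward 1, the value at which the actor reproduces the action preferred by the toy critic and the gradient in (9) vanishes.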

Finally, it should be noted that the critic network is updated by minimizing the following loss function:

$L(\phi_{j})=\frac{1}{N}\sum_{i=1}^{N}\left(Q_{\phi_{j}}(s_{j}^{t},a_{j}^{t})-y_{j}^{t}\right)^{2},$ (10)

where $y_{j}^{t}=r_{j}^{t}+\gamma Q_{\phi_{j}^{\prime}}(s_{j}^{t+1},\pi_{\theta_{j}^{\prime}}(s_{j}^{t+1}))$ with reward discount factor $\gamma$.
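
The target value and the loss in (10) reduce to a few arithmetic operations once the critic outputs are available. The sketch below assumes these outputs are already computed and passed in as plain floats; the function names `td_target` and `critic_loss` are illustrative.

```python
def td_target(reward, q_target_next, gamma=0.99):
    """Target y = r + gamma * Q'(s_next, pi'(s_next)).

    q_target_next is the target critic evaluated at the target actor's action.
    """
    return reward + gamma * q_target_next


def critic_loss(q_values, targets):
    """Mean squared error of (10) over a minibatch of N samples."""
    n = len(q_values)
    return sum((q - y) ** 2 for q, y in zip(q_values, targets)) / n
```

For a minibatch with critic outputs 1 and 2 and targets 0 and 4, the loss is (1 + 4) / 2 = 2.5. The default `gamma` matches the discount factor listed in Table II.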

IV Numerical Results

This section presents simulation results for the proposed HAPS-V2X network consisting of a HAPS, an RSU, and five PLs with six followers each in an urban area. Specifically, two DRL-based approaches are utilized in this scenario to optimize resource allocation with minimum AoI and are compared in terms of average AoI value and reward function convergence. The values of the other parameters used for the simulations are determined as shown in Table II.

Fig. 2 presents a comparison of the reward function convergence for the utilized methods. According to this figure, FD-MADDPG clearly converges faster to a higher and more precise reward value compared to the DDPG approach. This is due to the fact that each PL in the FD-MADDPG learns independently using only its local observations, which accelerates convergence in multi-agent environments. Accordingly, considering also the long-term fluctuations, it can be concluded from this figure that the learning process of DDPG is longer and more challenging than that of FD-MADDPG for the considered HAPS-V2X scenario.

TABLE II: Initial Parameters for the Simulation
| Parameter description | Value |
|---|---|
| The distance between platoon vehicles | 25 m |
| Max. transmission power of PL | 30 dBm |
| Min. required data rate for V2I | 540 kbps |
| Available total bandwidth | 180 kHz |
| Data size in inter-vehicle communication | 4000 Bytes |
| Batch size used for training | 64 |
| Reward discount factor ($\gamma$) | 0.99 |
| Standard deviation of noise ($\sigma$) | 0.3 dB |
| Actor layer dimensions | [1024, 512] |
| Critic layer dimensions | [1024, 512, 256] |
| Learning rate of actor ($\alpha$) | 0.0001 |
| Learning rate of critic ($\beta$) | 0.001 |

In Fig. 3, FD-MADDPG provides much lower AoI values in V2X networks than DDPG under similar training conditions, which can be further reduced with HAPS support. In addition, DDPG is heavily affected by an increase in inter-platoon spacing. For example, in the HAPS-V2X scenario with a 5 m gap between platoons, the average AoI is approximately 13 ms for DDPG, while it is only 6 ms for FD-MADDPG. When the spacing between platoons is increased to 35 m, the AoI increases by only about 3 ms for FD-MADDPG, whereas it increases by almost 20 ms for DDPG. On the one hand, this shows that the decentralized solution handles interference and channel variations more effectively, keeping information fresher across the network and improving overall network reliability in HAPS-supported V2X scenarios. On the other hand, it indicates that the information update rate of the DDPG algorithm remains relatively slow even with HAPS support.

Figure 2: Comparison of the reward function convergence.
Figure 3: Comparison of the average AoI.

V Conclusion

This paper investigates the contribution of HAPS integration and the effectiveness of two different DRL approaches in resource allocation prioritizing information freshness in V2X networks. Numerical evaluations demonstrate significant improvements in network reliability, showing that lower AoI, faster convergence, and thus better spectrum utilization can be achieved with FD-MADDPG compared to the conventional DDPG model. These findings highlight that HAPS can play a critical role in providing uninterrupted connectivity in time-critical scenarios, especially in environments with limited infrastructure. The widespread use of solar-powered HAPS would also add a sustainable and environmentally friendly dimension to 6G networks, supporting greener communications.

Future research can focus on exploring energy-efficient learning strategies, adaptive reward mechanisms, and real-world deployments in large-scale vehicular networks. Additionally, investigating HAPS mobility and hybrid artificial intelligence-driven optimization can further enhance the adaptability of these networks. Comparing the performance of the proposed scheme with other recently used optimization methods, such as attention-based DRL, federated learning, or game-theoretic approaches, is another promising direction.

References

  • [1] W. Jaafar and H. Yanikomeroglu, “HAPS-ITS: Enabling future ITS services in trans-continental highways,” IEEE Communications Magazine, vol. 60, no. 10, pp. 80–86, 2022.
  • [2] M. Noor-A-Rahim, Z. Liu, H. Lee, M. O. Khyam, J. He, D. Pesch, K. Moessner, W. Saad, and H. V. Poor, “6G for vehicle-to-everything (V2X) communications: Enabling technologies, challenges, and opportunities,” Proceedings of the IEEE, vol. 110, no. 6, pp. 712–734, 2022.
  • [3] J. Clancy, D. Mullins, B. Deegan, J. Horgan, E. Ward, C. Eising, P. Denny, E. Jones, and M. Glavin, “Wireless access for V2X communications: Research, challenges and opportunities,” IEEE Communications Surveys & Tutorials, vol. 26, no. 3, pp. 2082–2119, 2024.
  • [4] M. Parvini, M. R. Javan, N. Mokari, B. A. Arand, and E. A. Jorswieck, “AoI aware radio resource management of autonomous platoons via multi agent reinforcement learning,” in 2021 17th International Symposium on Wireless Communication Systems (ISWCS), pp. 1–6, 2021.
  • [5] Annu and P. Rajalakshmi, “Towards 6G V2X sidelink: Survey of resource allocation—mathematical formulations, challenges, and proposed solutions,” IEEE Open Journal of Vehicular Technology, vol. 5, pp. 344–383, 2024.
  • [6] A. M. Ince, A. E. Canbilen, and H. Yanikomeroglu, “HAPS-enabled V2X architecture for hyper reliable and low-latency communication (HRLLC) in 6G networks,” in International Conference on Communications, Signal Processing, and their Applications (ICCSPA), pp. 1–6, 2024.
  • [7] O. Abbasi and H. Yanikomeroglu, “UxNB-enabled cell-free massive MIMO with HAPS-assisted sub-THz backhauling,” IEEE Transactions on Vehicular Technology, vol. 73, no. 5, pp. 6937–6953, 2024.
  • [8] G. K. Kurt, M. G. Khoshkholgh, S. Alfattani, A. Ibrahim, T. S. Darwish, M. S. Alam, H. Yanikomeroglu, and A. Yongacoglu, “A vision and framework for the high altitude platform station (HAPS) networks of the future,” IEEE Communications Surveys & Tutorials, vol. 23, no. 2, pp. 729–779, 2021.
  • [9] Q. Ren, O. Abbasi, G. K. Kurt, H. Yanikomeroglu, and J. Chen, “Caching and computation offloading in high altitude platform station (HAPS) assisted intelligent transportation systems,” IEEE Transactions on Wireless Communications, vol. 21, no. 11, pp. 9010–9024, 2022.
  • [10] H. Ye, G. Y. Li, and B.-H. F. Juang, “Deep reinforcement learning based resource allocation for V2V communications,” IEEE Transactions on Vehicular Technology, vol. 68, no. 4, pp. 3163–3173, 2019.
  • [11] Q. Ren, O. Abbasi, G. K. Kurt, H. Yanikomeroglu, and J. Chen, “Handoff-aware distributed computing in high altitude platform station (HAPS)–assisted vehicular networks,” IEEE Transactions on Wireless Communications, vol. 22, no. 12, pp. 8814–8827, 2023.
  • [12] 3GPP, “Technical specification group radio access network; study on new radio (NR) to support non-terrestrial networks (release 15),” 3GPP TR 38.811 V15.1.0, Jun. 2019.
  • [13] S. Alfattani, W. Jaafar, Y. Hmamouche, H. Yanikomeroglu, and A. Yongaçoglu, “Link budget analysis for reconfigurable smart surfaces in aerial platforms,” IEEE Open Journal of the Communications Society, vol. 2, pp. 1980–1995, 2021.