Showing new listings for Wednesday, 6 August 2025
- [1] arXiv:2508.03471
Title: Learned Adaptive Indexing
Subjects: Databases (cs.DB)
Indexes can significantly improve search performance in relational databases. However, if the query workload changes frequently or new data updates occur continuously, it may not be worthwhile to build a conventional index upfront for query processing. Adaptive indexing is a technique in which an index gets built on the fly as a byproduct of query processing. In recent years, research in database indexing has taken a new direction where machine learning models are employed for the purpose of indexing. These indexes, known as learned indexes, can be more efficient compared to traditional indexes such as B+-tree in terms of memory footprints and query performance. However, a learned index has to be constructed upfront and requires training the model in advance, which becomes a challenge in dynamic situations when workload changes frequently. To the best of our knowledge, no learned indexes exist yet for adaptive indexing. We propose a novel learned approach for adaptive indexing. It is built on the fly as queries are submitted and utilizes learned models for indexing data. To enhance query performance, we employ a query workload prediction technique that makes future workload projection based on past workload data. We have evaluated our learned adaptive indexing approach against existing adaptive indexes for various query workloads. Our results show that our approach performs better than others in most cases, offering 1.2x - 5.6x improvement in query performance.
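Adaptive indexing, as described above, builds the index as a side effect of answering queries. The following is a minimal sketch of the classic adaptive technique, database cracking. It is not the learned approach this paper proposes, and it assumes an in-memory integer column. Each range query partitions only the piece of the column it touches and records the resulting pivot positions, so later queries have less to reorganize. The names `CrackedColumn`, `range_query` and `pivots` are hypothetical.

```python
import bisect
from typing import List


class CrackedColumn:
    """Toy database-cracking column: every range query partitions the piece
    of the column it touches, so the column gets more ordered over time."""

    def __init__(self, values: List[int]) -> None:
        self.data = list(values)        # cracker column: a reorganizable copy
        self.pivots: List[int] = []     # sorted pivot values cracked so far
        self.positions: List[int] = []  # positions[i] = first index with value >= pivots[i]

    def _crack(self, pivot: int) -> int:
        """Make data[:p] < pivot <= data[p:] hold and return p."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]    # already cracked on this value
        # Only the piece between the neighbouring pivots can contain both sides.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.data)
        # In-place two-pointer partition of data[lo:hi] around the pivot.
        left, right = lo, hi - 1
        while left <= right:
            if self.data[left] < pivot:
                left += 1
            else:
                self.data[left], self.data[right] = self.data[right], self.data[left]
                right -= 1
        self.pivots.insert(i, pivot)
        self.positions.insert(i, left)
        return left

    def range_query(self, low: int, high: int) -> List[int]:
        """Return all values v with low <= v < high (in no particular order)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


col = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6])
print(sorted(col.range_query(5, 10)))  # [6, 7, 8, 9]
print(col.pivots)                      # [5, 10]
```

The cost of building the index is spread across queries instead of being paid upfront. That is the property that makes adaptive indexes attractive when the workload shifts before a full index would pay off.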
- [2] arXiv:2508.03565
Title: [Technical Report] ArceKV: Towards Workload-driven LSM-compactions for Key-Value Store Under Dynamic Workloads
Comments: 17 pages, 11 figures
Subjects: Databases (cs.DB)
Key-value stores underpin a wide range of applications due to their simplicity and efficiency. Log-Structured Merge Trees (LSM-trees) dominate as their underlying structure, excelling at handling rapidly growing data. Recent research has focused on optimizing LSM-tree performance under static workloads with fixed read-write ratios. However, real-world workloads are highly dynamic, and existing workload-aware approaches often struggle to sustain optimal performance or incur substantial transition overhead when workload patterns shift. To address this, we propose ElasticLSM, which removes traditional LSM-tree structural constraints to allow more flexible management actions (i.e., compactions and write stalls) creating greater opportunities for continuous performance optimization. We further design Arce, a lightweight compaction decision engine that guides ElasticLSM in selecting the optimal action from its expanded action space. Building on these components, we implement ArceKV, a full-fledged key-value store atop RocksDB. Extensive evaluations demonstrate that ArceKV outperforms state-of-the-art compaction strategies across diverse workloads, delivering around 3x faster performance in dynamic scenarios.
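The toy two-level LSM-tree below makes the terms in this abstract concrete. It is a sketch that assumes string keys and values and a simple count-based compaction trigger. It is not ElasticLSM or Arce. It only shows what a flush and a compaction do, since compactions are one of the management actions that Arce chooses among. The names `ToyLSM`, `memtable_limit` and `l0_run_limit` are hypothetical.

```python
from typing import Dict, List, Optional


class ToyLSM:
    """Toy two-level LSM-tree: writes go to an in-memory memtable, which is
    flushed as an immutable sorted run into L0; once L0 holds too many runs,
    a compaction merges them with L1 into a single sorted L1 run."""

    def __init__(self, memtable_limit: int = 4, l0_run_limit: int = 3) -> None:
        self.memtable: Dict[str, str] = {}
        self.l0: List[Dict[str, str]] = []  # newest run last; key ranges may overlap
        self.l1: Dict[str, str] = {}        # one sorted, non-overlapping run
        self.memtable_limit = memtable_limit
        self.l0_run_limit = l0_run_limit
        self.compactions = 0

    def put(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        # Persist the memtable as a sorted, immutable run.
        self.l0.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        if len(self.l0) >= self.l0_run_limit:
            self._compact()

    def _compact(self) -> None:
        # Apply L0 runs oldest to newest on top of L1 so newer values win.
        merged = dict(self.l1)
        for run in self.l0:
            merged.update(run)
        self.l1 = dict(sorted(merged.items()))
        self.l0 = []
        self.compactions += 1

    def get(self, key: str) -> Optional[str]:
        # Newest data first: memtable, then L0 newest to oldest, then L1.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.l0):
            if key in run:
                return run[key]
        return self.l1.get(key)


db = ToyLSM(memtable_limit=2, l0_run_limit=2)
for k, v in [("a", "1"), ("b", "1"), ("a", "2"), ("c", "1"), ("b", "3")]:
    db.put(k, v)
print(db.compactions, db.get("a"), db.get("b"))  # 1 2 3
```

The fixed `l0_run_limit` is the kind of static policy the abstract argues against. Compacting eagerly keeps the number of runs a read must search small, but it rewrites data more often. The best threshold therefore depends on the read-write mix. Production engines such as RocksDB also slow down or stall writes when L0 grows too large, which this sketch omits.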
New submissions (showing 2 of 2 entries)
- [3] arXiv:2508.02758 (cross-list from q-fin.ST)
Title: CTBench: Cryptocurrency Time Series Generation Benchmark
Authors: Yihao Ang, Qiang Wang, Qiang Huang, Yifan Bao, Xinyu Xi, Anthony K. H. Tung, Chen Jin, Zhiyong Huang
Comments: 14 pages, 14 figures, and 3 tables
Subjects: Statistical Finance (q-fin.ST); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Databases (cs.DB); Machine Learning (cs.LG)
Synthetic time series are essential tools for data augmentation, stress testing, and algorithmic prototyping in quantitative finance. However, in cryptocurrency markets, characterized by 24/7 trading, extreme volatility, and rapid regime shifts, existing Time Series Generation (TSG) methods and benchmarks often fall short, jeopardizing practical utility. Most prior work (1) targets non-financial or traditional financial domains, (2) focuses narrowly on classification and forecasting while neglecting crypto-specific complexities, and (3) lacks critical financial evaluations, particularly for trading applications. To address these gaps, we introduce CTBench, the first comprehensive TSG benchmark tailored for the cryptocurrency domain. CTBench curates an open-source dataset from 452 tokens and evaluates TSG models across 13 metrics spanning 5 key dimensions: forecasting accuracy, rank fidelity, trading performance, risk assessment, and computational efficiency. A key innovation is a dual-task evaluation framework: (1) the Predictive Utility task measures how well synthetic data preserves temporal and cross-sectional patterns for forecasting, while (2) the Statistical Arbitrage task assesses whether reconstructed series support mean-reverting signals for trading. We benchmark eight representative models from five methodological families over four distinct market regimes, uncovering trade-offs between statistical fidelity and real-world profitability. Notably, CTBench offers model ranking analysis and actionable guidance for selecting and deploying TSG models in crypto analytics and strategy development.
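The Statistical Arbitrage task is described only at a high level. As a hedged illustration, the textbook z-score mean-reversion rule below is one way to turn a price series into trading signals. It goes long when the price sits well below its trailing mean and short when it sits well above. The function names, the `window` and `entry` parameters, and the one-step PnL definition are all assumptions. None of them is taken from CTBench's actual evaluation protocol.

```python
from statistics import mean, pstdev
from typing import List


def zscore_signals(prices: List[float], window: int = 5, entry: float = 1.0) -> List[int]:
    """One position per time step: +1 (long) when the price is at least `entry`
    standard deviations below its trailing mean, -1 (short) when at least
    `entry` above, 0 otherwise. The first `window` steps get no signal."""
    signals = [0] * len(prices)
    for t in range(window, len(prices)):
        history = prices[t - window:t]       # trailing window, excluding step t
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            continue                         # flat window: z-score undefined
        z = (prices[t] - mu) / sigma
        if z <= -entry:
            signals[t] = 1                   # unusually low: bet on reversion up
        elif z >= entry:
            signals[t] = -1                  # unusually high: bet on reversion down
    return signals


def strategy_pnl(prices: List[float], signals: List[int]) -> float:
    """Sum over t of the position held at t times the price change to t + 1."""
    return sum(signals[t] * (prices[t + 1] - prices[t]) for t in range(len(prices) - 1))


prices = [10, 10, 10, 10, 10, 12, 10, 10, 8, 10]
sig = zscore_signals(prices, window=5, entry=1.0)
print(sig)                        # [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(strategy_pnl(prices, sig))  # 2
```

Running the same rule on a real series and on its synthetic reconstruction, then comparing the resulting PnL, is one simple way to ask whether the synthetic data preserves tradable mean reversion.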
- [4] arXiv:2508.02866 (cross-list from cs.DC)
Title: PROV-AGENT: Unified Provenance for Tracking AI Agent Interactions in Agentic Workflows
Authors: Renan Souza, Amal Gueroudji, Stephen DeWitt, Daniel Rosendo, Tirthankar Ghosal, Robert Ross, Prasanna Balaprakash, Rafael Ferreira da Silva
Comments: Paper under peer-reviewed evaluation
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB)
Foundation models, such as Large Language Models (LLMs), are increasingly used as core components of AI agents in complex, large-scale workflows across federated and heterogeneous environments. In agentic workflows, autonomous agents plan tasks, interact with humans and peers, and shape scientific outcomes. This makes transparency, traceability, reproducibility, and reliability essential. However, AI-based agents can hallucinate or reason incorrectly, and their decisions may propagate errors through the workflow, especially when one agent's output feeds into another's input. Therefore, fine-grained provenance is essential to link agent decisions, their end-to-end context, and downstream impacts. While provenance techniques have long supported reproducibility and workflow data understanding, they fail to capture and relate agent-centric metadata (prompts, responses, and decisions) with the rest of the workflow. In this paper, we introduce PROV-AGENT, a provenance model that extends W3C PROV and leverages the Model Context Protocol (MCP) to integrate agent interactions into end-to-end workflow provenance. Our contributions include: (1) a provenance model tailored for agentic workflows, (2) a near real-time, open-source system for capturing agentic provenance, and (3) a cross-facility evaluation spanning edge, cloud, and HPC environments, demonstrating support for critical provenance queries and agent reliability analysis.
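PROV-AGENT builds on W3C PROV, whose core types are Entity, Activity and Agent. These are linked by relations such as `used`, `wasGeneratedBy`, `wasAssociatedWith` and `wasDerivedFrom`. The sketch below assumes an in-memory edge list rather than any real provenance store. It records a prompt, an agent's decision and a downstream task in those terms, then walks upstream from a result. `ProvGraph` and all node identifiers are hypothetical. This is not the PROV-AGENT schema, and it does not model MCP.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# PROV relations that point "upstream", from an effect to what influenced it.
UPSTREAM = {"used", "wasGeneratedBy", "wasAssociatedWith", "wasDerivedFrom"}


@dataclass
class ProvGraph:
    """In-memory store of PROV-style records: typed nodes plus
    (subject, relation, object) edges named after W3C PROV relations."""
    kinds: Dict[str, str] = field(default_factory=dict)  # id -> Entity | Activity | Agent
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add(self, node_id: str, kind: str) -> None:
        self.kinds[node_id] = kind

    def relate(self, subject: str, relation: str, obj: str) -> None:
        self.edges.append((subject, relation, obj))

    def lineage(self, node_id: str) -> Set[str]:
        """Every node reachable upstream from node_id (depth-first walk)."""
        seen: Set[str] = set()
        stack = [node_id]
        while stack:
            current = stack.pop()
            for subj, rel, obj in self.edges:
                if subj == current and rel in UPSTREAM and obj not in seen:
                    seen.add(obj)
                    stack.append(obj)
        return seen


# Hypothetical run: an LLM agent turns a prompt into a decision that a later task consumes.
g = ProvGraph()
for node, kind in [("prompt_1", "Entity"), ("llm_agent", "Agent"),
                   ("plan_step", "Activity"), ("response_1", "Entity"),
                   ("run_simulation", "Activity"), ("result_csv", "Entity")]:
    g.add(node, kind)
g.relate("plan_step", "used", "prompt_1")
g.relate("plan_step", "wasAssociatedWith", "llm_agent")
g.relate("response_1", "wasGeneratedBy", "plan_step")
g.relate("run_simulation", "used", "response_1")
g.relate("result_csv", "wasGeneratedBy", "run_simulation")

upstream = g.lineage("result_csv")
print(sorted(n for n in upstream if g.kinds[n] == "Agent"))  # ['llm_agent']
```

A query of this shape asks which prompts and which agents influenced a given output. The abstract argues that fine-grained agent provenance should be able to answer that kind of question.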
Cross submissions (showing 2 of 2 entries)
- [5] arXiv:2210.04179 (replaced)
Title: Decentralized Graph-based Concurrency Control for Long-running Update Transactions (Extended Version)
Comments: 14 pages, 14 figures
Journal-ref: PVLDB, 18(8): 2321-2333, 2025
Subjects: Databases (cs.DB)
This paper proposes Oze, a concurrency control protocol that handles heterogeneous workloads, including long-running update transactions. Oze explores a large scheduling space using a multi-version serialization graph to reduce false positives. Oze manages the graph in a decentralized manner to exploit many cores in modern servers. We further propose an OLTP benchmark, BoMB (Bill of Materials Benchmark), based on a use case in an actual manufacturing company. BoMB consists of one long-running update transaction and five short transactions that conflict with each other. Experiments using BoMB show that Oze can handle the long-running update transaction while achieving four orders of magnitude higher throughput than state-of-the-art optimistic and multi-version protocols and up to five times higher throughput than pessimistic protocols. We also show Oze performs comparably with existing techniques even in a typical OLTP workload, TPC-C, thanks to a protocol switching mechanism.
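Serialization-graph concurrency control rests on one invariant: a schedule is conflict-serializable if and only if its dependency graph is acyclic. The sketch below shows plain single-version serialization graph testing and assumes dependencies arrive as `(before, after)` transaction pairs. Any edge that would close a cycle is rejected. This is not Oze, which uses a multi-version graph to widen the scheduling space and reduce false positives, and which manages that graph in a decentralized way. `SerializationGraph` and `try_add` are hypothetical names.

```python
from typing import Dict, List, Set


class SerializationGraph:
    """Toy serialization-graph tester: an edge T1 -> T2 means T1 must be
    ordered before T2. The graph is kept acyclic, so a serial order exists."""

    def __init__(self) -> None:
        self.succ: Dict[str, Set[str]] = {}

    def _reaches(self, src: str, dst: str) -> bool:
        """Depth-first search: is there a path src -> ... -> dst?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.succ.get(node, ()))
        return False

    def try_add(self, before: str, after: str) -> bool:
        """Record before -> after unless it would create a cycle; False means
        the caller must abort or reorder one of the transactions."""
        if self._reaches(after, before):
            return False
        self.succ.setdefault(before, set()).add(after)
        self.succ.setdefault(after, set())
        return True

    def serial_order(self) -> List[str]:
        """An equivalent serial order (Kahn's topological sort, ties by name)."""
        indeg = {n: 0 for n in self.succ}
        for n in self.succ:
            for m in self.succ[n]:
                indeg[m] += 1
        ready = sorted(n for n, d in indeg.items() if d == 0)
        order = []
        while ready:
            n = ready.pop(0)
            order.append(n)
            for m in sorted(self.succ[n]):
                indeg[m] -= 1
                if indeg[m] == 0:
                    ready.append(m)
            ready.sort()
        return order


g = SerializationGraph()
print(g.try_add("T1", "T2"), g.try_add("T2", "T3"))  # True True
print(g.try_add("T3", "T1"))                         # False: would close a cycle
print(g.serial_order())                              # ['T1', 'T2', 'T3']
```

In this sketch, once `T1 -> T2 -> T3` is recorded, a dependency `T3 -> T1` is refused. In a real system that refusal would mean aborting or reordering a transaction. Multi-version schemes have more freedom here, because a reader can sometimes be ordered before a writer by reading an older version.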
- [6] arXiv:2508.02458 (replaced)
Title: From Stimuli to Minds: Enhancing Psychological Reasoning in LLMs via Bilateral Reinforcement Learning
Subjects: Databases (cs.DB)
Large Language Models show promise in emotion understanding, social reasoning, and empathy, yet they struggle with psychologically grounded tasks that require inferring implicit mental states in context-rich, ambiguous settings. These limitations arise from the absence of theory-aligned supervision and the difficulty of capturing nuanced mental processes in real-world narratives. To address this gap, we leverage expert-labeled, psychologically rich scenarios and propose a trajectory-aware reinforcement learning framework that explicitly imitates expert psychological thought patterns. By integrating real-world stimuli with structured reasoning guidance, our approach enables compact models to internalize social-cognitive principles, perform nuanced psychological inference, and support continual self-improvement. Comprehensive experiments across multiple benchmarks further demonstrate that our models achieve expert-level interpretive capabilities, exhibiting strong out-of-distribution generalization and robust continual learning across diverse, challenging, and psychologically grounded tasks.
- [7] arXiv:2508.02508 (replaced)
Title: M2: An Analytic System with Specialized Storage Engines for Multi-Model Workloads
Subjects: Databases (cs.DB)
Modern data analytic workloads increasingly require handling multiple data models simultaneously. Two primary approaches meet this need: polyglot persistence and multi-model database systems. Polyglot persistence employs a coordinator program to manage several independent database systems but suffers from high communication costs due to its physically disaggregated architecture. Meanwhile, existing multi-model database systems rely on a single storage engine optimized for a specific data model, resulting in inefficient processing across diverse data models. To address these limitations, we present M2, a multi-model analytic system with integrated storage engines. M2 treats all data models as first-class entities, composing query plans that incorporate operations across models. To effectively combine data from different models, the system introduces a specialized inter-model join algorithm called multi-stage hash join. Our evaluation demonstrates that M2 outperforms existing approaches by up to 188x speedup on multi-model analytics, confirming the effectiveness of our proposed techniques.
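The abstract does not describe how M2's multi-stage hash join works. The sketch below therefore shows only the classic single-stage build/probe hash join that the name refers to. It joins relational tuples with nested, document-style records through caller-supplied key functions. The function name, its parameters and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, Iterator, List, Tuple


def hash_join(
    build: Iterable[Any],
    probe: Iterable[Any],
    build_key: Callable[[Any], Hashable],
    probe_key: Callable[[Any], Hashable],
) -> Iterator[Tuple[Any, Any]]:
    """Equi-join two inputs: hash the build side on its key, then stream the
    probe side and yield every (build_item, probe_item) pair with equal keys."""
    table: Dict[Hashable, List[Any]] = defaultdict(list)
    for item in build:                  # build phase (ideally the smaller input)
        table[build_key(item)].append(item)
    for item in probe:                  # probe phase: one hash lookup per item
        for match in table.get(probe_key(item), []):
            yield match, item


# Relational rows (id, name) joined with nested, document-style order records.
customers = [(1, "Ada"), (2, "Lin"), (3, "Sam")]
orders = [
    {"order_id": "o1", "buyer": {"id": 2}},
    {"order_id": "o2", "buyer": {"id": 1}},
    {"order_id": "o3", "buyer": {"id": 2}},
    {"order_id": "o4", "buyer": {"id": 9}},  # no matching customer
]
pairs = list(hash_join(customers, orders, lambda row: row[0], lambda doc: doc["buyer"]["id"]))
print([(name, doc["order_id"]) for (_, name), doc in pairs])
# [('Lin', 'o1'), ('Ada', 'o2'), ('Lin', 'o3')]
```

Passing key-extraction functions rather than column names lets each input keep its own data model. The join itself runs in one process, which avoids the cross-system data movement that the abstract attributes to polyglot persistence.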