Syndication
Knowledge and Information Systems (Online First)
Articles recently accepted for publication in this journal

  • Rule-based composite event queries: the language XChangeEQ and its semantics

    Abstract  
    Web systems, Web services, and Web-based publish/subscribe systems communicate events as XML messages and, in many cases, require composite event detection: it is not sufficient to react to single event messages; events have to be considered in relation to other events that are received over time. This entails a need for expressive, high-level languages for querying composite events. Emphasizing language design and formal semantics, we describe the rule-based composite event query language XChangeEQ. XChangeEQ is designed to completely cover and integrate the four complementary querying dimensions: event data, event composition, temporal relationships, and event accumulation. Semantics are provided as a model theory with accompanying fixpoint theory, an approach that is established for rule languages but has not been applied to event queries so far. Because they are highly declarative, and thus easy to understand and well suited for query optimization, such semantics are desirable for event queries.

    • Content Type Journal Article
    • DOI 10.1007/s10115-010-0334-8
    • Authors
      • Michael Eckert, Institute for Informatics, University of Munich, Oettingenstr. 67, 80538 Munich, Germany
      • François Bry, Institute for Informatics, University of Munich, Oettingenstr. 67, 80538 Munich, Germany
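
To make the idea of composite event detection in the abstract above concrete, here is a minimal Python sketch. It is not XChangeEQ syntax and does not reflect the paper's semantics; it only illustrates combining event data (a shared key), event composition (two event kinds), and a temporal relationship (a time window). The `Event` class, the `detect_sequences` function, and the field names are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event record: a kind, a correlation key, and a timestamp.
    kind: str
    key: str
    time: float


def detect_sequences(events, first_kind, second_kind, window):
    """Return (a, b) pairs where b (of second_kind) follows a (of first_kind),
    both share the same key, and b occurs within `window` time units of a."""
    pending = []   # events of first_kind seen so far
    matches = []
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == first_kind:
            pending.append(ev)
        elif ev.kind == second_kind:
            for a in pending:
                if a.key == ev.key and 0 < ev.time - a.time <= window:
                    matches.append((a, ev))
    return matches


events = [
    Event("order", "o1", 0.0),
    Event("order", "o2", 1.0),
    Event("shipment", "o1", 5.0),
    Event("shipment", "o2", 20.0),
]
pairs = detect_sequences(events, "order", "shipment", window=10.0)
```

With these sample events, only the order and shipment for key `o1` fall inside the window, so a single pair is reported. A declarative event query language would express the same condition as a rule rather than as an explicit loop.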


  • Rule induction for uncertain data

    Abstract  
    Data uncertainty is common in real-world applications and can be caused by many factors such as imprecise measurements, network latency, outdated sources and sampling errors. When mining knowledge from these applications, data uncertainty needs to be handled with caution. Otherwise, unreliable or even wrong mining results would be obtained. In this paper, we propose a rule induction algorithm, called uRule, to learn rules from uncertain data. The key problem in learning rules is to efficiently identify the optimal cut points from training data. For uncertain numerical data, we propose an optimization mechanism which merges adjacent bins that have equal classifying class distribution and prove its soundness. For uncertain categorical data, we also propose a new method to select cut points based on possible world semantics. We then present the uRule algorithm in detail. Our experimental results show that the uRule algorithm can generate rules from uncertain numerical data with potentially higher accuracies, and that the proposed optimization method is effective in cut point selection for both certain and uncertain numerical data. Furthermore, uRule has quite stable performance when mining uncertain categorical data.

    • Content Type Journal Article
    • DOI 10.1007/s10115-010-0335-7
    • Authors
      • Biao Qin, Department of Computer Science, Renmin University of China, Beijing, China
      • Yuni Xia, Department of Computer and Information Science, Indiana University-Purdue University Indianapolis, Indianapolis, IN, USA
      • Sunil Prabhakar, Department of Computer Science, Purdue University, West Lafayette, IN, USA
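
The bin-merging idea mentioned in the abstract above can be sketched in a few lines of Python. This is an illustrative sketch only, not the uRule algorithm: it assumes bins are given in attribute order as dictionaries mapping a class label to a count (integers or `Fraction` values, the latter standing in for probabilistic counts), and it merges neighbours whose class proportions are exactly equal. The function name `merge_equal_bins` is hypothetical.

```python
from fractions import Fraction


def merge_equal_bins(bins):
    """Merge adjacent bins whose class proportions are identical.

    `bins` is a list of dicts (class label -> count), ordered by attribute value.
    Counts are assumed to be ints or Fractions so that comparisons are exact.
    """
    def proportions(counts):
        total = sum(counts.values())
        # Zero counts are skipped so that {"a": 3, "b": 0} equals {"a": 5}.
        return {label: Fraction(c, total) for label, c in counts.items() if c}

    merged = []
    for counts in bins:
        if merged and proportions(merged[-1]) == proportions(counts):
            # Same class distribution: fold this bin into the previous one.
            for label, c in counts.items():
                merged[-1][label] = merged[-1].get(label, 0) + c
        else:
            merged.append(dict(counts))
    return merged


bins = [{"a": 1, "b": 1}, {"a": 2, "b": 2}, {"a": 3, "b": 0}]
result = merge_equal_bins(bins)
```

In this example the first two bins share a 50/50 class split and are merged, so the boundary between them no longer needs to be considered as a candidate cut point; the third bin has a different distribution and stays separate.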


  • D-Search: an efficient and exact search algorithm for large distribution sets

    Abstract  
    Distribution data naturally arise in countless domains, such as meteorology, biology, geology, industry and economics. However, relatively little attention has been paid to data mining for large distribution sets. Given n distributions of multiple categories and a query distribution Q, we want to find similar clouds (i.e., distributions) to discover patterns, rules and outlier clouds. For example, consider the numerical case of sales of items, where, for each item sold, we record the unit price and quantity; then, each customer is represented as a distribution of 2-d points (one for each item he/she bought). We want to find similar users, e.g., for market segmentation or anomaly/fraud detection. We propose to address this problem and present D-Search, which includes fast and effective algorithms for similarity search in large distribution datasets. Our main contributions are (1) approximate KL divergence, which can speed up cloud-similarity computations, and (2) multistep sequential scan, which efficiently prunes a significant number of search candidates and leads to a direct reduction in the search cost. We also introduce an extended version of D-Search: (3) time-series distribution mining, which finds similar subsequences in time-series distribution datasets. Extensive experiments on real multidimensional datasets show that our solution achieves a wall clock time up to 2,300 times faster than the naive implementation without sacrificing accuracy.

    • Content Type Journal Article
    • DOI 10.1007/s10115-010-0336-6
    • Authors
      • Yasuko Matsubara, Kyoto University, Kyoto, Japan
      • Yasushi Sakurai, NTT Communication Science Labs, Kyoto, Japan
      • Masatoshi Yoshikawa, Kyoto University, Kyoto, Japan
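
For readers unfamiliar with the KL divergence mentioned in the abstract above, the following Python sketch shows the exact (not approximate) computation on discrete histograms, i.e. the kind of baseline a faster approximation would be compared against. It is not the D-Search method. The symmetrized form, the smoothing constant `eps`, and the function names are assumptions made for this illustration.

```python
import math


def normalize(counts, eps=1e-9):
    """Turn raw bucket counts into a probability vector.

    A small `eps` (an assumption of this sketch) is added to every bucket so
    that no probability is exactly zero and the logarithm stays finite.
    """
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same buckets."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrized divergence KL(p || q) + KL(q || p), usable as a dissimilarity."""
    return kl_divergence(p, q) + kl_divergence(q, p)


query = normalize([10, 10, 0])
candidate_a = normalize([9, 11, 0])
candidate_b = normalize([0, 2, 18])
closer = min([candidate_a, candidate_b], key=lambda c: symmetric_kl(query, c))
```

A naive similarity search evaluates this divergence against every stored distribution, which is the cost that approximation and pruning techniques aim to reduce.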


  • One-class learning and concept summarization for data streams

    Abstract  
    In this paper, we formulate a new research problem of concept learning and summarization for one-class data streams. The main objectives are to (1) allow users to label instance groups, instead of single instances, as positive samples for learning, and (2) summarize concepts labeled by users over the whole stream. The use of batch labeling raises serious issues for stream-oriented concept learning and summarization, because a labeled instance group may contain non-positive samples and users may change their labeling interests at any time. As a result, the positive samples labeled by users over the whole stream may be inconsistent and contain multiple concepts. To resolve these issues, we propose a one-class learning and summarization (OCLS) framework with two major components. In the first component, we propose a vague one-class learning (VOCL) module for concept learning from data streams using an ensemble of classifiers with instance-level and classifier-level weighting strategies. In the second component, we propose a one-class concept summarization (OCCS) module that uses clustering techniques and a Markov model to summarize concepts labeled by users, with only a single scan of the stream data. Experimental results on synthetic and real-world data streams demonstrate that the proposed VOCL module outperforms its peers for learning concepts from vaguely labeled stream data. The OCCS module is also able to rebuild a high-level summary for concepts marked by users over the stream.

    • Content Type Journal Article
    • DOI 10.1007/s10115-010-0331-y
    • Authors
      • Xingquan Zhu, Centre for Quantum Computation and Intelligent Systems, Faculty of Eng. and Information Technology, Univ. of Technology, Sydney, NSW 2007, Australia
      • Wei Ding, Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
      • Philip S. Yu, Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60680, USA
      • Chengqi Zhang, Centre for Quantum Computation and Intelligent Systems, Faculty of Eng. and Information Technology, Univ. of Technology, Sydney, NSW 2007, Australia


  • Mining fastest path from trajectories with multiple destinations in road networks

    Abstract  
    Nowadays, research on Intelligent Transportation Systems (ITS) has received much attention due to its broad applications, such as path planning, which has become a common activity in our daily life. Besides, with the advances of Web 2.0 technologies, users are willing to share their trajectories, thus providing good resources for ITS applications. To the best of our knowledge, there is no study on fastest path planning with multiple destinations in the literature. In this paper, we develop a novel framework, called Trajectory-based Path Finding (TPF), which is built upon a novel algorithm named Mining-based Algorithm for Travel time Evaluation (MATE) for evaluating the travel time of a navigation path and a novel index structure named Efficient Navigation Path Search Tree (ENS-Tree) for efficiently retrieving the fastest path. With MATE and ENS-Tree, an efficient fastest path finding algorithm for a single destination is derived. To find the path for multiple destinations, we propose a novel strategy named Cluster-Based Approximation Strategy (CBAS) to determine the fastest visiting order among multiple specified destinations. Through a comprehensive set of experiments, we evaluate the proposed techniques employed in the design of TPF and show that MATE, ENS-Tree and CBAS produce excellent performance under various system conditions.

    • Content Type Journal Article
    • DOI 10.1007/s10115-010-0333-9
    • Authors
      • Eric Hsueh-Chan Lu, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, ROC
      • Wang-Chien Lee, Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA
      • Vincent S. Tseng, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, ROC
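
The multi-destination problem described in the abstract above can be illustrated with an exact brute-force baseline in Python. This is not CBAS, MATE, or the ENS-Tree: it simply tries every visiting order, which is only feasible for a handful of destinations and is the kind of cost an approximation strategy would avoid. The function name and the `travel_time` lookup table (pairwise travel times keyed by `(origin, destination)`) are hypothetical.

```python
from itertools import permutations


def fastest_visiting_order(start, destinations, travel_time):
    """Return (order, total_time) minimizing total travel time from `start`.

    `travel_time[(a, b)]` is assumed to hold the estimated time from a to b.
    The search is exhaustive, so its cost grows factorially with the number
    of destinations.
    """
    best_order, best_cost = None, float("inf")
    for order in permutations(destinations):
        cost = 0.0
        current = start
        for stop in order:
            cost += travel_time[(current, stop)]
            current = stop
        if cost < best_cost:
            best_order, best_cost = order, cost
    return list(best_order), best_cost


travel_time = {
    ("S", "A"): 1.0, ("S", "B"): 5.0,
    ("A", "B"): 1.0, ("B", "A"): 1.0,
}
order, total = fastest_visiting_order("S", ["A", "B"], travel_time)
```

In this small example, visiting A before B is faster than the reverse. In a trajectory-based setting the pairwise times would come from mined travel-time estimates rather than a hand-written table.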