Syndication
Information Retrieval (Online First™)
Articles recently accepted for publication in this journal

  • Introduction to special issue on the second international conference on the theory of information retrieval

    • Content Type Journal Article
    • DOI 10.1007/s10791-010-9142-8
    • Authors
      • Leif Azzopardi, Department of Computing Science, University of Glasgow, Glasgow, Scotland, UK
      • Dawei Song, School of Computing, The Robert Gordon University, Aberdeen, Scotland, UK
      • Gabriella Kazai, Microsoft Research Cambridge, Cambridge, UK
      • Stephen Robertson, Microsoft Research Cambridge, Cambridge, UK
      • Stefan Rüger, Knowledge Media Institute, The Open University, Milton Keynes, UK
      • Milad Shokouhi, Microsoft Research Cambridge, Cambridge, UK
      • Emine Yilmaz, Microsoft Research Cambridge, Cambridge, UK


  • Modeling score distributions in information retrieval

    Abstract  
    We review the history of modeling score distributions, focusing on the mixture of normal-exponential by investigating the theoretical as well as the empirical evidence supporting its use. We discuss previously suggested conditions which valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses considering the component distributions, individually as well as in pairs, under some limiting conditions of parameter values. From all the mixtures suggested in the past, the current theoretical argument points to the two gamma as the most-likely universal model, with the normal-exponential being a usable approximation. Beyond the theoretical contribution, we provide new experimental evidence showing vector space or geometric models, and BM25, as being "friendly" to the normal-exponential, and that the non-convexity problem that the mixture possesses is practically not severe. Furthermore, we review recent non-binary mixture models, speculate on graded relevance, and consider methods such as logistic regression for score calibration.

    • Content Type Journal Article
    • DOI 10.1007/s10791-010-9145-5
    • Authors
      • Avi Arampatzis, Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
      • Stephen Robertson, Microsoft Research, Cambridge, UK


  • Discriminative probabilistic models for expert search in heterogeneous information sources

    Abstract  
    In many realistic settings of expert finding, the evidence for expertise often comes from heterogeneous knowledge sources. As some sources tend to be more reliable and indicative than the others, different information sources need to receive different weights to reflect their degrees of importance. However, most previous studies in expert finding did not differentiate data sources, which may lead to unsatisfactory performance in the settings where the heterogeneity of data sources is present. In this paper, we investigate how to merge and weight heterogeneous knowledge sources in the context of expert finding. A relevance-based supervised learning framework is presented to learn the combination weights from training data. Beyond just learning a fixed combination strategy for all the queries and experts, we propose a series of discriminative probabilistic models which have increasing capability to associate the combination weights with specific experts and queries. In the last (and also the most sophisticated) proposed model, the combination weights depend on both expert classes and query topics, and these classes/topics are derived from expert and query features. Compared with expert and query independent combination methods, the proposed combination strategy can better adjust to different types of experts and queries. In consequence, the model yields much flexibility of combining data sources when dealing with a broad range of expertise areas and a large variation in experts. To the best of our knowledge, this is the first work that designs discriminative learning models to rank experts. Empirical studies on two real world faculty expertise testbeds demonstrate the effectiveness and robustness of the proposed discriminative learning models.

    • Content Type Journal Article
    • DOI 10.1007/s10791-010-9139-3
    • Authors
      • Yi Fang, Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
      • Luo Si, Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
      • Aditya P. Mathur, Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA


  • Investigating task performance of probabilistic topic models: an empirical study of PLSA and LDA

    Abstract  
    Probabilistic topic models have recently attracted much attention because of their successful applications in many text mining tasks such as retrieval, summarization, categorization, and clustering. Although many existing studies have reported promising performance of these topic models, none of the work has systematically investigated the task performance of topic models; as a result, some critical questions that may affect the performance of all applications of topic models are mostly unanswered, particularly how to choose between competing models, how multiple local maxima affect task performance, and how to set parameters in topic models. In this paper, we address these questions by conducting a systematic investigation of two representative probabilistic topic models, probabilistic latent semantic analysis (PLSA) and Latent Dirichlet Allocation (LDA), using three representative text mining tasks, including document clustering, text categorization, and ad-hoc retrieval. The analysis of our experimental results provides deeper understanding of topic models and many useful insights about how to optimize the performance of topic models for these typical tasks. The task-based evaluation framework is generalizable to other topic models in the family of either PLSA or LDA.

    • Content Type Journal Article
    • DOI 10.1007/s10791-010-9141-9
    • Authors
      • Yue Lu, University of Illinois at Urbana-Champaign, Department of Computer Science, 201 N Goodwin Ave, Urbana, IL 61801, USA
      • Qiaozhu Mei, University of Michigan, School of Information, 1085 South University Ave, Ann Arbor, MI 48109, USA
      • ChengXiang Zhai, University of Illinois at Urbana-Champaign, Department of Computer Science, 201 N Goodwin Ave, Urbana, IL 61801, USA


  • A generative theory of relevance

    • Content Type Journal Article
    • DOI 10.1007/s10791-010-9140-x
    • Authors
      • Fernando Diaz, Yahoo! Inc., Santa Clara, CA, USA