Syndication
ScienceDirect Publication: Information Processing & Management

  • Exploiting probabilistic topic models to improve text categorization under class imbalance
    Publication year: 2010
    Source: Information Processing & Management, In Press, Corrected Proof, Available online 1 September 2010

    Enhong Chen, Yanggang Lin, Hui Xiong, Qiming Luo, Haiping Ma

    In text categorization, the numbers of documents in different categories often differ, i.e., the class distribution is imbalanced. We propose a unique approach to improve text categorization under class imbalance by exploiting the semantic context in text documents. Specifically, we generate new samples of rare classes (categories with relatively small amounts of training data) by using global semantic information of classes represented by probabilistic topic models. In this way, the numbers of samples in different categories can become more balanced and the performance of text categorization can be improved using this transformed data set. Indeed,...

     Research highlights: Propose two re-sampling methods based on probabilistic topic models. Improve text categorization under class imbalance. DECOM and DECODER achieve better performance under class imbalance. DECODER is more tolerant to noisy samples.
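
    The re-sampling idea in this abstract (generate new samples for rare classes from class-level information so that class sizes become more balanced) can be illustrated with a deliberately simplified sketch. This is not the DECOM or DECODER method from the paper: it assumes a plain unigram word distribution per class in place of a probabilistic topic model, and the function names `class_word_distribution`, `generate_samples` and `rebalance` are hypothetical.

```python
import random
from collections import Counter


def class_word_distribution(docs):
    """Estimate a unigram word distribution from one class's tokenized documents.

    Assumption: a unigram model stands in for the topic model used in the paper.
    """
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def generate_samples(docs, n_new, seed=0):
    """Draw n_new synthetic documents from the class-level word distribution."""
    rng = random.Random(seed)
    dist = class_word_distribution(docs)
    words = sorted(dist)
    weights = [dist[w] for w in words]
    lengths = [len(doc) for doc in docs]
    new_docs = []
    for _ in range(n_new):
        # Reuse an observed document length, then sample words independently.
        length = rng.choice(lengths)
        new_docs.append(rng.choices(words, weights=weights, k=length))
    return new_docs


def rebalance(dataset):
    """Top up every class with synthetic documents to the size of the largest class."""
    target = max(len(docs) for docs in dataset.values())
    return {
        label: docs + generate_samples(docs, target - len(docs))
        for label, docs in dataset.items()
    }
```

    Calling `rebalance` on a dictionary that maps each class label to a non-empty list of tokenized documents returns a dictionary in which every class has as many documents as the largest one; the original documents are kept and only the shortfall is synthesized.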



  • Patterns of bibliographic references in the ACM published papers
    Publication year: 2010
    Source: Information Processing & Management, In Press, Corrected Proof, Available online 25 August 2010

    Jacques Wainer, Henrique Przibisczki de Oliveira, Ricardo Anido

    This paper analyzes the bibliographic references made by all papers published by ACM in 2006. Both an automatic classification of all references and a human classification of a random sample of them found that around 40% of the references are to conference proceedings papers, around 30% are to journal papers, and around 8% are to books. Among the other types of documents, standards and RFCs account for 3% of the references, technical and other reports account for 4%, and other Web references for 3%. Among the documents cited at least 10 times by the 2006 ACM papers, 41% are conferences...

     Research highlights: Conference papers amount to around 40% of the references in the CS papers. Journal papers amount to 30% of the references. 41% of the papers cited more than 10 times are from conferences. 37% of the documents cited more than 10 times are books.



  • Intelligent agent systems for executive information scanning, filtering and interpretation: Perceptions and challenges
    Publication year: 2010
    Source: Information Processing & Management, In Press, Corrected Proof, Available online 11 August 2010

    Mark Xu, Vincent Ong, Yanqing Duan, Brian Mathews

    The use of intelligent agent-based systems to support information processing for executives has not advanced significantly in either theory or practice. Research in this field tends to focus more on technical aspects than on the social perspective. When executives are faced with increasing information availability and uncertainty in the business environment, using intelligent agent-based systems to enhance executives' information processing capability appears to be both an opportunity and a necessity. This study examines UK executives' perceptions of intelligent agent-based systems for information scanning, filtering, interpretation and alerting. The study follows a deductive research design, i.e. hypothesis formulation and testing from the user's perspective. Qualitative...


  • Linguistic kernels for answer re-ranking in question answering systems
    Publication year: 2010
    Source: Information Processing & Management, In Press, Corrected Proof, Available online 20 July 2010

    Alessandro Moschitti, Silvia Quarteroni

    Answer selection is the most complex phase of a question answering (QA) system. To solve this task, typical approaches use unsupervised methods such as computing the similarity between query and answer, optionally exploiting advanced syntactic, semantic or logic representations. In this paper, we study supervised discriminative models that learn to select (rank) answers using examples of question and answer pairs. The pair representation is implicitly provided by kernel combinations applied to each of its members. To reduce the burden of large amounts of manual annotation, we represent question and answer pairs by means of powerful generalization methods, exploiting the application of...
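
    The unsupervised baseline mentioned in this abstract (ranking candidate answers by their similarity to the query) can be sketched as follows. This illustrates only that baseline, not the supervised kernel-based models the paper studies; it assumes whitespace tokenization and bag-of-words cosine similarity, and the names `cosine` and `rank_answers` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two strings using bag-of-words term counts."""
    va = Counter(a.lower().split())
    vb = Counter(b.lower().split())
    dot = sum(va[term] * vb[term] for term in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    # An empty string has zero norm; treat its similarity to anything as 0.
    return dot / norm if norm else 0.0


def rank_answers(question, candidates):
    """Return the candidates ordered from most to least similar to the question."""
    return sorted(candidates, key=lambda c: cosine(question, c), reverse=True)
```

    For a question such as "who wrote hamlet", a candidate that shares the terms "wrote" and "hamlet" is ranked above one that shares no terms with the question. A purely lexical measure like this cannot tell a correct answer from an incorrect one that happens to reuse the question's words.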


  • Brain CT image database building for computer-aided diagnosis using content-based image retrieval
    Publication year: 2010
    Source: Information Processing & Management, In Press, Corrected Proof, Available online 15 July 2010

    Kehong Yuan, Zhen Tian, Jiying Zou, Yanling Bai, Qingshan You

    Content-based image retrieval for medical images is a primary technique for computer-aided diagnosis. Building an efficient medical image database is a prerequisite for a computer-aided diagnosis system, yet it receives less attention than it deserves. In this paper, we provide an efficient approach to develop the archives of large brain CT medical data. Medical images are securely acquired along with relevant diagnosis reports and then cleansed, validated and enhanced. Then some sophisticated image processing algorithms, including image normalization and registration, are applied to make sure that only corresponding anatomical regions are compared in image matching. A...