PENS Repository

Automatic Metadata Generation by Clustering Extracted Representative Keywords from Heterogeneous Sources

Barakbah, Ali Ridho (2011) Automatic Metadata Generation by Clustering Extracted Representative Keywords from Heterogeneous Sources. Journal of Emitter, 2 (2). pp. 106-113. ISSN 2088-0596

PDF (EEPIS Journal 2011) - Published Version
Restricted to Registered users only
Available under License Creative Commons Attribution No Derivatives.


    Abstract

    In information retrieval, generating important words for metadata space creation is essential for extracting representative information from sources. Important words extracted from a document source purely on the basis of term intensity might not represent the original source. In this paper, we propose a new approach that automatically generates representative metadata by applying clustering to extract representative keywords from heterogeneous sources. The proposed approach consists of three stages: (1) Aggregate Keyword Extraction, (2) Automatic Source Filter, and (3) Representative Keyword Generation. First, we extract aggregate metadata from all the document sources. Second, we provide an automatic mechanism that obtains the selected aggregate metadata by filtering the sources and retaining the representative ones, using a set of classifying words. Third, we promote the selected aggregate metadata to representative metadata: we apply our Hierarchical K-Means to cluster the extracted keywords and generate the representative keywords that realize the representative metadata. To demonstrate the applicability of the proposed approach to automatic metadata generation, we conduct an experiment on information sources about the Sidoarjo mud flow, consisting of 60 English articles. The experimental results show that the proposed approach effectively generates representative metadata and drastically reduces the metadata space of the keywords.
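
The three stages named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the tiny stopword list, and the `min_hits` threshold are all hypothetical, and a plain k-means with deterministic seeding stands in for the author's Hierarchical K-Means, whose details the abstract does not give.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "was", "by", "on", "for"}


def extract_keywords(text):
    """Stage 1 (per source): count terms, ignoring stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def filter_sources(term_counts, classifying_words, min_hits=2):
    """Stage 2: keep indices of sources that contain at least
    `min_hits` distinct classifying words (hypothetical rule)."""
    return [i for i, counts in enumerate(term_counts)
            if sum(1 for w in classifying_words if w in counts) >= min_hits]


def keyword_vectors(term_counts, kept):
    """Describe each keyword by its count in every kept source."""
    vocab = sorted({t for i in kept for t in term_counts[i]})
    return {t: [term_counts[i][t] for i in kept] for t in vocab}


def kmeans(points, k, iterations=20):
    """Plain k-means seeded with the first k points (a stand-in for
    Hierarchical K-Means). Empty clusters keep their old centroid."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def representative_keywords(vectors, k):
    """Stage 3: cluster keywords and return, for each non-empty
    cluster, the keyword closest to the cluster centroid."""
    words = list(vectors)
    points = [vectors[w] for w in words]
    k = min(k, len(points))
    centroids, assignment = kmeans(points, k)
    reps = []
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            best = min(members, key=lambda i: math.dist(points[i], centroids[c]))
            reps.append(words[best])
    return reps


if __name__ == "__main__":
    # Made-up example sources; the paper's 60 articles are not reproduced here.
    sources = [
        "Mud flow in Sidoarjo: the mud volcano erupted and mud covered villages.",
        "The Sidoarjo mud flow displaced villages; the mud volcano keeps erupting.",
        "A recipe for chocolate cake with flour and sugar.",
    ]
    counts = [extract_keywords(s) for s in sources]
    kept = filter_sources(counts, {"mud", "sidoarjo", "volcano"})
    print(kept, representative_keywords(keyword_vectors(counts, kept), k=3))
```

In this sketch the metadata space shrinks because only one keyword per cluster survives; the choice of `k`, the seeding, and the filter threshold would all need tuning against real sources.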

    Item Type: Article
    Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
    Divisions: Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science
    Depositing User: Dr. Ali Ridho Barakbah
    Date Deposited: 22 Mar 2015 12:17
    Last Modified: 22 Mar 2015 12:17
    URI: http://repo.pens.ac.id/id/eprint/2738
