Automatic Representative News Generation using Automatic Clustering

Puspitasari, Diptia Zandra Eka and Barakbah, Ali Ridho and Winarno, Idris (2012) Automatic Representative News Generation using Automatic Clustering. In: Industrial Electronics Seminar (IES) 2012, 24th October 2012, Surabaya, Indonesia.

PDF (IES 2012 - 2) - Published Version
Restricted to Registered users only
Available under License Creative Commons Attribution No Derivatives.

Abstract

More than 2000 news items are published each day by 32 online news sites in Indonesia. Users who do not have enough time to access them all find it difficult to choose which news is worth reading, because many of these items share the same topic and content. Automatically clustering the news, so that one representative item is provided for each group of similar news, is the best way to address this redundancy problem. This paper presents a new approach to automatic representative news generation using automatic clustering. The approach involves five steps: (1) Data Acquisition, (2) Keyword Extraction, (3) Metadata Aggregation, (4) Automatic Clustering, and (5) Representative News Generation. Data Acquisition retrieves the news from RSS and presents the news descriptions, which are tokenized and filtered in the Keyword Extraction process. Keyword Extraction produces tokens, token values, and token links, which are passed to the Metadata Aggregation process to build a matrix of token values for each link. Using the Automatic Clustering method, the system identifies the appropriate number of clusters and clusters the news automatically to provide representative news to users. The representative news of each cluster is the item with the shortest distance to the cluster centroid. The resulting representatives depend on the token values of each link: if the difference value of the clusters is too small, the news items are much-separated, whereas if the difference value is too big, the news items are less-separated. The longer the refresh time, the more accurate the automatic clustering results, because more data is available to form clusters.
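
As a rough illustration of steps (3) and (5), the following Python sketch builds a matrix of token values per link and then picks, for each cluster, the news item closest to the cluster centroid. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define how token values are computed, so raw term frequency is assumed here; Euclidean distance is assumed for the centroid comparison; the cluster labels are taken as given, since the automatic clustering method itself is not described in the abstract. The function names and the sample links and tokens are hypothetical.

```python
import math
from collections import Counter


def build_token_matrix(docs):
    """Build a link-by-token matrix from {link: [token, ...]}.

    Assumption: the "token value" is the raw term frequency of the
    token in the news description; the abstract does not specify it.
    """
    links = sorted(docs)
    vocab = sorted({tok for toks in docs.values() for tok in toks})
    matrix = []
    for link in links:
        counts = Counter(docs[link])
        matrix.append([float(counts[tok]) for tok in vocab])
    return links, vocab, matrix


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def representatives(links, matrix, labels):
    """Return {cluster label: link of the item nearest its centroid}.

    Assumption: Euclidean distance; labels come from some clustering
    step that is outside the scope of this sketch.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    result = {}
    for label, members in clusters.items():
        c = centroid([matrix[i] for i in members])
        best = min(members, key=lambda i: math.dist(matrix[i], c))
        result[label] = links[best]
    return result


# Hypothetical sample data: four news links with already filtered tokens.
example_docs = {
    "a": ["flood", "jakarta", "rain"],
    "b": ["flood", "jakarta"],
    "c": ["flood", "jakarta", "rain", "rain"],
    "d": ["election", "vote"],
}
example_labels = [0, 0, 0, 1]  # assumed output of a clustering step
```

With the sample data, `build_token_matrix(example_docs)` orders the links alphabetically, and `representatives(links, matrix, example_labels)` selects link `a` for the flood cluster, because its token vector coincides with that cluster's centroid, and link `d` for the single-item election cluster.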

Item Type:	Conference or Workshop Item (Paper)
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:	Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science
Depositing User:	Dr. Ali Ridho Barakbah
Date Deposited:	22 Mar 2015 12:17
Last Modified:	22 Mar 2015 12:17
URI:	http://repo.pens.ac.id/id/eprint/2739
