Journal title: International Journal of Innovative Research in Computer and Communication Engineering
Print ISSN: 2320-9798
Electronic ISSN: 2320-9801
Publication year: 2017
Volume: 5
Issue: 3
Page: 5971
DOI: 10.15680/IJIRCCE.2017.0503347
Publisher: S&S Publications
Abstract: A huge amount of data containing useful information, called Big Data, is generated on a daily basis. Processing such a tremendous volume of data requires Big Data frameworks such as Hadoop MapReduce and Apache Spark. Among these, Apache Spark performs up to 100 times faster than conventional frameworks like Hadoop MapReduce. We focus on the design of a partitional clustering algorithm and its implementation on Apache Spark. In this paper, we propose a partitional clustering algorithm called Scalable Random Sampling with Iterative Optimization Fuzzy c-Means (SRSIO-FCM), which is implemented on Apache Spark to handle the challenges associated with Big Data clustering. Experiments are performed on several big datasets to show the effectiveness of SRSIO-FCM in comparison with a proposed scalable version of the Literal Fuzzy c-Means (LFCM) algorithm, called SLFCM, which is also implemented on Apache Spark.
Keywords: Apache Spark; Big Data; SRSIO-FCM; LFCM; SLFCM