Journal title: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Online ISSN: 2156-5570
Publication year: 2018
Volume: 9
Issue: 11
DOI: 10.14569/IJACSA.2018.091196
Publisher: Science and Information Society (SAI)
Abstract: This study was conducted on the assumption that the Spark ML package offers much better performance and accuracy than the Spark MLlib package when dealing with big data. The dataset used in the comparison consists of bank customers' transactions. The decision tree algorithm was applied with both packages to generate a model that predicts the churn probability of bank customers from their transaction data. Detailed comparison results were recorded, and they showed that the ML package, with its newer DataFrame-based APIs, delivers better evaluation performance and prediction accuracy.
Keywords: Churn prediction; Big data; Machine learning; Apache Spark; ML package; MLlib package; Decision tree