Optimization Driven MapReduce Framework for Indexing and Retrieval of Big Data
Document Type
Article
Publication Date
5-31-2020
Abstract
As technology advances, the volume of big data grows daily, to the point that traditional software tools struggle to handle it. In addition, the presence of imbalanced data in big data is a major concern for the research community. To manage big data effectively and to deal with imbalanced data, this paper proposes a new indexing algorithm for retrieving big data in the MapReduce framework. In the mappers, the data are clustered using the Sparse Fuzzy-c-means (Sparse FCM) algorithm. The reducer combines the clusters generated by the mappers and clusters the data again with the Sparse FCM algorithm. Two-level query matching is then performed to determine the requested data: the first level determines the cluster, and the second level accesses the requested data. The data are ranked using the proposed Monarch chaotic whale optimization algorithm (M-CWOA), which is designed by combining Monarch butterfly optimization (MBO) [22] and the chaotic whale optimization algorithm (CWOA) [21]. Here, the Parametric Enabled-Similarity Measure (PESM) is adapted to measure the similarity between two datasets. The proposed M-CWOA outperformed other methods, with a maximal precision of 0.9237, recall of 0.9371, and F1-score of 0.9223.
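The two-level query matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each cluster is summarized by a centroid vector and that the second level searches only inside the cluster chosen at the first level. Plain cosine similarity stands in for both PESM and the M-CWOA ranking, since the abstract does not define either. The names `cosine`, `two_level_match`, and the toy `clusters` data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; an assumed stand-in for the paper's PESM."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_level_match(query, clusters, top_k=3):
    """Return the best-matching cluster name and its top_k records."""
    # Level 1: pick the cluster whose centroid is most similar to the query.
    best = max(clusters, key=lambda c: cosine(query, c["centroid"]))
    # Level 2: rank the records of that cluster only (assumed scope).
    ranked = sorted(best["records"], key=lambda r: cosine(query, r), reverse=True)
    return best["name"], ranked[:top_k]


# Toy data: two clusters of 2-D feature vectors (hypothetical).
clusters = [
    {"name": "A", "centroid": [1.0, 0.0],
     "records": [[1.0, 0.1], [0.9, 0.3], [1.0, 0.0]]},
    {"name": "B", "centroid": [0.0, 1.0],
     "records": [[0.1, 1.0], [0.0, 0.8]]},
]

name, hits = two_level_match([0.2, 1.0], clusters, top_k=1)
print(name, hits)
```

Restricting the second level to a single cluster is what makes the index useful: the query is compared against one centroid per cluster and then only against the records of the selected cluster, instead of against every record.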
Publication Title
KSII Transactions on Internet and Information Systems
First Page Number
1886
Last Page Number
1908
DOI
10.3837/tiis.2020.05.002
Recommended Citation
Abdalla, Hemn Barzan; Ahmed, Awder Mohammed; and Al Sibahee, M. A., "Optimization Driven MapReduce Framework for Indexing and Retrieval of Big Data" (2020). Kean Publications. 1219.
https://digitalcommons.kean.edu/keanpublications/1219