Accurate Classification for Imbalanced Data Analytics using RSB Ensemble Technique

Authors (3): Tanmayee Tushar Parbat, Honey Jain, Rohan Benhal

Over the past few years, researchers have developed machine learning models that perform unsatisfactorily when classifying imbalanced datasets. As a remedy, a few researchers have experimented with the synthetic minority over-sampling technique (SMOTE) and cost-sensitive methods. Results indicate that these methods also have drawbacks, such as overfitting and a high misclassification rate. Ensemble techniques have proved more robust in handling imbalanced datasets. In this study, we consider existing boosting and bagging ensemble techniques and improve them in several respects by proposing an algorithm named Random Split Bagging (RSB), so that imbalanced datasets can be handled effectively. Our approach presents a novel tuple and attribute selection strategy, after which a splitting criterion is chosen to generate the class label. The proposed RSB ensemble technique shows the best performance on the classification of minority-class examples, i.e. the classification of patients affected by diseases such as malaria, dengue and jaundice.
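The abstract does not spell out the RSB procedure, so the following is only a minimal sketch of the general idea it names: bagging in which each base learner sees a random selection of tuples and a random selection of attributes, and then picks a split to produce a class label. It is not the authors' algorithm. The balanced bootstrap, the one-level trees, the Gini criterion, the toy data and all function names (`fit_stump`, `fit_ensemble`, `predict`) are assumptions made for illustration.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def fit_stump(rows, labels, attrs):
    """One-level tree: best (attribute, threshold) among `attrs` by weighted Gini."""
    majority = Counter(labels).most_common(1)[0][0]
    best_score, best = None, (None, None, majority, majority)
    for a in attrs:
        values = sorted({r[a] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint between neighbouring values
            left = [y for r, y in zip(rows, labels) if r[a] <= t]
            right = [y for r, y in zip(rows, labels) if r[a] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best_score is None or score < best_score:
                best_score = score
                best = (a, t, Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best

def fit_ensemble(rows, labels, n_estimators=25, n_attrs=2, seed=0):
    """Bagging with random tuple and attribute selection (assumed scheme)."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    m = min(len(idx) for idx in by_class.values())  # minority class size
    n_features = len(rows[0])
    stumps = []
    for _ in range(n_estimators):
        # Tuple selection: balanced bootstrap, m draws with replacement per class.
        sample = [rng.choice(idx) for idx in by_class.values() for _ in range(m)]
        # Attribute selection: random subset drawn without replacement.
        attrs = rng.sample(range(n_features), min(n_attrs, n_features))
        stumps.append(fit_stump([rows[i] for i in sample],
                                [labels[i] for i in sample], attrs))
    return stumps

def predict(stumps, row):
    """Majority vote over the base learners."""
    votes = Counter()
    for a, t, left_label, right_label in stumps:
        votes[left_label if a is None or row[a] <= t else right_label] += 1
    return votes.most_common(1)[0][0]

# Toy data (invented): 90 majority tuples (label 0), 10 minority tuples (label 1).
# Attributes 0 and 1 are informative; attribute 2 is noise.
majority = [(i / 90, (i * 7 % 90) / 90, i % 5) for i in range(90)]
minority = [(2 + j / 10, 2 + (j * 3 % 10) / 10, j % 5) for j in range(10)]
rows = majority + minority
labels = [0] * 90 + [1] * 10
model = fit_ensemble(rows, labels)
print(predict(model, (2.5, 2.5, 1)), predict(model, (0.5, 0.5, 1)))
```

Drawing the same number of tuples from each class is one common way to keep the minority class from being swamped in every bag; whether RSB balances its samples this way is not stated in the abstract.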

Authors and Affiliations

Tanmayee Tushar Parbat
B.E IT, Dr. Vishwanath Karad MIT World Peace University, Pune, Maharashtra, India
Honey Jain
B.E IT, Dr. Vishwanath Karad MIT World Peace University, Pune, Maharashtra, India
Rohan Benhal
BBA IT, Dr. Vishwanath Karad MIT World Peace University, Pune, Maharashtra, India

Keywords: RSB, SMOTE, Imbalanced Dataset

Publication Details

Published in : Volume 9 | Issue 7 | September-October 2021
Date of Publication : 2021-10-30
License : This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 104-108
Manuscript Number : IJSRSET218640
Publisher : Technoscience Academy

Print ISSN : 2395-1990, Online ISSN : 2394-4099

Cite This Article :

Tanmayee Tushar Parbat, Honey Jain, Rohan Benhal, "Accurate Classification for Imbalanced Data Analytics using RSB Ensemble Technique", International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN : 2395-1990, Online ISSN : 2394-4099, Volume 9, Issue 7, pp.104-108, September-October-2021. Journal URL : https://res.ijsrset.com/IJSRSET218640
