
Kaggle Competitions

Competitive machine learning projects on Kaggle, including regression, classification, and time series challenges

November 2023
Tags: Machine Learning, Kaggle, Python, Data Science, Competition

Overview

All of my past competition notebooks are available in my GitHub repository or as Kaggle notebooks.

Kaggle is the world’s largest data science community, with powerful tools and resources to help you achieve your data science goals. Kaggle competitions offer a fantastic blend of learning resources and practical experience for data scientists, with generous prizes for those who can come up with novel solutions to real-world problems.

Competition History

Kaggle competitions have been a point of interest ever since I first learned about the platform. Having acquired the necessary domain knowledge and experience, I’m now able to contribute effectively. Feel free to view my profile.

Regression with a Mohs Hardness Dataset

Result: Top 33% (536/1632) | Kaggle Playground Series S3E25 | November 2023

Use regression to predict the Mohs hardness of a mineral, given its properties. My best-performing approach was a neural network regressor trained with the Adam optimizer.
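As a rough sketch of this kind of approach (not the exact notebook), a neural network regressor trained with Adam can be set up with scikit-learn's MLPRegressor. The file layout, the Hardness target column, and the hyperparameters below are illustrative assumptions.

```python
# Sketch of a neural-network regressor trained with the Adam optimizer.
# File names, the "Hardness" target column, and hyperparameters are illustrative assumptions.
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

X = train.drop(columns=["id", "Hardness"])
y = train["Hardness"]

# Scale inputs, then fit a small MLP; solver="adam" selects the Adam optimizer.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 32), solver="adam", max_iter=500, random_state=42),
)
model.fit(X, y)

# Predict on the test set and write a submission file.
submission = pd.DataFrame({"id": test["id"], "Hardness": model.predict(test.drop(columns=["id"]))})
submission.to_csv("submission.csv", index=False)
```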

Binary Prediction of Smoker Status using Bio-Signals

Result: Top 11% (208/1908) | Kaggle Playground Series S3E24 | November 2023

Use binary classification to predict a patient’s smoking status from a range of other health indicators. My best-performing approach combined publicly available AutoML tools with tree-based classifiers to achieve 87.7% accuracy.
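The AutoML side of that pipeline is hard to show briefly, but the tree-based classifier side might look like the sketch below; the smoking target column and hyperparameters are assumptions rather than my exact settings.

```python
# Sketch of the tree-based classifier side of the approach (AutoML step omitted).
# The "smoking" target column and hyperparameters are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_val_score

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

X = train.drop(columns=["id", "smoking"])
y = train["smoking"]

clf = HistGradientBoostingClassifier(max_iter=300, learning_rate=0.05, random_state=42)

# Estimate accuracy with 5-fold cross-validation before fitting on the full data.
print("CV accuracy:", cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean())

clf.fit(X, y)
pd.DataFrame(
    {"id": test["id"], "smoking": clf.predict_proba(test.drop(columns=["id"]))[:, 1]}
).to_csv("submission.csv", index=False)
```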

Binary Classification with a Software Defects Dataset

Result: Top 37% (618/1702) | Kaggle Playground Series S3E23 | October 2023

Predict whether code samples are likely to contain bugs, given descriptive statistics about the code. My best-performing approach was an ensemble of tree-based models. With more time, I would have liked to incorporate a non-tree classifier to add diversity to the ensemble.
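A minimal sketch of a soft-voting ensemble of tree-based classifiers is shown below; the defects target column and the model settings are assumptions for illustration, not the exact ensemble from my notebook.

```python
# Sketch of an ensemble of tree-based classifiers combined by soft voting
# (averaging predicted probabilities). Column names and settings are assumptions.
import pandas as pd
from sklearn.ensemble import (
    ExtraTreesClassifier,
    HistGradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

X = train.drop(columns=["id", "defects"])
y = train["defects"].astype(int)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=300, random_state=0)),
        ("hgb", HistGradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",  # average class probabilities across the three models
)
ensemble.fit(X, y)

pd.DataFrame(
    {"id": test["id"], "defects": ensemble.predict_proba(test.drop(columns=["id"]))[:, 1]}
).to_csv("submission.csv", index=False)
```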

Predict Health Outcomes of Horses

Result: Top 19% (290/1541) | Kaggle Playground Series S3E22 | September 2023

Determine horse health outcomes based on descriptive factors. My best-performing approach was a lightly tuned XGBoost model, which classified the test set with 75.89% accuracy.
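For illustration, a lightly tuned XGBoost multiclass model along these lines could be set up as in the sketch below; the outcome target column, the one-hot encoding step, and the hyperparameters are assumptions rather than my exact configuration.

```python
# Sketch of a lightly tuned XGBoost multiclass classifier.
# The "outcome" target column, encoding choices, and hyperparameters are assumptions.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Encode the string outcome labels as integers for XGBoost.
label_enc = LabelEncoder()
y = label_enc.fit_transform(train["outcome"])

# One-hot encode categorical features and align test columns with the train columns.
X = pd.get_dummies(train.drop(columns=["id", "outcome"]))
X_test = pd.get_dummies(test.drop(columns=["id"])).reindex(columns=X.columns, fill_value=0)

# "Lightly tuned": only a few key hyperparameters adjusted from the defaults.
model = XGBClassifier(n_estimators=400, learning_rate=0.05, max_depth=4, random_state=42)
model.fit(X, y)

pd.DataFrame(
    {"id": test["id"], "outcome": label_enc.inverse_transform(model.predict(X_test))}
).to_csv("submission.csv", index=False)
```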

Learning Resources

If you are interested in learning more, I would highly encourage you to listen to the “Accelerated data science with a Kaggle grandmaster” episode of the Practical AI Podcast. Guest Christof Henkel explains his journey with Kaggle and how he became a top competitor on the platform.