Data Science Portfolio

This portfolio collects data science projects I completed for academic, self-learning, and hobby purposes.

  • Machine Learning

    • Bank Campaign Prediction: Trained several variants of logistic regression with ridge (L2) regularization on the UCI Bank Marketing dataset to predict whether a customer is likely to subscribe to a term deposit.

    • Feature Engineering Exercise: Attempts to improve model accuracy using feature engineering techniques alone, and benchmarks the results on neural networks, random forests, SVMs, and k-nearest neighbors.

    • From-Scratch Implementations

      • Neural Network Classifier from Scratch: Designed and implemented a neural network classifier with the following architecture:
        • Input layer
        • Dense hidden layer with 512 neurons and ReLU activation
        • Dropout with a rate of 0.2
        • Dense hidden layer with 512 neurons and ReLU activation
        • Dropout with a rate of 0.2
        • Output layer with softmax activation; the network is trained with categorical cross-entropy loss and the RMSProp optimizer to classify images from the Fashion-MNIST dataset

      • Naive Bayes from Scratch: Designed a spam/ham classifier using a bag-of-words model and implemented the Naive Bayes algorithm from scratch.

      • Decision Trees from Scratch: Designed and implemented decision trees from scratch for image orientation classification and achieved an accuracy of 76%

      • Neural Network from Scratch: Designed and implemented a neural network from scratch for image orientation classification and achieved an accuracy of 78%

      • K-Nearest Neighbours from Scratch: Designed and implemented k-nearest neighbours from scratch for image orientation classification and achieved an accuracy of 72%

      • Linear Regression from Scratch: Designed and implemented linear regression from scratch to predict housing prices on the Boston Housing dataset, achieving results similar to scikit-learn.
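As a rough illustration of the ridge-regularized logistic regression in the Bank Campaign Prediction project, the sketch below fits the model with batch gradient descent in plain Python. It is a toy under stated assumptions, not the project's code: the function names, the choice of gradient descent, and the `lam`, `lr`, and `epochs` defaults are hypothetical, and the real Bank Marketing data would need its categorical columns encoded first.

```python
import math


def sigmoid(z):
    # Split form of 1 / (1 + e^-z) that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, epochs=2000):
    """Hypothetical sketch: gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The bias b is left unpenalized, a common convention.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Prediction error sigmoid(w.x + b) - y for every row.
        errors = [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - t
                  for row, t in zip(X, y)]
        # Log-loss gradient plus the ridge term lam * w.
        grad_w = [sum(e * row[j] for e, row in zip(errors, X)) / n + lam * w[j]
                  for j in range(d)]
        grad_b = sum(errors) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, row):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)


X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = fit_ridge_logistic(X, y, lam=0.1)
print(predict_proba(w, b, [2.0]))  # probability of class 1; above 0.5 on this toy data
```

Increasing `lam` shrinks the learned weights toward zero, trading a little training fit for stability.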
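The spam/ham classifier can be illustrated with a minimal multinomial Naive Bayes over bag-of-words counts. The add-one (Laplace) smoothing, the whitespace tokenizer, the `NaiveBayes` class name, and the toy messages are assumptions for illustration; the original project may differ in all of them.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace split; a real filter would also strip punctuation.
    return text.lower().split()


class NaiveBayes:
    """Hypothetical sketch: multinomial Naive Bayes over bag-of-words counts."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = tokenize(doc)
            self.word_counts[label].update(words)
            self.vocab.update(words)
        # Log prior: fraction of training documents in each class.
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for w in tokenize(doc):
                # Add-one smoothing keeps an unseen word from zeroing out the class.
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)


docs = ["win cash now", "cheap pills win", "meeting at noon", "lunch tomorrow at noon"]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("win cheap cash"))  # spam
print(model.predict("lunch at noon"))   # ham
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on long messages.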
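The core step of a from-scratch decision tree is choosing the split with the highest information gain. The sketch below assumes numeric features, binary threshold splits, and entropy as the impurity measure (Gini impurity is a common alternative); `best_split` is a hypothetical helper, and a full tree would call it recursively on each side until the labels are pure or a depth limit is reached.

```python
import math
from collections import Counter


def entropy(labels):
    # Shannon entropy (in bits) of the label distribution.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(X, y):
    """Hypothetical sketch: return (feature_index, threshold, gain) of the best split."""
    base = entropy(y)
    best = (None, None, 0.0)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= threshold]
            right = [label for row, label in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue  # a split must put at least one sample on each side
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            gain = base - weighted
            if gain > best[2]:
                best = (j, threshold, gain)
    return best


X = [[1, 5], [2, 5], [3, 5], [4, 5]]
y = [0, 0, 1, 1]
print(best_split(X, y))  # (0, 2, 1.0)
```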
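A k-nearest neighbours classifier has no training step beyond storing the data: prediction is a majority vote among the closest stored examples. The sketch below assumes Euclidean distance and feature vectors given as tuples of numbers (for image orientation these could be flattened pixel values, with labels such as 0, 90, 180, and 270 degrees); `knn_predict`, `k=3`, and the toy points are illustrative choices, not the project's settings.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Hypothetical sketch: majority vote among the k nearest training points."""
    # Sort training points by Euclidean distance to the query; keep the k closest.
    neighbours = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    # most_common orders equal counts by first occurrence, so ties go to the nearer label.
    return votes.most_common(1)[0][0]


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 90, 90, 90]
print(knn_predict(train_X, train_y, (0.5, 0.5)))  # 0
print(knn_predict(train_X, train_y, (5.5, 5.5)))  # 90
```

Each prediction scans the whole training set, which is why larger datasets usually turn to spatial indexes such as k-d trees.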
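Linear regression from scratch can be sketched as batch gradient descent on the squared error. That method is an assumption here; the closed-form normal equation would work just as well at the size of the Boston Housing data. On real data the features should be standardized first, since unscaled columns slow or destabilize gradient descent. `fit_linear_regression`, `predict`, and the `lr` and `epochs` defaults are hypothetical.

```python
def fit_linear_regression(X, y, lr=0.05, epochs=5000):
    """Hypothetical sketch: batch gradient descent on half the mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Residuals of the current model on every training row.
        errors = [sum(wj * xj for wj, xj in zip(w, row)) + b - t for row, t in zip(X, y)]
        grad_w = [sum(e * row[j] for e, row in zip(errors, X)) / n for j in range(d)]
        grad_b = sum(errors) / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, row):
    return sum(wj * xj for wj, xj in zip(w, row)) + b


X = [[1.0], [2.0], [3.0], [4.0]]
y = [3.0, 5.0, 7.0, 9.0]  # exactly y = 2x + 1
w, b = fit_linear_regression(X, y)
print(w, b)  # close to [2.0] and 1.0
```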
  • Artificial Intelligence

    • Blind Search Techniques: Models uninformed (blind) search algorithms such as BFS and DFS to find paths through a given state space, for problems such as N-Queens and pathfinding.

    • Heuristic Search Techniques: Implements informed (heuristic) search solutions for problems such as the 15-puzzle, route optimization, and the knapsack problem.

    • Geographical Routing: Implemented a heuristic search technique for geographical routing that achieved results similar to Google Maps.

    • Adversarial Games: Implemented a game similar to 2048 and designed an AI that plays against a human or AI opponent using the Minimax algorithm with an empty-tile heuristic.

    • CV - Geotagging using Viterbi Algorithm: This implementation tackles a classic computer vision problem: identifying where on Earth a photo was taken using visual features alone (i.e., without GPS data or geotags). It geolocates such photos by extracting the horizon (the boundary between the sky and the mountains), using it as a “fingerprint”, and matching it against a digital elevation map to identify where the photo was taken. The Viterbi algorithm is used to probabilistically identify the most likely sequence of pixels that lie on the horizon.

    • NLP - Part of Speech Tagging: Part-of-speech tagging is one of the first steps towards extracting semantics from natural language text. This exercise implements a Hidden Markov Model, a Hidden Markov Model with Viterbi decoding, and Gibbs sampling to identify the part of speech of each word in a text.

    • NLP - Code Breaking: Designed and implemented a decryption system using a Hidden Markov Model and the Metropolis-Hastings algorithm, achieving a decryption accuracy of 96%.
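Blind search on a path-finding problem can be illustrated with breadth-first search on a grid. The grid encoding ('#' for walls), 4-connected moves, and the `bfs_path` name are assumptions for this sketch. Because BFS expands states in order of depth, the first time it reaches the goal it has found a path with the fewest moves; swapping `popleft()` for `pop()` turns the queue into a stack and gives a depth-first traversal, which still finds a path but loses that guarantee.

```python
from collections import deque


def bfs_path(grid, start, goal):
    """Hypothetical sketch: fewest-move path on a 4-connected grid, or None."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in parent):
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


grid = ["...", ".#.", "..."]
print(bfs_path(grid, (0, 0), (2, 2)))  # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```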
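The heuristic search in the route-finding projects can be illustrated with A*, although the descriptions above do not name the exact algorithm, so treat A* as an assumed stand-in. The sketch takes a weighted directed graph plus planar coordinates and uses straight-line distance as the heuristic, which never overestimates the remaining cost as long as every edge cost is at least the straight-line distance between its endpoints; real geographic data would use great-circle distance instead. `a_star`, the graph format, and the toy data are hypothetical.

```python
import heapq
import math


def a_star(graph, coords, start, goal):
    """Hypothetical sketch.

    graph:  {node: [(neighbour, edge_cost), ...]}
    coords: {node: (x, y)} used only by the heuristic.
    Returns (path, cost), or (None, inf) if the goal is unreachable.
    """
    def h(node):
        # Straight-line distance to the goal.
        return math.dist(coords[node], coords[goal])

    open_heap = [(h(start), 0.0, start)]  # entries are (f = g + h, g, node)
    best_g = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], g
        if g > best_g[node]:
            continue  # stale entry: a cheaper route to node was already found
        for nbr, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(nbr, math.inf):
                best_g[nbr] = new_g
                parent[nbr] = node
                heapq.heappush(open_heap, (new_g + h(nbr), new_g, nbr))
    return None, math.inf


graph = {"A": [("B", 1.0), ("D", 1.5), ("C", 3.0)], "B": [("C", 1.0)], "D": [("C", 1.5)]}
coords = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (1, 1)}
print(a_star(graph, coords, "A", "C"))  # (['A', 'B', 'C'], 2.0)
```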
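Both the geotagging and part-of-speech projects rely on the Viterbi algorithm, which finds the single most likely hidden-state sequence in a Hidden Markov Model by dynamic programming. The sketch below shows it for tagging; the dictionary-based probability tables, the toy numbers, and the `unseen` floor that stands in for proper smoothing of unseen words are assumptions for illustration. Working in log space avoids underflow on long sentences.

```python
import math


def viterbi(words, states, start_p, trans_p, emit_p, unseen=1e-6):
    """Hypothetical sketch: most likely tag sequence under an HMM, in log space."""
    def log(p):
        # Crude floor for zero probabilities (assumption; real taggers smooth properly).
        return math.log(p) if p > 0 else math.log(unseen)

    # V[t][s]: best log-probability of any tag path ending in state s at position t.
    V = [{s: log(start_p.get(s, 0)) + log(emit_p[s].get(words[0], 0)) for s in states}]
    back = [{}]
    for t in range(1, len(words)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, V[t - 1][p] + log(trans_p[p].get(s, 0))) for p in states),
                key=lambda item: item[1],
            )
            V[t][s] = score + log(emit_p[s].get(words[t], 0))
            back[t][s] = prev
    # Trace back from the best final state.
    tags = [max(V[-1], key=V[-1].get)]
    for t in range(len(words) - 1, 0, -1):
        tags.append(back[t][tags[-1]])
    return tags[::-1]


states = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
emit_p = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "runs": 0.1},
    "VERB": {"runs": 0.6, "dog": 0.05},
}
print(viterbi(["the", "dog", "runs"], states, start_p, trans_p, emit_p))  # ['DET', 'NOUN', 'VERB']
```

In the geotagging setting the same recurrence runs over image columns, with candidate horizon rows as the hidden states.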

  • Kaggle

    • APTOS 2019 Blindness Detection: A baseline implementation of a CNN (ResNet-50) that classifies the severity of diabetic retinopathy.