Rony’s Data Science Portfolio
R projects
1. Churn Analysis
Analyze customer churn at a major telecom company and identify its key drivers. Then select the best-performing model for predicting churn and use it to reduce churn while maximizing profitability.
2. Motion Classification
Devices such as Jawbone Up, Nike FuelBand, and Fitbit now make it possible to collect a large amount of data about personal activity relatively inexpensively. People regularly quantify how much of a particular activity they do, but they rarely quantify how well they do it. The goal of this project is to predict the manner in which participants performed an exercise.
3. Multi-classification using a binary classifier
Use the ‘hepatic’ dataset from the AppliedPredictiveModeling R package to demonstrate how to classify a multinomial response using a binary classifier. Also, evaluate a k-nearest neighbors (kNN) classifier and find the value of k that yields the highest accuracy.
4. Integrated Decision Tree & Cluster analysis of the Titanic dataset in R and Tableau
Explore the famous Titanic dataset to find which group of passengers had the highest probability of survival. Demonstrate decision trees and k-means clustering in R, then move on to Tableau to visualize the survival ratio of each cluster. The workflow covers data retrieval, data preprocessing, and decision tree analysis in R, and finishes with an integrated k-means cluster analysis using Tableau/R integration by invoking Rserve().
5. Wordminder - Shiny app that predicts what’s on your mind!
An NLP-based text prediction web app built with Shiny. The project involved several stages: cleaning and preprocessing the data and building a text corpus; creating a document-term matrix and tokenizing the text; finding word associations and building an n-gram reference table; and creating the Shiny text prediction web app.
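An n-gram reference table can drive prediction by counting which word follows each context and backing off to a shorter context when a longer one was never seen. The app itself was built in R/Shiny and its exact backoff scheme is not described here, so the Python sketch below is an assumption-laden illustration: lowercase regex tokenization, a trigram-to-bigram backoff, and hypothetical helpers `build_ngram_table` and `predict_next`.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

NgramTable = Dict[Tuple[str, ...], Counter]


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of letters/apostrophes (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngram_table(corpus: Iterable[str], n: int) -> NgramTable:
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table: NgramTable = defaultdict(Counter)
    for doc in corpus:
        tokens = tokenize(doc)
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            table[context][tokens[i + n - 1]] += 1
    return table


def predict_next(tables: Dict[int, NgramTable], text: str, top: int = 3) -> List[str]:
    """Suggest next words, trying the longest n-gram first and backing off."""
    tokens = tokenize(text)
    for n in sorted(tables, reverse=True):
        if len(tokens) < n - 1:
            continue  # not enough typed words for this context length
        context = tuple(tokens[len(tokens) - (n - 1):])
        candidates = tables[n].get(context)  # .get avoids creating empty entries
        if candidates:
            return [word for word, _ in candidates.most_common(top)]
    return []
```

For example, with `tables = {n: build_ngram_table(corpus, n) for n in (2, 3)}`, typing "I love" is first looked up in the trigram table and only falls back to the bigram table if that two-word context never appeared in the corpus.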
Python projects
1. CRYPTOCURRENCIES: Safe or Toxic?
Since Bitcoin was introduced in 2008, hundreds of similar projects based on blockchain technology have emerged. We call these cryptocurrencies (also coins or cryptos in Internet slang). Some are extremely valuable today, and others may have the potential to become extremely valuable in the future. In fact, on 6 December 2017, Bitcoin had a market capitalization above $200 billion. This project analyzes Bitcoin and other cryptocurrencies as investment vehicles.
2. Risks and Returns
The Sharpe ratio has been one of the most popular risk/return measures in finance, not least because it’s so simple to use. It also helped that Professor Sharpe won a Nobel Memorial Prize in Economics in 1990 for his work on the capital asset pricing model (CAPM). In this project, we’ll calculate the Sharpe ratio for the stocks of two tech giants, Facebook and Amazon. As a benchmark, we’ll use the S&P 500, which tracks the performance of 500 large-cap US stocks.
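With a benchmark as the reference, the Sharpe ratio is the average excess return (asset return minus benchmark return) divided by the standard deviation of that excess return, often annualized by multiplying by the square root of the number of periods per year. The Python sketch below assumes simple daily returns, the sample standard deviation, and 252 trading days per year; `daily_returns` and `sharpe_ratio` are hypothetical helper names, not the project's code.

```python
import math
import statistics
from typing import List, Sequence


def daily_returns(prices: Sequence[float]) -> List[float]:
    """Simple period-over-period returns: p_t / p_(t-1) - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def sharpe_ratio(asset_returns: Sequence[float],
                 benchmark_returns: Sequence[float],
                 periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio of an asset measured against a benchmark.

    Assumes both return series are aligned on the same dates.
    """
    excess = [a - b for a, b in zip(asset_returns, benchmark_returns)]
    mean_excess = statistics.mean(excess)
    sd_excess = statistics.stdev(excess)  # sample std dev; needs >= 2 observations
    return math.sqrt(periods_per_year) * mean_excess / sd_excess
```

Usage would look like `sharpe_ratio(daily_returns(amzn_prices), daily_returns(sp500_prices))`, where `amzn_prices` and `sp500_prices` are hypothetical lists of aligned daily closing prices. A higher value means more excess return per unit of excess-return volatility.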
3. Gender Prediction using Sound
The same name can be spelled in many ways (for example, Marc and Mark, or Elizabeth and Elisabeth), so matching names by how they sound can work better than matching them by spelling. In this project, we use the Python package Fuzzy to infer the genders of authors who have appeared on the New York Times Best Seller list for Children’s Picture Books.
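The idea of matching by sound can be sketched with a phonetic code: names that sound alike map to the same key, and an author's first name is then looked up by key in a reference table of names with known genders. The project used the Fuzzy package; to stay dependency-free, the sketch below swaps in a hand-written American Soundex, and `build_index`, `infer_gender`, and the reference mapping are hypothetical (in practice the mapping might come from a baby-names dataset).

```python
from collections import Counter
from typing import Dict, Mapping, Optional

# American Soundex digit groups; vowels, Y, H and W get no digit
SOUNDEX_CODES = {ch: digit
                 for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                        ("L", "4"), ("MN", "5"), ("R", "6"))
                 for ch in letters}


def soundex(name: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        # H and W do not separate equal codes; vowels do (they reset prev)
        if ch not in "HW":
            prev = code
        if len(result) == 4:
            break
    return (result + "000")[:4]


def build_index(reference: Mapping[str, str]) -> Dict[str, Counter]:
    """Group a hypothetical name -> gender table by phonetic code."""
    index: Dict[str, Counter] = {}
    for name, gender in reference.items():
        index.setdefault(soundex(name), Counter())[gender] += 1
    return index


def infer_gender(first_name: str, index: Dict[str, Counter]) -> Optional[str]:
    """Most common gender among reference names that sound alike, or None."""
    votes = index.get(soundex(first_name))
    return votes.most_common(1)[0][0] if votes else None
```

With this scheme, "Marc" and "Mark" both encode to `M620`, and "Elizabeth" and "Elisabeth" both encode to `E421`, so a spelling variant still finds its match. Fuzzy also offers other phonetic algorithms (such as NYSIIS and Double Metaphone) that trade off precision and recall differently.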
4. 67 years of LEGO
The Rebrickable database includes data on every LEGO set that has ever been sold: the names of the sets, which bricks they contain, what colors the bricks are, and so on. The bricks may be small, but this is big data! In this project, we explore the Rebrickable database and analyze a fascinating dataset covering every LEGO set and part that has ever been released.
5. Evolution of LINUX
Version control repositories like CVS, Subversion, or Git can be a real gold mine for software developers. They record every change to the source code, including the date (the “when”), the responsible developer (the “who”), and a short message that describes the intention (the “what”) of the change. In this project, we analyze the evolution of a very famous open-source project: the Linux kernel. The Linux kernel is the heart of Linux distributions such as Debian, Ubuntu, and CentOS.
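The “when” and “who” of each change can be pulled out of a Git history and aggregated, for example into commits per year and the most active authors. The portfolio does not say how the project obtained its log, so the Python sketch below assumes a local clone and `git log --format=%at<TAB>%an` output (author timestamp and author name); `read_git_log`, `commits_per_year`, and `top_authors` are hypothetical helpers.

```python
import subprocess
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def read_git_log(repo_path: str) -> List[str]:
    """Return one 'unix_timestamp<TAB>author' line per commit (assumes git is installed)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%at\t%an"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def commits_per_year(lines: List[str]) -> Dict[int, int]:
    """Count commits by calendar year (UTC), skipping malformed lines."""
    counts: Counter = Counter()
    for line in lines:
        timestamp, _, _author = line.partition("\t")
        if not timestamp.strip().isdigit():
            continue
        year = datetime.fromtimestamp(int(timestamp), tz=timezone.utc).year
        counts[year] += 1
    return dict(sorted(counts.items()))


def top_authors(lines: List[str], n: int = 3) -> List[Tuple[str, int]]:
    """Authors with the most commits."""
    authors = Counter(line.partition("\t")[2] for line in lines if "\t" in line)
    return authors.most_common(n)
```

For instance, `commits_per_year(read_git_log("linux"))` would tally commits per year for a clone in a hypothetical `linux` directory; note that Git history for the kernel only begins with its move to Git in 2005, so earlier years will not appear.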