I am Dmitry, a Data Scientist

A scientist is someone who finds the problem to solve and methods to solve it.

Are you wondering why your subscribers leave?
Or have you collected data to predict how your customers will behave, but don't know where to start?
Do you have a new product, but your landing page doesn't convince potential customers to try it out?
Or maybe you want to recommend products to your customers that they will definitely like?

I take ideas, domain knowledge and data, and turn them into insights and solutions

I bring a unique combination of expertise

Data mining and visualization

It takes careful planning and thorough cleaning to get a neat dataset ready for analysis. I know how to clean dirty data and save weeks of working time with parallelized data analysis pipelines.

Predictive modeling and machine learning

I can build a quick, insightful model when you need better predictions or want to know which side of your operation needs more attention. I also know how to develop a sophisticated, fine-tuned model that will power your decisions with high predictive accuracy.

Software Development

Shaping an idea into a working piece of software requires knowledge, experience and hard work. My software development skills, experience and intuition are always ready for new challenges.

Neuroscience and analytic challenges

What have I learned about "big data" analysis in neuroscience?

Delivery time: want today, not next week.
Interactivity: want to try and inspect multiple models and approaches quickly.
Fault tolerance: want automated process, minimal supervision, scale out.

Data preprocessing

  • Input: 20-50 GB of 3D brain images
  • Output: 400-1000 GB of clean and organized 3D matrices
  • Typical time: 2 h per 1 GB of input
  • My achievement: 30 min for the whole dataset via a parallelized workflow
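The idea behind a parallelized preprocessing workflow can be sketched as follows. This is only a minimal illustration, not the actual pipeline: `preprocess_volume` is a hypothetical cleaning step (plain z-scoring), the volumes are random arrays, and a thread pool stands in for whatever scheduler the real workflow used. For heavier CPU-bound steps, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def preprocess_volume(volume):
    """Hypothetical cleaning step: z-score a single 3D volume."""
    volume = np.asarray(volume, dtype=np.float64)
    std = volume.std()
    if std == 0:
        # A constant volume carries no signal; return zeros instead of dividing by 0.
        return np.zeros_like(volume)
    return (volume - volume.mean()) / std


def preprocess_all(volumes, max_workers=4):
    """Apply the cleaning step to every volume with a worker pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(preprocess_volume, volumes))


# Toy stand-in for brain images: six small random 3D arrays.
rng = np.random.default_rng(0)
volumes = [rng.normal(5.0, 2.0, size=(8, 8, 8)) for _ in range(6)]
cleaned = preprocess_all(volumes)
print(len(cleaned), cleaned[0].shape)
```

Because each volume is processed independently, the work scales out by adding workers, which is where the time savings on a large dataset come from.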

Click to see my last project on Git

Stack: scikit-learn, rpy2, Bayesian CCA, logistic regression (here vanilla, for publication I used Bayesian with Laplace prior, similar to L1 regularization)
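The link between a Laplace prior and L1 regularization mentioned above can be illustrated with scikit-learn: the MAP estimate under a Laplace prior on the weights corresponds to L1-penalized logistic regression, which tends to drive uninformative weights to exactly zero. The sketch below uses synthetic data and an arbitrary regularization strength (`C=0.05`), both of which are assumptions for illustration and not taken from the project.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 3 informative features, 17 pure-noise features.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=3, n_redundant=0, random_state=0
)

# The same model fitted with an L1 and an L2 penalty (liblinear supports both).
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X, y)
l2_model = LogisticRegression(penalty="l2", solver="liblinear", C=0.05).fit(X, y)

# Count coefficients that were shrunk to exactly zero by each penalty.
n_zero_l1 = int(np.sum(l1_model.coef_ == 0))
n_zero_l2 = int(np.sum(l2_model.coef_ == 0))
print("zero weights with L1:", n_zero_l1, "| with L2:", n_zero_l2)
```

A full Bayesian treatment goes further than this point estimate by also giving a posterior distribution over the weights.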

Data analysis

  • Extract features
  • Build analytic pipelines and find the best parameters via cross-validation
  • My achievement: developed several packages to simplify workflow and allow reproducibility of analyses with minimum effort
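As a rough sketch of what building a pipeline and tuning it with cross-validation looks like in scikit-learn (the data, the choice of classifier, and the parameter grid here are illustrative assumptions, not the contents of the packages mentioned above):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an extracted feature matrix.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Scaling lives inside the pipeline, so it is re-fitted on each training fold
# and never sees the held-out fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Candidate regularization strengths; step parameters are addressed as "<step>__<param>".
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Keeping every preprocessing step inside the pipeline is what makes the analysis reproducible: the whole procedure is one object that can be re-run on new data.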

Reports and visualization

  • Synthesize key outcomes
  • Create intuitive visualizations
  • My achievement: developed scripts for quicker inspection and evaluation of results, significantly reducing time spent on prototyping the best visualization

Bayesian Statistics. Enhance statistical inference with prior knowledge.

Bayesian methods make it possible to build models that include prior knowledge and to make sense of models that are otherwise too complicated to solve analytically. The example presented here is a tutorial that analyses real data on the moral judgments of American and Russian individuals. A similar approach extends naturally to A/B testing, where it can provide more accurate estimates.

Click to take a look at this example notebook

Stack: PyMC3, plot.ly, matplotlib
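To give a flavor of the A/B testing use case, here is a minimal sketch using a conjugate Beta-Binomial model with plain NumPy. It is not the PyMC3 model from the notebook: the function name `prob_b_beats_a`, the uniform Beta(1, 1) prior, and the conversion counts are all made up for illustration.

```python
import numpy as np


def prob_b_beats_a(conv_a, n_a, conv_b, n_b,
                   prior_alpha=1.0, prior_beta=1.0,
                   draws=100_000, seed=0):
    """Estimate P(rate_B > rate_A) from Beta posteriors via Monte Carlo.

    With a Beta(prior_alpha, prior_beta) prior and binomial data, the
    posterior of each conversion rate is Beta(alpha + conversions,
    beta + non-conversions).
    """
    rng = np.random.default_rng(seed)
    post_a = rng.beta(prior_alpha + conv_a, prior_beta + n_a - conv_a, size=draws)
    post_b = rng.beta(prior_alpha + conv_b, prior_beta + n_b - conv_b, size=draws)
    return float(np.mean(post_b > post_a))


# Hypothetical experiment: 40/1000 conversions for A, 60/1000 for B.
p = prob_b_beats_a(conv_a=40, n_a=1000, conv_b=60, n_b=1000)
print(f"P(B converts better than A) = {p:.3f}")
```

Prior knowledge enters through `prior_alpha` and `prior_beta`; for example, a prior centred on historical conversion rates would pull small-sample estimates toward plausible values.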

Predictive modeling: Churn

Subscription-based enterprises rely on long-lasting clients.

My experience with logistic regression and ensemble methods, such as random forests, can help you pinpoint the problematic service areas that lead to churn and identify clients who are likely to leave soon and may require additional attention.

Use a confusion matrix to check which classes were mislabeled.
Learn which features contributed the most to predicting that an individual will churn.
Use the ROC curve to assess model performance.

Here is a model that successfully predicts which individuals are going to churn and reveals which aspects of a subscription may increase the chance of churning.

Click to check the source code on Git!

Classifier used: Random Forest
Predictive accuracy: 95%
Stack: pandas, scikit-learn, plot.ly
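The three checks listed above (confusion matrix, feature contributions, ROC curve) can be sketched with scikit-learn as follows. The data here is synthetic and imbalanced on purpose, standing in for a real subscriber table; the hyperparameters are arbitrary assumptions and this is not the linked source code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic "subscribers": roughly 20% churners (class 1), 8 features.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# 1. Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, model.predict(X_test))

# 2. Feature importances: which inputs the forest relied on most.
ranking = np.argsort(model.feature_importances_)[::-1]

# 3. ROC curve and its area, computed from predicted churn probabilities.
churn_prob = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, churn_prob)
auc = roc_auc_score(y_test, churn_prob)

print(cm)
print("features ranked by importance:", ranking)
print("ROC AUC:", round(auc, 3))
```

With imbalanced classes, accuracy alone can look good even when few churners are caught, which is why the confusion matrix and ROC curve are worth inspecting alongside it.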

Interactive visualization: significantly more data to explore than on a static image.

Stack: Flask, Pandas, PostgreSQL, D3.js. Click on region to zoom.

Professional Experience

Jan 2011
  • Aalto University, School of Science, BECS
  • Internationally acclaimed laboratory, performing cutting-edge neuroscientific research.
  • 11/01/2011 - Present

  • PhD Candidate
    • Successfully completed multiple projects studying human brain function in complex naturalistic environments using fMRI, machine learning and predictive modeling techniques.
    • Significantly reduced the time needed to run data preprocessing and analyses on terabytes of neural data by developing parallelized utilities and pipelines in Matlab, R and Python.
    • Assisted in teaching Experimental and Statistical Methods in Biomedical Research using the R programming language. Supervised student experimental projects and provided tutoring.
    • Presented scientific results in talks and posters at multiple international conferences.
Dec 2009
  • Innova Systems
  • Major online entertainment provider in Russia, CIS, Europe and Brazil, which has been on the market for over 7 years and has published multiple successful MMORPGs.
  • 1/12/2009 - 18/11/2010

  • Customer support manager (Game Master)
    • Supported the player community with technical and gameplay problems.
    • Increased the support department's response rate by an order of magnitude by optimizing the support system, integrating it with the user community and creating process guidelines for effective communication between working groups.
    • Decreased the rate of user problem reports by creating guidelines for solving the most common issues.
    • Reduced connectivity issues experienced by players by researching common causes of network problems and preparing solutions together with the network engineering group.
Oct 2007
  • Lomonosov Moscow State University, Work Psychology Lab
  • Top-rated institution for higher education
  • 14/10/2007 - 9/05/2009

  • Research assistant
    • Supported laboratory in technical aspects.
    • Assisted in experimental design and statistical analysis.
    • Presented laboratory achievements at exhibitions and conferences.
    • Provided support in performing eye-tracking and psychological experiments.

Let's Keep In Touch!

If you need someone with strong resolve, a versatile technology stack and an approachable personality, contact me:

email: dmi.smirnov07 [at] gmail.com
phone: +358503015072