Recent & Upcoming Talks


There are many challenges to using R models in production. Here are a few tips to bridge the gap between scripts and deployment.

‘All models are wrong, but some are useful’ is one of the most quoted phrases in data science, but how do we know whether a model is useful? This talk introduces the tools used to validate models and discusses the two most common challenges in model validation: data leakage and optimization bias. All examples will be in Python 3, Jupyter, and scikit-learn.
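As a rough illustration of avoiding data leakage during validation, the sketch below splits data into k folds and learns the scaling statistics from the training portion of each fold only. It uses only the standard library instead of scikit-learn so that it stays self-contained. The helper names (`k_fold_indices`, `fit_scaler`, `transform`, `leakage_free_folds`) are hypothetical and are not taken from the talk.

```python
from statistics import mean, pstdev


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs for contiguous k-fold splits."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        valid_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, valid_idx
        start = stop


def fit_scaler(values):
    """Learn mean and standard deviation from the given values."""
    sigma = pstdev(values)
    return mean(values), sigma if sigma > 0 else 1.0


def transform(values, scaler):
    """Standardize values with previously learned statistics."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


def leakage_free_folds(values, n_folds):
    """Scale each fold using statistics from its training rows only.

    Fitting the scaler on all rows would let information from the
    validation rows leak into the preprocessing step.
    """
    for train_idx, valid_idx in k_fold_indices(len(values), n_folds):
        train = [values[i] for i in train_idx]
        valid = [values[i] for i in valid_idx]
        scaler = fit_scaler(train)  # validation rows are never seen here
        yield transform(train, scaler), transform(valid, scaler)
```

The same idea carries over to any preprocessing step that learns from data: fit it inside each fold, never on the full data set. Optimization bias is a separate issue and is usually handled by keeping a final test set that is not used while choosing between models.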


Finding the perfect model for your data set is difficult. Models can require dozens of design decisions, also known as hyperparameters. These hyperparameters can interact with each other in unexpected ways. The only way to evaluate a combination of hyperparameters is to build a model and test it, which is expensive. Machine learning practitioners often pick combinations by hand, with frustrating results. Hyperopt is a Python library that makes hyperparameter optimization automatic. It does this by observing previous combinations of hyperparameters and updating its belief about which combinations are most likely to achieve good results. Hyperopt-sklearn is a wrapper that makes Hyperopt so simple that you can get excellent results in three lines of code.
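To make the "build a model and test it" loop concrete, here is a minimal sketch of automated hyperparameter search using only the standard library. It is plain random search, not Hyperopt's adaptive method, and it does not use the Hyperopt API. The objective function, the hyperparameter names, and their ranges are all assumptions made for illustration.

```python
import random


def objective(params):
    """Hypothetical validation loss.

    Stands in for training a model with `params` and scoring it on
    held-out data, which is the expensive step in a real search.
    """
    lr_term = (params["learning_rate"] - 0.1) ** 2
    depth_term = (params["max_depth"] - 6) ** 2 / 100
    return lr_term + depth_term


def sample_params(rng):
    """Draw one combination from an assumed search space."""
    return {
        "learning_rate": rng.uniform(0.001, 1.0),
        "max_depth": rng.randint(1, 12),
    }


def random_search(objective, n_trials, seed=0):
    """Evaluate n_trials random combinations and return the best one."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = sample_params(rng)
        trials.append((objective(params), params))
    best_loss, best_params = min(trials, key=lambda trial: trial[0])
    return best_loss, best_params, trials


if __name__ == "__main__":
    loss, params, _ = random_search(objective, n_trials=50)
    print(f"best loss {loss:.4f} with {params}")
```

Random search treats every trial independently. An adaptive optimizer such as Hyperopt instead uses the recorded trials to decide where to sample next, which is the "updating its belief" step described above.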

How do you build machine learning algorithms that scale to hundreds of millions of data points? This talk will show you big data strategies to detect fraudulent clicks in China’s largest mobile market.

As a data scientist, your time is expensive, but computation time is cheap. Matthew Emery will share how to leverage Python libraries to automate basic feature engineering and model selection tasks. Spend more time on the hard problems and let your computer find the best model for your data.


This presentation serves as an introduction to recurrent neural networks and is a review of’s lesson 6.

Google’s BigQuery combines the scalability of Google’s big data infrastructure with the ease of SQL. This lecture will show you how to set up and use BigQuery.

A UBC Master of Data Science alumnus shares his tricks to succeed in the program.

Finding the perfect place to call your new home should be more than browsing through endless listings. RentHop makes apartment search smarter by using data to sort rental listings by quality. But while looking for the perfect apartment is difficult enough, structuring and making sense of all available real estate data programmatically is even harder. Two Sigma and RentHop, a portfolio company of Two Sigma Ventures, invite Kagglers to unleash their creative engines to uncover business value in this unique recruiting competition.


With each passing day, telescopes around and above the Earth capture more and more images of distant galaxies. As better and bigger telescopes continue to collect these images, the datasets begin to explode in size. In order to better understand how the different shapes (or morphologies) of galaxies relate to the physics that create them, such images need to be sorted and classified. Kaggle has teamed up with Galaxy Zoo and Winton Capital to produce the Galaxy Challenge, where participants will help classify galaxies into categories.