Covering all stages of the data science value chain, the Master of Data Science program at UBC’s Okanagan campus prepares graduates to thrive in one of the world’s most in-demand fields. Over 10 months, you’ll learn how to extract and analyze data in all its forms, how to turn data into knowledge, and how to clearly communicate your recommendations to decision-makers.
Highlights Across All MDS Programs:
- 10-month, full-time, accelerated program offers a short-term commitment for long-term gain
- Condensed one-credit courses allow for in-depth focus on a limited set of topics at one time
- Capstone project gives students an opportunity to apply their skills
- Real-world data sets are integrated in all courses to provide practical experience across a range of domains
Highlights Specific To Okanagan Campus Option:
- Curriculum is designed by computer science and statistics experts, emphasizing optimization and statistics with a focus on operations research
- Courses are taught by renowned computer science and statistics faculty, giving students access to experts across a broad skill set
- With a cohort limited to 40 students, this program offers a collaborative and intimate learning environment with a focus on student success
- The Okanagan campus offers students the opportunity to study at a top 40 university in a smaller setting, situated in a diverse region of natural beauty, and bordering the city of Kelowna, a hub of economic development
- The Okanagan region hosts 2,000 tech start-ups, providing networking and employment opportunities
- Students gain fluency with both open-source and commercial software, including Tableau and Microsoft products (Excel, Azure, SQL Server)
The program structure includes 24 one-credit courses offered in four-week segments. Courses are lab-oriented and delivered in-person with some blended online content.
After the six segments, students complete an eight-week capstone project, applying their newly acquired knowledge while working alongside other students on real-life data sets.
Fall: September - December
Block 1 (4 weeks)
Programming in R and Python including iteration, decisions, functions, data structures, and libraries that are important for data exploration and analysis.
Installation and configuration of data science software. Advanced data analysis using Excel. Analysis of data using libraries in R, Python, and cloud services.
Command line scripting including bash and Linux/Unix. Reporting and visualization.
Pseudorandom number generation, testing, and transformation to other discrete and continuous data types. Introduction to Poisson processes and the simulation of data from predictive models, as well as temporal and spatial models.
Block 2 (4 weeks)
Introduction to regression for data science, including simple linear regression, multiple linear regression, interactions, mixed variable types, model assessment, simple variable selection, and k-nearest-neighbours regression.
Markov chains and their applications, such as queueing and Markov chain Monte Carlo.
How to choose and use appropriate algorithms and data structures such as lists, queues, stacks, hash tables, trees and graphs to solve data science problems. Key concepts include recursion, searching and sorting, and asymptotic complexity.
How to use and query relational SQL and NoSQL databases for analysis. Experience with SQL, JSON, and programming with databases.
Block 3 (4 weeks)
Converting data from the form in which it is collected to the form needed for analysis. How to clean, filter, arrange, aggregate, and transform diverse data types, e.g. strings, numbers, and date-times.
Resampling techniques and regularization for linear models, including the bootstrap, jackknife, cross-validation, ridge regression, and lasso.
The legal, ethical, and security issues concerning data, including aggregated data. Proactive compliance with rules and, in their absence, principles for the responsible management of sensitive data. Case studies.
How to apply collaborative software development practices in data science workflows. Appropriate use of abstraction and classes, the software life cycle, unit testing and continuous integration, quality control, version control, and packaging for use by others.
Winter: January - April
Block 4 (4 weeks)
Data visualization to produce effective graphs and images. Use of open-source libraries in Python and R, and of commercial products such as Tableau.
Fundamental techniques in the collection of data. The focus is on understanding the effects of randomization, restrictions on randomization, repeated measures, and blocking on model fitting.
How to use the web as a platform for data collection, computation, and publishing. Accessing data via scraping and APIs. Using the cloud for tasks that are beyond the capability of your local computing resources.
Introduction to supervised machine learning. Key concepts include logistic regression, k-nearest-neighbours classification, discriminant analysis, decision trees, and random forests.
Block 5 (4 weeks + 1 week break)
How to present and interpret data science findings. Drawing on the scholarship of language and cognition, this course is about how effective data scientists write, speak, and think.
Advanced study in predictive modelling techniques and concepts, including multiple linear regression, splines, smoothing, and generalized additive models.
How to analyse data with unknown responses. Distance measures, hierarchical clustering, k-means, mixture models.
Advanced concepts in data visualization, using business intelligence and data analysis software. Key concepts include interactive visualization and production of visualizations for mobile and web.
Block 6 (4 weeks)
Introduction to the Bayesian paradigm and tools for data science. Topics include Bayes' theorem, prior, likelihood, and posterior. Detailed analysis of binomial samples, normal samples, and normal linear regression models. A significant focus will be on computational aspects of Bayesian problems using software packages.
Modeling using mathematical programming. Key concepts include fundamental continuous and discrete optimization algorithms; optimization software for small to medium scale problems; and optimization algorithms for data science.
Advanced or specialized topics in data science with applications to specific data sets. Analysis of big data using Hadoop and Spark.
Advanced machine learning methods and concepts, including neural networks, backpropagation, and deep learning.
Spring: May - June
Capstone Project (8-10 weeks)
DSCI 591 (MDS Vancouver) / DATA 599 (MDS Okanagan)
A mentored group project based on real data and questions from a partner within or outside the university. Students will formulate questions, then design and execute a suitable analysis plan. The group will work collaboratively to produce a reproducible analysis pipeline, a project report, a presentation, and possibly other products, such as a dashboard.