MDS Computational Linguistics

Tiffany Timbers, Katie Burak

Data Visualization I | DSCI 531

Exploratory data analysis. Design of effective static visualizations. Plotting tools in R and Python.

Instructor(s):

Joel Östblom

Statistical Inference and Computation I | DSCI 552

The statistical and probabilistic foundations of inference. Large sample results. The frequentist paradigm.

Instructor(s):

Supervised Learning I | DSCI 571

Introduction to supervised machine learning. Basic machine learning concepts such as generalization error and overfitting. Various approaches such as k-NN, decision trees, and linear classifiers.

Instructor(s):

Varada Kolhatkar

Block 3 (4 weeks, 4 credits)

Corpus Linguistics | COLX 521

Basic processing of text corpora using Python. Includes string manipulation, corpus readers, linguistic comparison of corpora, structured text formats, and text preprocessing tools.

Instructor(s):

Databases & Data Retrieval | DSCI 513

How to work with data stored in relational database systems. Storage structures and schemas, data relationships, and ways to query and aggregate such data.

Instructor(s):

Gittu George

Regression I | DSCI 561

Linear models for a quantitative response variable, with multiple categorical and/or quantitative predictors. Matrix formulation of linear regression. Model assessment and prediction.

Instructor(s):

Katie Burak

Feature and Model Selection | DSCI 573

How to evaluate and select features and models. Cross-validation, ROC curves, feature engineering, and regularization.

Instructor(s):

Joel Östblom

Winter: January - April

Block 4 (4 weeks, 4 credits)

Parsing for Computational Linguistics | COLX 535

The identification of syntactic structure in natural language. Parsing algorithms for popular grammar formalisms, application of statistical information to parsing, parser evaluation, and extraction of parse features.

Instructor(s):

Computational Semantics | COLX 561

How meaning is represented by computers. An overview of popular semantic resources, and techniques for building new resources from unstructured text data.

Instructor(s):

Unsupervised Learning | DSCI 563

How to find groups and other structure in unlabeled, possibly high-dimensional data. Dimension reduction for visualization and data analysis. Clustering, association rules, model fitting via the EM algorithm.

Instructor(s):

Supervised Learning II | DSCI 572

Introduction to optimization. Gradient descent and stochastic gradient descent. Roundoff error and finite differences. Neural networks and deep learning.

Instructor(s):

Block 5 (4 weeks + 1 week break, 4 credits)

Advanced Corpus Linguistics | COLX 523

Text corpora collection and curation. How to pull representative datasets from internet sources. Techniques for efficient and reliable annotation.

Instructor(s):

Computational Morphology | COLX 525

Approaches to sub-word phenomena in language processing. Automatic morphological analysis of diverse languages, part-of-speech tagging, word segmentation, and character-level neural network models.

Instructor(s):

Machine Translation | COLX 531

Key methodologies for automatic translation between languages, with a focus on statistical and neural machine translation approaches. Applying Machine Translation (MT) architectures to analogous monolingual tasks. MT evaluation.

Instructor(s):

Sentiment Analysis | COLX 565

Identification and analysis of opinion, especially in social media. Text polarity and emotion classification, fine-grained (e.g. aspectual) opinion mining, argumentation mining, sentiment in social networks.

Instructor(s):

Block 6 (4 weeks, 4 credits)

Advanced Computational Semantics | COLX 563

Application of machine learning to various semantic tasks. Likely topics include: information extraction, semantic role labelling, semantic parsing, discourse parsing, question answering, summarization, and natural language inference.

Instructor(s):

Natural Language Processing for Low-Resource Languages | COLX 581

Building automatic language tools when data is scarce. Rule-based and hybrid systems, semi-supervised learning, active learning. Knowledge transfer from other (related) languages.

Instructor(s):

Trends in Computational Linguistics | COLX 585

Cutting-edge techniques in natural language processing. For this iteration, the latest innovations in neural network architectures.

Instructor(s):

Privacy, Ethics & Security | DSCI 541

The legal, ethical, and security issues concerning data, including aggregated data. Proactive compliance with rules and, in their absence, principles for the responsible management of sensitive data. Case studies.

Instructor(s):