# Student Session Speaker Abstracts

Robust Synthetic Control

We present a robust generalization of the synthetic control method for comparative case studies. Like the classical method cf. Abadie and Gardeazabal (2003), we present an algorithm to estimate the unobservable counterfactual of a treatment unit. A distinguishing feature of our algorithm is that of de-noising the data matrix via singular value thresholding, which renders our approach robust in multiple facets: it automatically identifies a good subset of donors for the synthetic control, overcomes the challenges of missing data, and continues to work well in settings where covariate information may not be provided. We posit that the setting can be viewed as an instance of the Latent Variable Model and provide the first finite sample analysis (coupled with asymptotic results) for the estimation of the counterfactual. Our algorithm accurately imputes missing entries and filters corrupted observations in producing a consistent estimator of the underlying signal matrix, provided $$p = \Omega( T^{-1 + \zeta})$$ for some $$\zeta > 0$$; here, $$p$$ is the fraction of observed data and $$T$$ is the time interval of interest. Under the same proportion of observations, we demonstrate that the mean-squared-error in our counterfactual estimation scales as $$\mathcal{O}(\sigma^2/p + 1/\sqrt{T})$$, where $$\sigma^2$$ is the variance of the inherent noise. Additionally, we introduce a Bayesian framework to quantify the estimation uncertainty. Our experiments, using both synthetic and real-world datasets, demonstrate that our robust generalization yields an improvement over the classical synthetic control method.

##### Jackie Baek
Mechanism Design for Airport Landing Slot Exchange

We consider mechanisms in which airlines can exchange landing slots for flights amongst each other. We investigate the "2-for-2 trades" mechanism and analyze its properties including efficiency, incentive compatibility, and fairness.

##### Joseph Emilio Gaudio
Adaptive Control Theory in the Presence of Hard Limits on Magnitude and Rate with Aerospace Applications

This talk presents an adaptive controller for multiple-input-multiple-output (MIMO) plants with input magnitude and rate saturation in the presence of parametric uncertainty and output feedback. Output feedback architectures which account for limits on the magnitude of the control input exist in the adaptive control literature, but not for the case when there are rate limits on the input. A filter placed in the control path accommodates the presence of rate limits, but introduces challenges in matching conditions and the corresponding control design. These challenges are overcome using an output feedback based adaptive controller to account for the increase in the relative degree. The overall control architecture is based on a linearized model, and includes adaptive laws that are modified to account for the magnitude and rate limits. Analytical guarantees of bounded solutions and satisfactory tracking are provided. The performance of the architecture is validated using a numerical model of an aircraft at a single operating point, in the presence of parametric uncertainties.

##### Sagar Indurkhya
Eliminating the Ill-formed Derivations Generated by a Minimalist Grammar

This is a case study on the development of a natural language parser for Modern Standard English (MSE) within the framework of Chomsky's Minimalist Program (Chomsky, 1995). The parser serves as a mapping from sentences to syntactic structures; this mapping has the property that (formal) semantic properties and relations of sentences may be identified via the application of structure dependent rules as in syntax-directed translation for compilers. I consider the problem of developing a Minimalist Grammar that models a fragment of MSE; I then present an example of the non-trivial obstacles one may encounter in this process by pointing out the (over)generation of undesired syntactic structures for certain passive constructions involving Wh-extraction from ditransitive predicates such as "Whom was the ball passed to?". I then demonstrate how to overcome such an obstacle with the aid of the relevant Minimalist theory of language - in this case the necessity of modeling syntactic movement as being driven by the movement-destination rather than the movement source. I then propose a simple modification to the Minimalist Grammar formalism that emulates the movement-destination model of syntactic movement. Finally, I discuss how this case study reaffirms the necessity of incorporating relevant aspects of modern linguistic theory when engineering a robust natural language parser.

##### Bomin Jiang
Contingent Linear Financial Networks

The 2008 world financial crisis has spurred research on systemic risk, and in particular on the estimation of the underlying financial networks governing the propagation of shocks — or as it is sometimes known as financial contagion. Several papers in the literature have concentrated on the different channels of financial contagion. Banks can be interconnected through many channels, though. One type is related to their exposure. A second channel is related to interbank contracts. Finally, banks can be exposed to common shocks such as exchange rate depreciations, inflation rate, interest rates, economic activity, etc. In sum, there are many possible financial instruments that create an interconnection between two banks. As a result, it is likely that there is more than a single network describing the system. Each type of shock is likely to be transmitted through different contracts and the network changes with the shock. For us, this is tantamount to the network being contingent to the shock. Larger shocks and different shocks imply different transmission mechanisms. Those networks are unobservable, but the outcome of all those networks interacting with each other is what can be observed. In other words, the changes in banks balance sheets and financial conditions is the outcome of many interactions acting jointly. The estimation of contingent financial networks becomes not only a computational challenge, but also an identification problem. In this paper we offer a new methodology to estimate contingent financial networks. We concentrate on the type of shock hitting the economy rather than their size — although there is nothing in the methodology that should preclude us from estimating the change in the transmission mechanism contingent on the magnitude of the shock.

##### Sébastien Martin
Online Vehicle Routing — The Edge of Optimization in Large-Scale Applications

With the emergence of ride-sharing companies that offer transportation on demand at a large scale and the increasing availability of corresponding demand datasets, new challenges arise to develop routing optimization algorithms that can solve massive problems in real time. In this talk, we discuss an optimization framework, coupled with a novel backbone algorithm, that allows us to dispatch thousands of taxis serving more than 25,000 customers per hour using the New York City demand data and routing network. We provide evidence from historical simulations to show that our algorithms improve upon the performance of existing heuristics in such real-world settings.

##### Emily Ann Meigs
Learning in Stochastic Routing Games

We consider a repeated nonatomic network routing game where a large number of risk-neutral users are unsure about the edge latency functions and learn about them using past travel experience. We assume that the network has affine stochastic edge latency functions with unknown slope. We consider a simple process of learning where agents share common observations of travel times, estimate the unknown edge slope parameters via ordinary least squares and, at every step, dispatch their traffic demand over the network according to the Wardrop equilibrium computed using mean latency functions with most recent estimates. We prove that under this learning dynamics, the flow vector in the network converges almost surely to the full information Wardrop equilibrium. Moreover, the slope parameters of all the edges used in the full information Wardrop equilibrium are learned almost surely in the limit.

##### Dheeraj Mysore Nagaraj
Comparing Structured and Unstructured Models

Given a probabilistic model, it is important to be able to efficiently describe, analyze and generate independent samples from them. These tasks are easy when the models are unstructured (mean field models) like the Curie-Weiss model, Erdos-Renyi model. But they don't capture much information about the system being modeled. Structured models like the Ising model, Stochastic Block Model, Exponential Random Graph model are in general harder to deal with but give us much higher modeling power. It can be shown that the structured models approximately inherit the characteristics (fast mixing Markov chains, concentration of functionals etc.) of unstructured models in some sense under certain conditions. We analyze various characteristics and the requisite conditions.

##### Zara Perumal
ActiveDefender: An Active Learning Resource Allocation System to Detect Evasive Malware

We present the ActiveDefender Pdf Malware classification system to detect malware using a threshold driven resource allocation decision system. This provides improved accuracy and decreased classification time in a resource-constrained environment. This method provides the highest classification accuracy based on the defense systems required throughput and classification time constraints. Furthermore, we use active learning to improve fast machine learning classifiers to make the decision robust against adversarial evasion techniques. More generally we present a framework that can be used to develop an active optimized resource allocation system in an adversarial environment. The proposed system provides improved accuracy and decreased classification time to previously developed classifiers.

##### Amin Rahimian
Automated Indirect Inference of Social Contagion Processes

Social contagion refers to the spread of innovation or adoption of behavior through peer influence in social networks. Social contagion processes are classified into simple and complex depending on how the probability of adoption increases with the number of adopters in the local neighborhoods (the amount of reinforcement). We formalize this distinction using structural models and apply a deep learning inference architecture to classify such behaviors in a wide range of network structures and against a variety of plausible contagion models.

##### Julia Romanski
Interacting Random Walks

Consider a network of people, each with a certain amount of money. At each time step, one unit of money is chosen uniformly at random, and moves to a neighbor of its owner with a probability that depends positively on how wealthy the neighbors are. Those who are well-connected and initially wealthy will tend to accumulate more wealth. The Interacting Random Walks model generalizes this idea to particles moving around on a graph. Specifically, the probability of a particle moving to a neighboring vertex $$v$$ is proportional to $$e^{\beta x(v)}$$, where $$x(v)$$ is the number of particles at vertex $$v$$. The underlying idea of the model is that quantity is an attractive force. The model is quite flexible, and can be used to model a variety of scenarios. Examples of other application areas for the model, such as those arising in social science, will be given. Theoretical properties of the model, such as the stationary distribution, phase transition, reversibility, and mixing time, are discussed.

##### Jordan Romvary
Optimal Atomic Coordination in Smart Distribution Power Grids

The control and regulation of power grids has historically relied upon large-scale scheduleable generation and relatively stable load demand profiles. With the advent of extensive local renewable energy generation technologies as well as the incorporation of load responsive demand response (DR) methodologies, it has become imperative that new distributed control strategies are developed to better regulate the increasingly volatile nature of modern generation and load profiles. In this talk, we introduce a distributed control strategy called Proximal Atomic Coordination to solve for optimal control strategies in distributed power grids, a problem called Optimal Power Flow (OPF). Using a relaxed variant of OPF, we show that PAC exhibits sub-linear convergence to the optimal ergodic cost, and linear convergence to the OPF solution. We demonstrate our results on various power grid topologies with large levels of renewable energy penetration and DR, and show that PAC converges to the optimal control profiles in these scenarios. We further show that in certain regimes PAC outperforms the standard distributed 2-Block ADMM algorithm, and we discuss the benefits of using PAC over 2-Block ADMM and other standard distributed solvers.

##### Thomas Sayre-McCord
Developing Agile UAV Autonomy in a Virtual Reality Environment

Despite the high level of interest in autonomous Unmanned Aerial Vehicles (UAVs) over the last few years, the gap between human pilots and UAVs without an external infrastructure remains exceedingly large. Autonomous UAVs face limitations both in autonomy algorithms and in the platforms and testing environments required to develop the algorithms. In this talk I will discuss (i) a UAV system built around a TX1 module and a custom carrier board to provide the computation, sensors, and agility required for high performance flight, (ii) a real time photo-realistic image simulation testing environment that acts as a virtual reality environment a UAV while it is in flight, and (iii) the vision based algorithms developed using (i) and (ii) that enable autonomous agile flight.

##### Yee Sian
Statistical Learning for DC Optimal Power Flow

The optimal power flow problem plays an important role in the market clearing and operation of electric power systems. However, with increasing uncertainty from renewable energy operation, the optimal operating point of the system changes more significantly in real-time. In this paper, we aim at developing control policies that are able to track the optimal set-point with high probability. The approach is based on the observation that the OPF solution corresponding to a certain uncertainty realization is a basic feasible solution, which provides an affine control policy. The optimality of this basis policy is restricted to uncertainty realizations that share the same set of active constraints. We propose an ensemble control policy that combines several basis policies to improve performance. Although the number of possible bases is exponential in the size of the system, we show that only a few of them are relevant to system operation. We adopt a statistical learning approach to learn these important bases, and provide theoretical results that validate our observations. For most systems, we observe that efficient ensemble policies constructed using as few as ten bases, are able to obtain optimal solutions with high probability.

##### Thomas Stahlbuhk
Online Learning Algorithms for Minimizing Queue Length Regret

We consider a system consisting of a single transmitter and $$N$$ channels.  Packets randomly arrive to the transmitter's queue, and at each time slot a controller may schedule one of the $$N$$ channels for transmission.  The channel's rates are initially unknown and must be learned through observation.  Our objective is to minimize the number of packets in the system's queue.  We define the regret of the system to be the expected difference between the total queueing delay of a policy that must learn the channels' service rates and a policy that knows the rates, a priori.  A naive approach to solving this problem would be to apply algorithms from the literature that were developed to solve the closely-related stochastic multi-armed bandit problem.  However, we show that these methods have $$\Omega(log(T))$$ regret.  On the other hand, we also show that there exists a set of queue-length based policies that are able to obtain order optimal, $$O(1)$$ regret.

##### Ezra Tal
Continuous Tensor Train-Based Dynamic Programming for High-Dimensional Zero-Sum Differential Games

Zero-sum differential games constitute a prominent research topic in several fields ranging from economics to motion planning under adversarial conditions. Unfortunately, analytical techniques for differential games can address only simple, illustrative problem instances, and most existing computational methods suffer from the curse of dimensionality, i.e., the computational requirements grow exponentially with the dimensionality of the state space. In order to alleviate the curse of dimensionality for a certain class of pursuit-evasion games, we propose a novel dynamic-programming-based algorithm that uses a continuous tensor-train approximation to represent the value function. In this way, the algorithm can represent high-dimensional tensors using computational resources that grow only polynomially with dimensionality of the state space and with the rank of the value function. The proposed algorithm shown to converge to optimal solutions. It is demonstrated in several problem instances; in case of a five-dimensional game, the value function representation was obtained with three orders of magnitude savings in computational and memory cost, when compared to standard value iteration.

##### Jennifer Tang
Parameter Estimation with Unknown Transition Point

In today's information driven world, there is a huge focus on learning parameters from data. Typically, problems involving learning parameters assume that the target parameter does not change in the data set, and while this is true in certain settings, it is not always the case. There are applications where the parameter changes at some unknown time in the data set and only the most recent parameter is of interest to the predictor. In this work, we first show that for an infinite data stream of Bernoulli iid data, when the parameter changes at some unknown transition point, it is possible for what we call a "time-invariant" estimator to achieve asymptotically vanishing error for estimating the most recent parameter. We provide an algorithm which accomplishes this and analyze the algorithm's mean-squared error loss as a function of the transition point. We also expect to quantify what is the inherent loss of not knowing the transition point, providing a lower bound on all time-invariant estimators.

##### Vasileios Tzoumas
Resilient Monotone Submodular Function Maximization

In this paper, we focus on applications in machine learning, optimization, and control that call for the resilient selection of a few elements, e.g. features, sensors, or leaders, against a number of adversarial denial-of-service attacks or failures. In general, such resilient optimization problems are hard and cannot be solved exactly in polynomial time, even though they may involve objective functions that are monotone and submodular. In this paper, we provide for the solution of such optimization problems the first scalable approximation algorithm that is valid for any number of attacks or failures and which, for functions with low curvature, guarantees superior approximation performance. Notably, the curvature has been known to tighten approximations for several non-resilient optimization problems, yet its effect on resilient optimization had hitherto been unknown. We complement our theoretical analyses with empirical evaluations.

##### Rajan Udwani
Multi-Objective Maximization of Monotone Submodular Functions with Cardinality Constraint

The problem of maximizing set functions that are both monotone and submodular has seen a wide array of new applications in the past decade or so. Notably, it has been used to model settings in sensor placement, feature selection, influence maximization etc. In many of these settings there are in fact multiple objectives (all monotone and submodular), or the uncertainty in a single objective is modeled via considering multiple objectives. So given $$m$$ functions, we wish to find a set of size at most $$k$$ that simultaneously achieves a target value for all functions (or give a certificate that no such set exists). Krause et al.(2008) showed that the problem is inapproximable when $$m=\Omega(k)$$ and there is a $$(1-1/e)-\epsilon$$ approximation for constant $$m$$ due to Chekuri et al. (2010). We focus on the case of $$m=o(k)$$ ($$o(k/log^3 k$$) to be precise) and give both, a best possible $$(1-1/e)$$ approximation as well as a fast and practical $$(1-1/e)^2$$ approximation based on Multiplicative-Weight-Updates.

##### Karren D Yang
Learning Equivalence Classes of Causal Graphs under Soft Interventions

Learning causal relations between variables is a fundamental problem of most scientific disciplines. Our work focuses on learning causal graphs in the setting where both observational and interventional data is available. This is often the case in biology, where regulatory networks of genes can be intervened on using chemical agents or gene deletions. Previous work on the identifiability of causal graphs in this setting has been limited to perfect interventions that sever the relations between the targeted variable and its direct causes. However, interventions are seldom perfect in practice; for example, most interventions modify or reduce the relations between a target variable and its direct causes without eliminating them entirely. In this work, we define an equivalence class of structures that is identifiable given data from soft interventions. We propose and prove the consistency of a causal inference algorithm to learn causal graphs in this context and evaluate it on biological datasets.

##### Chulhee Yun
Global Optimality Conditions for Deep Linear Neural Networks

We study the error landscape of deep linear neural networks with the squared error loss. Minimizing the loss of a deep linear neural network is a nonconvex problem, and despite recent progress, our understanding of this loss surface is still incomplete. For deep linear networks, we present necessary and sufficient conditions for a critical point of the risk function to be a global minimum. Our conditions provide an efficiently checkable test for global optimality, which is remarkable because such tests are typically intractable in nonconvex optimization.

##### Martin Zubeldia
Minimizing Delay Through Replication in Distributed Service Systems

We consider a server farm with a very large number of servers working in parallel, to which jobs arrive as a single stream of requests. We assume that the time that it takes for a job to be processed, not only depends on the size of the job itself, but also on random server slowdowns, which can happen for various reasons (e.g., due to background maintenance processes). We assume that a central dispatcher is responsible for routing the incoming jobs to suitable queues, immediately upon arrival. Furthermore, we assume that the dispatcher can create replicas of the jobs and dispatch them to different servers, so that a job is considered completed once any one of its replicas finish processing. At that point, all replicas are immediately deleted. We analyze the best achievable performance, in terms of average delay of a job, and the tradeoffs between the delay and the stability of the system. In particular, we obtain both a theoretical lower bound for the delay, and a practical dispatching/replication policy that matches the lower bound under certain conditions.