Student Talk Abstracts

Xiaoyue Gong
A Fast Max Flow Algorithm

In 2013, Orlin proved that the max flow problem could be solved in \(\mathcal{O}(nm)\) time. His algorithm ran in \(\mathcal{O}(nm + m^{1.94})\) time, which was the fastest for very sparse graphs. If the graph was not sufficiently sparse, the fastest running time was an algorithm due to King, Rao, and Tarjan. We describe a new variant of the excess scaling algorithm for the max flow problem whose running time strictly dominates the running time of the algorithm by King, Rao, and Tarjan.

Jingzhao Zhang
Direct Runge-Kutta Discretization Achieves Acceleration

We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method. When the function is smooth enough, we show that acceleration can be achieved by a stable discretization of this ODE using standard Runge-Kutta integrators. Specifically, we prove that under Lipschitz-gradient, convexity and order-\((s+2)\) differentiability assumptions, the sequence of iterates generated by discretizing the proposed second-order ODE converges to the optimal solution at a rate of \(\mathcal{O}(N^{-2\frac{s}{s+1}})\), where s is the order of the Runge-Kutta numerical integrator. Furthermore, we introduce a new local flatness condition on the objective, under which rates even faster than \(\mathcal{O}(N^{−2})\) can be achieved with low-order integrators and only gradient information. Notably, this flatness condition is satisfied by several standard loss functions used in machine learning. We provide numerical experiments that verify the theoretical rates predicted by our results.

Konstantina Mellou
Dynamic Redistribution of Bike Sharing Systems

Spatial and temporal imbalances of the user demand of bike sharing systems often lead to empty and full bike stations. Since reliable service is essential for the viability of these systems, operators use a fleet of vehicles to redistribute bikes across the network. We propose a mixed integer programming formulation for the dynamic version of the problem, and, combined with heuristics and decomposition techniques, we are able to solve large real-world instances. Since accurate estimation of the user demand is essential to plan for efficient rebalancing, we also develop approaches to consider lost and shifted demand, which very often are not taken into account.

Alireza Fallah
A Universally Optimal Multistage Accelerated Stochastic Gradient Method

We study the problem of minimizing a strongly convex and smooth function when we have noisy estimates of its gradient. We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal rate both in the deterministic and stochastic case and operates without knowledge of noise characteristics. The algorithm consists of stages that use a stochastic version of Nesterov’s accelerated algorithm with a specific restart and parameters selected to achieve the fastest reduction in the bias-variance terms in the convergence rate bounds.

Dheeraj M. Nagaraj
Making Last Iterate of SGD Information Theoretically Optimal

Stochastic Gradient Descent is the most widely used optimization method in machine learning. The theoretical guarantees are given for the average of the iterates whereas in practice only the last iterate of SGD is used. It can be shown that standard step size sequences used in theory can never achieve information theoretically optimal error rates for the last iterate when the objective functions are not smooth. We give modified step size sequences which give the information theoretically optimal error guarantees of \(\mathcal{O}(1/\sqrt{T})\) in the general case and \(\mathcal{O}(1/T)\) in the strongly convex case. More generally, we give a modification scheme for any step size sequence such that the error guarantees for the average of the iterates for the original step sizes hold for the last iterate when the modified step sizes are used.

Ryan Cory-Wright
A Scalable Algorithm for Sparse and Robust Portfolios

A central problem in financial economics concerns investing capital to maximize a portfolio’s expected return while minimizing its variance, subject to an upper bound on the number of positions, minimum investment and linear inequality constraints, at scale. Existing approaches to this problem do not provide provably optimal portfolios for real-world problem sizes with more than 300 securities. In this paper, we present a cutting-plane method which solves problems with 1000s of securities to provable optimality, by exploiting a dual representation of the continuous Markowitz problem to obtain a closed form representation of the problem’s subgradients. We improve the performance of the cutting-plane method in three different ways. First, we implement a local-search heuristic which applies our subgradient representation to obtain high-quality warm-starts. Second, we embed the local-search heuristic within the cutting-plane method to accelerate recovery of an optimal solution. Third, we exploit a correspondence between the convexified sparse Markowitz problem and a rotated second-order cone problem to obtain a tight lower bound which is sometimes exact. Finally, we establish that the cutting-plane method is 3—4 orders of magnitude more efficient than existing methods, and construct provably optimal sparsity-constrained frontiers for the S&P; 500, Russell 1000, and Wilshire 5000.

Renbo Zhao
An Inexact Primal-Dual Smoothing Framework for Large-scale Non-Bilinear Saddle Point Problems

We propose an inexact primal-dual smoothing framework to solve a strongly-convex-generally-concave saddle point problem with non-bilinear structure, with potentially a large number of component functions. We develop a probabilistic version of our smoothing framework, which allows each sub-problem to be solved by randomized algorithms inexactly in expectation. In addition, we extend both our deterministic and probabilistic framework s to solve generally convex-concave saddle point problems. Compared to the existing approaches, our frameworks enjoy superior first-order oracle complexities, and can easily exploit additional regularities in the problem. As an important application, we illustrate the efficacy of our framework s on the large-scale convex constrained optimization problems.

Arthur Delarue
From School Buses to Bell Times: Driving Policy with Optimization

Spreading start times allows school districts to reduce transportation costs by reusing buses between schools. However, assigning each school a time involves both estimating the impact on transportation costs and reconciling additional competing objectives. These challenges force many school districts to make myopic decisions, leading to an expensive and inequitable status quo. For instance, most American teenagers start school before 8, despite evidence of significant associated health issues. We propose the first algorithm to jointly solve the school bus routing and bell time selection problems. Our application in Boston led to $5 million in yearly savings (maintaining service quality despite a 50-bus fleet reduction) and to the unanimous approval of the first school start time reform in thirty years.

Enric Boix
Clustering Networks with Graph Powering

I will present a robust community detection algorithm which is based on spectral methods and on a regularization operation known as "graph powering". The algorithm recovers the hidden communities of the Stochastic Block Model in the sparse regime, whenever recovery is information-theoretically possible. Moreover, the algorithm is robust, because it also recovers the hidden communities of the Geometric Block Model -- a random graph model with many cliques and short cycles, which caused previous SBM clustering algorithms based on belief propagation and on the non-backtracking operator to fail. This is joint work with Emmanuel Abbe, Peter Ralli, and Colin Sandon.

Espen Flo Bodal
Investment in Power Systems with Short-Term Uncertainty

Modelling storage strategies in combination with uncertainty in renewable energy production and power flow on transmission lines is a challenging task. To be able to optimize the operation of the power system considering all these factors and still get a model that is simple enough to investigate investments with limited computational resources we use a rolling horizon approach. Sensitivities with respect to installed capacity is collected from the operational model and used to make investment decisions in an investment model.

Emily Meigs
Optimal Recommendations in Routing

A model of how routing applications (e.g. Waze, Google Maps) should optimally provide information to users to encourage necessary experimentation, while achieving a lower cost (than cost under equilibrium) and maintaining incentive compatibility.

Vishrant Tripathi
Age Optimal Information Gathering and Dissemination on Graphs

We discuss the problem of timely exchange of updates between a central station and a set of ground terminals \(V\), via a mobile agent that traverses across the ground terminals along a mobility graph \(G = (V;E)\). We design the trajectory of the mobile agent to minimize peak and average age of information (AoI), two newly proposed metrics for measuring timeliness of information. We consider randomized trajectories, in which the mobile agent travels from terminal \(i\) to terminal \(j\) with probability \(P_{i;j}\). We show that a randomized trajectory is peak age optimal and factor-8H average age optimal, where H is its mixing time. We also show that the average age minimization problem is NP-hard. Moreover, we propose an age-based trajectory, which utilizes information about current age at terminals, and show that it is factor-2 average age optimal in a symmetric setting.

James Siderius
Bayesian Learning in Awkward-sized Groups

We consider a noncooperative model of social learning where agents receive independent signals about a binary state of the world. Agents communicate their beliefs through an (unknown) social network over a finite time horizon. Such a limitation can be viewed as a cost of social interaction or a time-constraint on making a decision. Our main finding is that the extent of learning can be non-monotone in the group size. In particular, when groups are small or large, there is less uncertainty about the amount of overlap in commonly observed information. In mid-sized groups, agents are less certain of the magnitude of an "echo chamber" effect, which dampens the power of inference.

Jingwei Yang
On the Equivalent Representation of General Energy Networks: An Electrical-analog Perspective

Networks are the most important carriers in large-scale energy systems, such as power systems, natural gas system and district heating systems. However, the energy flow in different networks are governed by different physical laws, which makes them hard to be represented in a uniform framework. It is an interesting question that if any similarity exists between these energy networks. This work uses an electrical-analog perspective on the PDEs that describe these energy networks. A linear circuit model in both distributed and lumped parameter is proposed for the general energy network. Future application is envisioned on the analysis, integration and coordination between multiple energy systems.

Rupamathi Jaddivada
Toward Economically-efficient and Technically-realizable Operation of Electric Power Systems

The proliferation of intermittent, renewable technologies combined with the customer service-oriented trends present new fundamental challenges. Given temporal criticality of operating the physical grid, it is essential to process vast sources of asynchronously gathered data online and use for multi-layered interactive automation at well-understood expected performance. Today's Supervisory and Data Acquisition (SCADA) architecture capable of enabling electricity service is limited to hierarchical temporal spatial assumptions that no longer hold. SCADA must evolve into flexible architecture capable of aligning end-to-end physical and economic processes. Our approach provides a means of multi-layered modeling of interactions in terms of commonly-understood energy and power variables characterizing multi-physical energy conversion processes. This modeling framework allows for the zooming in and out of aggregate models as per the spatial and temporal granularity of interest. The most novel finding so far: such multi-layered modeling with intra- and inter-layer interactions not only eases the component-level control design, but also results in a linear interactive model for aggregate-level analysis, thus allowing for end-device participation at value. The approach uses formal mathematical techniques called spatial and temporal lifting that facilitate abstraction of inner component details, revealing only the interface variables (energy and power) to its neighbors and its time-varying limits to its coordinator as a function of economic signals that are learned by the component itself or communicated by its coordinating aggregator. Finally, this framework induces a method for systematic practical composition of an otherwise complex computer platform for simulating technical and economic information exchange within a future electric-energy system. Ultimately, the modeling framework-supported control design demonstrating the effects of end-to-end market transactions on the power grid is being shown using our home-grown scalable electric power system simulator (SEPSS) to result in efficient and technically realizable integration of renewables and near-optimal provision of electricity services.

Anish Agarwal
Model Agnostic Time Series Analysis via Matrix Estimation

We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, utilizing matrix estimation to recover missing values and de-noise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) low-rank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with non-overlapping entries rather than a Hankel matrix as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide finite sample analysis for imputation and prediction, and prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise agnostic property of our approach allows us to recover the latent states when only given access to noisy and partial observations a la a Hidden Markov Model; e.g., recovering the time-varying parameter of a Poisson process without knowing that the underlying process is Poisson. Furthermore, since our forecasting algorithm requires regression with noisy features, our approach suggests a matrix estimation based method—coupled with a novel, non-standard matrix estimation error metric—to solve the error-in-variable regression problem, which could be of interest in its own right. Through synthetic and real-world datasets, we demonstrate that our algorithm outperforms standard software packages (including R libraries) in the presence of missing data as well as high levels of noise.

Anastasiya Belyaeva
Mathematics of 3D Genome Organization

The 3D structure of the genome plays a key role in regulatory control of the cell. Genome sequencing methods have been developed to measure contact frequencies of DNA at the whole-genome level in order to probe the 3D configuration of the genome. However, it remains a challenge to deduce from this data the 3D spatial organization of genomic regions. In this talk, we discuss statistical methods for discovering colocalized genomic regions via large average submatrix algorithm and provide a novel optimization formulation for recovering the 3D embedding of DNA for diploid organisms from noisy measurements.

Dogyoon Song
Learning Mixtures with Missing Entries and Its Application to Rankings

We consider the question of learning mixtures of generic sub-gaussian distributions based on obser- vations with missing values. To that end, we utilize a matrix estimation method from literature (soft- or hard- singular value thresholding). Specifically, we stack the observations (with missing values) to form a data matrix and learn a low-rank approximation of it so that the row indices can be correctly clustered to belong to appropriate mixture component using a simple distance-based algorithm. To analyze the performance of this algorithm by quantifying finite sample bound, we extend the result for matrix estimation methods in the literature in two important ways: one, noise across columns is correlated and not independent across all entries of matrix as considered in the literature; two, the performance metric of interest is the maximum l2 row norm error, which is stronger than the traditional mean-squared-error averaged over all entries. Equipped with these advances in the context of matrix estimation, we are able to connect matrix estimation and mixture model learning in the presence of missing data.

Jackie Baek
Detecting Early Stage Cancer Using Liquid Biopsy

The cost of DNA sequencing has fallen 10,000x in the last ten years, and we are finally in sight of the silver bullet for cancer screening: an early-stage blood test. As sequencing can now be performed affordably on a tiny fraction of a genome, what remains is a massive variable selection problem. We provide an efficient algorithm, based on a decomposition at the gene level, that scales to full genomic sequences across thousands of patients for cancer detection and classification.

Chulhee (Charlie) Yun
Small Nonlinearities in Activation Functions Create Bad Local Minima in Neural Networks

We investigate the loss surface of neural networks. We prove that even for one-hidden-layer networks with "slightest" nonlinearity, the empirical risks have spurious local minima in most cases. Our results thus indicate that in general "no spurious local minima" is a property limited to deep linear networks, and insights obtained from linear networks are not robust. Specifically, for ReLU(-like) networks we constructively prove that for almost all (in contrast to previous results) practical datasets there exist infinitely many local minima. We also present a counterexample for more general activations (sigmoid, tanh, arctan, ReLU, etc.), for which there exists a bad local minimum. Our results make the least restrictive assumptions relative to existing results on local optimality in neural networks. We complete our discussion by presenting a comprehensive characterization of global optimality for deep linear networks, which unifies other results on this topic.

Matthew S. Brennan
Reducibility and Computational Lower Bounds for Problems with Planted Sparse Structure

The prototypical high-dimensional statistics problem entails finding a structured signal in noise. Many of these problems exhibit an intriguing phenomenon: the amount of data needed by all known computationally efficient algorithms far exceeds what is needed for inefficient algorithms that search over all possible structures. A line of work initiated by Berthet and Rigollet in 2013 has aimed to explain these statistical-computational gaps by reducing from conjecturally hard average-case problems in computer science. However, the delicate nature of average-case reductions has limited the applicability of this approach. In this work we introduce several new techniques to give a web of average-case reductions showing strong computational lower bounds based on the planted clique conjecture using natural problems as intermediates. These include tight lower bounds for Planted Independent Set, Planted Dense Subgraph, Sparse Spiked Wigner, Sparse PCA, a subgraph variant of the Stochastic Block Model and a biased variant of Sparse PCA. We also give algorithms matching our lower bounds and identify the information-theoretic limits of the models we consider.

Omer Tanovic
Causally-stable Approximation of Optimal Maps in Maximal Value Constrained Least-Squares Optimization

We consider a problem of designing discrete-time systems which are optimal in frequency-weighted least squares sense subject to a maximal output amplitude constraint. In particular, such optimization problems serve to represent a number of peak-to-average-power ratio (PAPR) reduction objectives which are of significant importance in modern communication systems. In such problems, the optimality conditions do not provide an explicit way of generating the optimal output as a real-time implementable transformation of the input, due to causal instability of the resulting dynamical equations and sequential nature in which the criterion function is revealed over time: at each time instance, knowledge of whole history of the input signal should be known ahead of time in order to calculate the current sample of the optimal output signal. Due to difficulties in obtaining an explicit optimal solution, receding horizon optimization, i.e. model predictive control (MPC), appears to be a natural way of addressing these problems. Unfortunately, the high cost associated with MPC computations at every time step makes it unfavorable in power and time-sensitive applications such as those in signal processing for communication systems. We first show that the optimal system has exponentially fading memory, which suggests existence of arbitrarily good receding horizon (i.e., finite latency) approximations. We propose a real-time realizable algorithm which, under an L1 dominance assumption about the equation coefficients, returns high-quality approximations to the optimal map, where the approximation quality is measured in terms of the supremum norm of the error signal. The algorithm exploits the optimality conditions and is realized as a causally stable nonlinear discrete-time system, which is allowed to look ahead at the input signal over a finite horizon (and is, therefore, of finite latency). Furthermore, we propose an extension of the well-known method of balanced truncation for linear systems to the class of nonlinear models with weakly contractive operators. We then propose a causally stable finite-latency nonlinear system which also returns high-quality approximations to the optimal map, where the approximation quality is now measured in terms of the L2 gain of the error system. In this case, the algorithm does not depend on any special assumptions about the least squares criterion (or, equivalently, about the system equation coefficients), except for convexity. The proposed system is obtained by a careful truncation of an infinite-dimensional state space model representation of the optimal system, as suggested by the derived generalization of the balanced truncation algorithm. The results are illustrated on problems of envelope tracking and PAPR reduction of communication signals, both stemming from power-efficient transceiver design in modern digital communication systems.

Karthik Gopalakrishnan
Control of Switched Network Models

We propose a switched network model to characterize the dynamics of air traffic delay propagation. Our model assumes a linear dynamics for a fixed network topology, and Markov transitions between the candidate topologies. This framework is general enough and can be used to model several networked process that involve varying topologies. We will motivate the control problem in this context, and argue the importance of controlling both the nodal state as well as the Markov chain that governs the network transitions. We conclude by presenting some control heuristics for this model and examples.

Sadra Sadraddini
Polytopic Trees for Verification and Control of Hybrid Systems

Hybrid systems demonstrate both continuous and discrete behaviors, making it computationally difficult to verify and control. The existing approaches greatly suffer from exponential explosion of the number of characterizable dynamics. We introduce polytopic trees, which is a sampling-based abstraction of hybrid systems. The central idea is to sample polytopic trajectories, which provide a set of parameterized polytope-to-polytope transitions. These parameters are computed using (mixed-integer) convex programs. In the higher level, these polytopic trajectories are grown backward from a user-defined goal region in a tree fashion, providing an infinite set of parametrized trajectories that reach the goal while satisfying all the system constraints. The remaining effort is making these polytopes cover the state-space as much as possible. These polytopic trajectories can be used to synthesize control policies, or to verify controllers, which themselves may be characterized by hybrid phenomena such as those encountered in deep neural networks that are based on rectifier linear units. We provide illustrative examples from contact-based dynamics in robot locomotion and manipulation.

Milad Siami
A Separation Principle for Joint Sensor and Actuator Scheduling with Guaranteed Performance Bounds

We study the problem of jointly designing a sparse sensor and actuator schedule for linear dynamical systems while guaranteeing a control/estimation performance that approximates the fully sensed/actuated setting. We further prove a separation principle, showing that the problem can be decomposed into finding sensor and actuator schedules separately. However, we show that this problem cannot be efficiently solved or approximated in polynomial, or even quasi-polynomial, time for time-invariant sensor/actuator schedules; instead, we develop a framework for a time-varying sensor/actuator schedule for a given large-scale linear system with guaranteed approximation bounds using deterministic polynomial-time, and randomized near-linear-time algorithms. Our main result is to provide a polynomial-time joint actuator and sensor schedule that on average selects only a constant number of sensors and actuators at each time step, irrespective of the dimension. The key idea is to sparsify the controllability and observability Gramians while providing approximation guarantees for Hankel singular values.

Joao Cavalcanti
Sign-stability of Positive Markov Jump Linear Systems

This work investigates stability of Positive Markov Jump Linear Systems (PMJLSs) in the absence of a numerical realization. It considers the situation when only the signs (and not the magnitudes) of the entries of the subsystem matrices and the Markov transition matrices are known. The result is an analysis of a qualitative notion of stability known as sign-stability. Although the notions of sign-stability of PMJLSs are natural extensions of standard stochastic stability concepts such as exponential mean stability and exponential mean-square stability, some of these notions are proven to be equivalent for sign-stability even in cases where the corresponding standard concepts are not. Moreover, for irreducible Markov chains, the particular structure of the Markov chain is shown to have no bearing on sign-stability.

Dongchan Lee
Input-Output Stability Analysis of Power Grids with Robustness Guarantee

The wide deployment of renewable generation and the gradual decrease in the overall system inertia make modern power grids more vulnerable to transient instabilities and unacceptable frequency fluctuations. Time-domain simulation-based assessment of the system robustness against uncertain and stochastic disturbances is extremely time-consuming. In this paper, we develop an alternative approach, which has its roots in the input-output stability analysis for Lur’e systems. Our approach consists of a mathematically rigorous characterization of the external disturbances that the power system is transiently stable and the frequency constraints are not violated. The derived certificate is efficiently constructed via convex optimization and is shown to be non-conservative for different IEEE test cases.