Student Talk Abstracts

Session 1A: Machine Learning, Information, and Statistics

Josh Gruenstein
Residual Model Learning for Micro-Robot Control

A majority of micro-robots are constructed using compliant materials that are difficult to model analytically, limiting the utility of traditional model-based controllers. The difficulty of data collection on micro-robots and large errors between simulated models and real robots make current model-based learning and sim-to-real transfer methods difficult to apply. We propose a novel framework, "residual model learning" (RML), that leverages approximate simulation models to substantially reduce the sample complexity associated with learning an accurate robot model. We show that using RML, we can learn a model of the Harvard Ambulatory Micro-Robot (HAMR) using just 12 seconds of passively collected interaction data. The learned model is accurate enough to be used as a "proxy simulator" for learning walking and turning behaviors with model-free reinforcement learning algorithms. RML provides a general framework for learning from extremely small amounts of interaction data, and our experiments with HAMR clearly demonstrate that RML substantially outperforms existing techniques.
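
To illustrate only the core residual-learning idea (a generic Python sketch, not the authors' HAMR implementation; the simulator handle simulate_step and the logged transition arrays are assumptions of the example), one can fit a simple model of the gap between the simulator's prediction and the observed next state and add it back at prediction time:

    import numpy as np

    def fit_residual_model(states, actions, next_states, simulate_step, lam=1e-3):
        """Sketch: fit a linear residual r(s, a) ~= s' - sim(s, a) by ridge regression.

        states, actions, next_states: arrays of logged transitions (N x dim).
        simulate_step: approximate simulator mapping (s, a) -> predicted next state.
        """
        sim_pred = np.array([simulate_step(s, a) for s, a in zip(states, actions)])
        residuals = next_states - sim_pred                  # what the simulator misses
        feats = np.hstack([states, actions, np.ones((len(states), 1))])
        # Ridge solution W = (X^T X + lam I)^-1 X^T R, one column per state dimension.
        W = np.linalg.solve(feats.T @ feats + lam * np.eye(feats.shape[1]),
                            feats.T @ residuals)

        def learned_model(s, a):
            x = np.concatenate([s, a, [1.0]])
            return simulate_step(s, a) + x @ W              # simulator + learned residual

        return learned_model

The returned learned_model can then stand in for the real robot when training a model-free policy, which is the "proxy simulator" role described above.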

Behrooz Tahmasebi
Recursive Neighborhood Pooling for Graph Representation Learning

While message-passing Graph Neural Networks (GNNs) have become increasingly popular architectures for learning with graphs, recent works have revealed important shortcomings in their expressive power. In response, several higher-order GNNs have been proposed, which substantially increase the expressive power, but at a large computational cost. Motivated by this gap, we introduce and analyze a new recursive pooling technique of local neighborhoods that allows different tradeoffs between computational cost and expressive power. First, we show that this model can count subgraphs of size $k$, and thereby overcomes a known limitation of low-order GNNs. Second, we prove that, in several cases, the proposed algorithm can greatly reduce computational complexity compared to the existing higher-order k-GNN and Local Relational Pooling (LRP) networks. We also provide a (near) matching information-theoretic lower bound for graph representations that can provably count subgraphs, and discuss time complexity lower bounds as well.
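
As a loose illustration of the recursive-pooling idea only (a toy aggregation over progressively smaller neighborhoods, written in plain Python; it is not the model of the talk and carries none of its subgraph-counting guarantees):

    def recursive_pool(node, depth, adj, feat, pool=sum):
        """Toy recursive neighborhood aggregation on a graph given as an adjacency dict.

        Returns a scalar summary of the depth-hop neighborhood of `node`: the node's
        own feature combined with recursive summaries of its neighbors' neighborhoods.
        """
        if depth == 0:
            return feat[node]
        child_summaries = [recursive_pool(nbr, depth - 1, adj, feat, pool)
                           for nbr in adj[node]]
        return feat[node] + (pool(child_summaries) if child_summaries else 0.0)

    # Example: a 4-cycle with unit node features.
    adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
    feat = {v: 1.0 for v in adj}
    print([recursive_pool(v, depth=2, adj=adj, feat=feat) for v in adj])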

Lei Xu
Attacking Text Classifiers via Sentence Rewriting Sampler

Most adversarial attack methods on text classification are designed to change the classifier's prediction by modifying a few words or characters. Few try to attack classifiers by rewriting a whole sentence, due to the difficulties inherent in sentence-level rephrasing and the problem of setting criteria for legitimate rewriting. To tackle this problem, we design a general sentence rewriting sampler (SRS) framework, which can conditionally generate meaningful sentences. We then customize SRS to attack text classification models. Our method can effectively rewrite the original sentence in multiple ways while maintaining high semantic similarity and good sentence quality. Experimental results show that many of these rewritten sentences are misclassified by the classifier. Our method achieves a better attack success rate on 4 out of 6 datasets, as well as significantly better sentence quality on all 6 datasets.

Sitan Chen
Learning Deep ReLU Networks Is Fixed-Parameter Tractable

We consider the problem of learning an unknown ReLU network with respect to Gaussian inputs and obtain the first nontrivial results for networks of depth more than two. We give an algorithm whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters. Our bounds depend on the number of hidden units, depth, spectral norm of the weight matrices, and Lipschitz constant of the overall network (we show that some dependence on the Lipschitz constant is necessary). We also give a bound that is doubly exponential in the size of the network but is independent of spectral norm. These results provably cannot be obtained using gradient-based methods and give the first example of a class of efficiently learnable neural networks that gradient descent will fail to learn. In contrast, prior work for learning networks of depth three or higher requires exponential time in the ambient dimension, even when the above parameters are bounded by a constant. Additionally, all prior work for the depth-two case requires well-conditioned weights and/or positive coefficients to obtain efficient run-times. Our algorithm does not require these assumptions. Our main technical tool is a type of filtered PCA that can be used to iteratively recover an approximate basis for the subspace spanned by the hidden units in the first layer. Our analysis leverages new structural results on lattice polynomials from tropical geometry. Joint work with Adam Klivans and Raghu Meka.

Yi Tian
Towards Minimax Optimal Reinforcement Learning in Factored Markov Decision Processes

Markov decision processes (MDPs) are a model for sequential decision making. In recent years, the minimax optimal learning complexity (in terms of regret or sample complexity) of MDPs with finite state and action spaces has become well understood. Many real-world problems can not only be modeled by MDPs, but also have additional structure captured by the model of factored MDPs. Prior works have shown that by exploiting the factored structures effectively, the learning complexity can be reduced significantly. However, the minimax optimal learning complexity of factored MDPs remains open. In this talk we partially answer this question by presenting both near-optimal algorithms and information-theoretic lower bounds. A key new ingredient of our algorithms is the design of a bonus term to guide exploration.

Parker Lusk
CLIPPER: A Graph-Theoretic Framework for Robust Data Association

We present CLIPPER (Consistent LInking, Pruning, and Pairwise Error Rectification), a framework for robust data association in the presence of noise and outliers. We formulate the problem in a graph-theoretic framework using the notion of geometric consistency. State-of-the-art techniques that use this framework rely either on combinatorial optimization methods, which do not scale well to large problems, or on heuristic approximations, which yield low accuracy in high-noise, high-outlier regimes. In contrast, CLIPPER uses a relaxation of the combinatorial problem and returns solutions that are guaranteed to correspond to the optima of the original problem. Low time complexity is achieved with an efficient projected gradient ascent approach. Experiments indicate that CLIPPER maintains a consistently low runtime of 15 ms, whereas exact methods can require up to 24 s at their peak, even on small problems with 200 associations. When evaluated on noisy point cloud registration problems, CLIPPER achieves 100% precision and 98% recall in 90% outlier regimes, while competing algorithms begin to degrade at 70% outliers. In an instance of associating noisy points of the Stanford Bunny with 990 outlier associations and only 10 inlier associations, CLIPPER successfully returns 8 inlier associations with 100% precision in 138 ms.
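
A minimal sketch of the general computational pattern (projected gradient ascent on a quadratic relaxation of the consistency-maximization problem; the affinity matrix, step size, and thresholding below are illustrative and omit CLIPPER's specific constraint handling and optimality guarantees):

    import numpy as np

    def dense_consistent_set(M, n_iters=200, step=0.1, seed=0):
        """Sketch: projected gradient ascent on u^T M u over the nonnegative unit sphere.

        M: symmetric nonnegative affinity matrix, M[i, j] ~ geometric consistency of
        candidate associations i and j. Returns indices of likely inlier associations.
        """
        rng = np.random.default_rng(seed)
        u = np.abs(rng.standard_normal(M.shape[0]))
        u /= np.linalg.norm(u)
        for _ in range(n_iters):
            u = u + step * (M @ u)          # gradient step on the quadratic objective
            u = np.maximum(u, 0.0)          # project onto the nonnegative orthant...
            u /= np.linalg.norm(u) + 1e-12  # ...and renormalize onto the unit sphere
        return np.flatnonzero(u > 0.5 * u.max())   # simple thresholding to pick inliers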

Session 1B: Control, Autonomy, and Decision Making

Anubhav Guha
MRAC-RL: A Framework for On-Line Policy Adaptation Under Parametric Model Uncertainty

Reinforcement learning (RL) algorithms have been successfully used to develop control policies for dynamical systems. For many such systems, these policies are trained in a simulated environment. Due to discrepancies between the simulated model and the true system dynamics, RL-trained policies often fail to generalize and adapt appropriately when deployed in the real-world environment. Current research in bridging this "sim-to-real" gap has largely focused on improvements in simulation design and on the development of improved and specialized RL algorithms for robust control policy generation. In this work we apply principles from adaptive control and system identification to develop the model-reference adaptive control & reinforcement learning (MRAC-RL) framework. We introduce a set of novel MRAC algorithms applicable to a broad range of linear and nonlinear systems, and derive the associated control laws. The MRAC-RL framework utilizes an inner-loop adaptive controller that allows a simulation-trained outer-loop policy to adapt and operate effectively in a test environment, even when parametric model uncertainty exists. We demonstrate that the MRAC-RL approach improves upon state-of-the-art RL algorithms in developing control policies that can be applied to systems with modeling errors. Theoretical and experimental results are provided to verify and validate this approach.
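
For intuition, here is a scalar sketch of the inner-loop idea: a textbook MRAC law wrapped around a fixed outer-loop policy. The plant, gains, reference command, and policy are all illustrative stand-ins, not the MRAC-RL algorithms or control laws of the talk.

    import numpy as np

    def mrac_inner_loop(policy, a_true=-0.3, a_nom=-1.0, gamma=5.0, dt=0.001, T=30.0):
        """Scalar sketch of an MRAC inner loop around a fixed outer-loop policy.

        Plant:      x'   = a_true*x + u,  with  u = policy(x) + theta*x + r(t)
        Reference:  x_m' = a_nom*x_m + policy(x_m) + r(t)   (the 'simulation' model)
        Adaptation: theta' = -gamma * (x - x_m) * x
        The adaptive term compensates for a_true != a_nom; with the exciting command
        r(t), theta drifts toward a_nom - a_true and the tracking error vanishes.
        """
        x, x_m, theta = 1.0, 1.0, 0.0
        for k in range(int(T / dt)):
            r = np.sin(k * dt)                    # exciting external command
            e = x - x_m
            u = policy(x) + theta * x + r
            x += dt * (a_true * x + u)
            x_m += dt * (a_nom * x_m + policy(x_m) + r)
            theta += dt * (-gamma * e * x)
        return theta, e

    theta_hat, err = mrac_inner_loop(policy=lambda x: -0.5 * x)   # theta_hat -> ~ -0.7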

Juncal Arbelaiz
Can Noise Induce Sparsity?

Traditional optimal control and estimation techniques yield centralized communication architectures. For large-scale spatially-distributed systems, such all-to-all communication topologies are often prohibitive, which justifies the use of decentralized controllers and estimators that trade off performance against complexity. Motivated by this challenge, in this work we analyze distributed Kalman filters for spatially-invariant plants and ask whether noise could help decentralize the filter. In the spatially-invariant setting, Kalman filters possess an inherent degree of spatial localization, and we show that it is precisely the noise present in the system that determines the extent to which the filter is spatially localized. We then analyze the optimal decentralized Kalman filter design -- previously thought to be a nonconvex problem -- and demonstrate that it can be formulated as a convex program in the spatial frequency domain. We conclude by establishing that the performance drop between optimal centralized and decentralized filters becomes arbitrarily small in certain noise regimes. For the sake of illustration, we present our results through the study of a diffusion PDE in an unbounded domain.

Yingnan Cui
Fast Convergence of Performance Error and Parameter Error with Excitation

A parameter estimation algorithm for adaptive control will be presented. The algorithm adopts a matrix of time-varying learning rates, which results in fast convergence of performance error and parameter error under excitation. Both vector and matrix regressors are considered in the adaptive law, and the corresponding convergence rates will be shown. A projection operator is used to ensure the boundedness of the learning rate matrix when there is no guarantee of excitation. Numerical simulation results will be provided to verify the theoretical analysis.
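
For context, a discrete-time sketch of such an update with a matrix of time-varying learning rates (a recursive-least-squares-style recursion with forgetting; the norm bound below is a crude stand-in for the projection operator, and the excitation conditions and convergence rates of the talk are not captured):

    import numpy as np

    def adaptive_update(theta, Gamma, phi, y, lam=0.98, gamma_max=1e3):
        """Sketch of one step of a parameter estimator with a time-varying gain matrix.

        Model: y = theta_true . phi + noise. The prediction error drives the update;
        Gamma is adapted like an RLS covariance with forgetting factor lam and is
        crudely bounded in norm in lieu of a projection operator.
        """
        err = y - theta @ phi
        Gamma_phi = Gamma @ phi
        denom = lam + phi @ Gamma_phi
        theta = theta + (Gamma_phi / denom) * err                        # parameter update
        Gamma = (Gamma - np.outer(Gamma_phi, Gamma_phi) / denom) / lam   # gain update
        if np.linalg.norm(Gamma, 2) > gamma_max:                         # keep the gain bounded
            Gamma *= gamma_max / np.linalg.norm(Gamma, 2)
        return theta, Gamma

    theta, Gamma = np.zeros(3), 10.0 * np.eye(3)
    # call theta, Gamma = adaptive_update(theta, Gamma, phi_t, y_t) at each time step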

Anurag Ajay
Offline Primitive Discovery for Accelerating Offline Reinforcement Learning

Reinforcement learning (RL) has achieved impressive performance in a variety of online settings in which an agent's ability to query the environment for transitions and rewards is effectively unlimited. However, in many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited. In this work, we focus on this offline setting. Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors before using these primitives for downstream task learning. Primitives extracted in this way serve two purposes: they delineate the behaviors that are supported by the data from those that are not, making them useful for avoiding distributional shift in offline RL; and they provide a degree of temporal abstraction, which reduces the effective horizon, yielding better learning in theory and improved offline RL in practice. In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning as well as exploration and transfer in online RL on a variety of benchmark domains. Visualizations are available at https://sites.google.com/view/opal-iclr.

Hanzhang Qin
Provably Sample-Efficient Inventory Control

What is the minimum number of samples that suffices for an algorithm to generate a near-optimal policy for the multistage stochastic inventory control problem? The answer has remained open since sampling-based multistage stochastic inventory control was first proposed and studied by Levi et al. (2007). In this paper, we resolve this long-standing open question by proposing a sampling-based algorithm that achieves the optimal sample complexity, matching a lower bound we provide up to a logarithmic factor. Our approach is inspired by recent developments in finite-sample analysis in the field of reinforcement learning, and our result not only holds for uncensored demands but also generalizes, to some extent, to censored demands.
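
As a single-stage illustration of sampling-based inventory control (the classical newsvendor solved by sample average approximation; the multistage algorithm and the sample-complexity analysis of the talk are considerably more involved):

    import numpy as np

    def saa_newsvendor(demand_samples, holding_cost, backlog_cost):
        """Sketch: sample-average-approximation order quantity for the newsvendor problem.

        The SAA-optimal order level is the empirical b/(b+h) quantile of demand.
        """
        critical_ratio = backlog_cost / (backlog_cost + holding_cost)
        return float(np.quantile(demand_samples, critical_ratio))

    rng = np.random.default_rng(0)
    q_hat = saa_newsvendor(rng.lognormal(3.0, 0.5, size=500),
                           holding_cost=1.0, backlog_cost=4.0)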

Ruihao Zhu
Non-Stationary Online Decision-Making

We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for non-stationary online reinforcement learning settings. We begin with the non-stationary bandit problem, where we design and analyze the sliding-window upper confidence bound algorithm, which achieves the optimal dynamic regret bound when the underlying degree of non-stationarity is known. Boosted by a novel bandit-over-bandit framework that adapts to the latent changes, we further attain (nearly) optimal dynamic regret bounds in a (surprisingly) parameter-free manner. We then discuss the implications for reinforcement learning in non-stationary MDPs, including the challenges in estimating state transition distributions and how additional optimism resolves them.
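
A minimal sketch of the sliding-window idea in the bandit case (window length and confidence width are illustrative, the reward model is hypothetical, and the bandit-over-bandit layer that tunes the window is not shown):

    import numpy as np
    from collections import deque

    def sliding_window_ucb(arm_means, horizon, window=200, c=1.0, seed=0):
        """Sketch of sliding-window UCB: statistics use only the last `window` rounds.

        arm_means: callable t -> array of current mean rewards (may drift over time).
        Returns the total reward collected.
        """
        rng = np.random.default_rng(seed)
        history = deque()                       # (arm, reward) pairs inside the window
        n_arms = len(arm_means(0))
        total = 0.0
        for t in range(horizon):
            counts = np.zeros(n_arms)
            sums = np.zeros(n_arms)
            for arm, r in history:
                counts[arm] += 1
                sums[arm] += r
            ucb = np.where(counts > 0,
                           sums / np.maximum(counts, 1)
                           + c * np.sqrt(np.log(min(t + 1, window) + 1)
                                         / np.maximum(counts, 1)),
                           np.inf)              # force exploration of unplayed arms
            arm = int(np.argmax(ucb))
            reward = rng.normal(arm_means(t)[arm], 1.0)
            total += reward
            history.append((arm, reward))
            if len(history) > window:
                history.popleft()               # forget rewards outside the window
        return total

    total_reward = sliding_window_ucb(lambda t: np.array([0.5, 1.0 if t < 2500 else 0.0]),
                                      horizon=5000)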

Bharadwaj Satchidanandan
An Efficient and Incentive-Compatible Mechanism for Energy Storage Markets

A key obstacle to increasing renewable energy penetration in the power grid is the lack of utility-scale storage capacity. Transportation electrification has the potential to overcome this obstacle since Electric Vehicles (EVs) that are not in transit can provide battery storage as a service to the grid. This is referred to as EV-Power grid integration, and it could be a key milestone on the pathway to decarbonizing the electricity and transportation sectors. We first show that if EV-Power grid integration is not done carefully, then rather than improving the cost efficiency of operating the grid, it could in fact be counterproductive. This fundamentally occurs due to two phenomena operating in tandem -- the randomness of EV usage patterns and the possibility of strategic behavior by EV operators. We present a market-based solution to address this issue. Specifically, we develop a mechanism for energy storage markets with which the system operator can efficiently integrate a fleet of strategic EVs with random usage patterns into the grid, utilize them for storage, and satisfy the demand at the minimum possible cost. The talk is based on joint work with Prof. Munther Dahleh.

Session 2A: Networks, Systems, Games and Economics

Micah Smith
A New Approach to Collaborative Data Science with the Ballet Framework

While the open-source model for software development has led to successful large-scale collaborations in building software systems, data science projects are frequently developed by individuals or small groups. We describe challenges to scaling data science collaborations and present a novel ML programming model to address them. We instantiate these ideas in Ballet, a lightweight software framework for collaborative open-source data science with a cloud-based development environment and a plugin for collaborative feature engineering. Using our framework, collaborators incrementally propose feature definitions to a repository; each is subjected to an ML evaluation and can be automatically merged into an executable feature engineering pipeline. We leverage Ballet to conduct an extensive case study analysis of a real-world income prediction problem, and discuss implications for collaborative projects.
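
In miniature, the accept-if-it-helps evaluation step can be sketched as follows (a generic scikit-learn loop written for illustration; it is not Ballet's API, and the feature functions and acceptance threshold are hypothetical):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def evaluate_and_merge(X_raw, y, accepted, candidate, tol=1e-3):
        """Sketch: accept a proposed feature if it improves cross-validated accuracy.

        accepted, candidate: functions mapping the raw data table to a 1-D feature
        column; assumes at least one feature has already been accepted.
        """
        def build(features):
            return np.column_stack([f(X_raw) for f in features])

        base = cross_val_score(LogisticRegression(max_iter=1000),
                               build(accepted), y, cv=5).mean()
        trial = cross_val_score(LogisticRegression(max_iter=1000),
                                build(accepted + [candidate]), y, cv=5).mean()
        return accepted + [candidate] if trial > base + tol else accepted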

Chin-Chia Hsu
Verification, Disclosure and Network Formation in News Subscription Services

We study the formation of a subscription network in which a continuum of strategic, Bayesian subscribers decide to subscribe to one of two sources (leaders) for news that can be either informative or uninformative about an underlying state of the world. The bipartisan leaders, representing polarized and opposing perspectives and aiming to maximize the welfare of all subscribers, have a motive to persuade the subscribers to take the binary action that is optimal for the state according to their own perspectives. Each leader decides whether to disclose the news to her own subscribers when there is news. Subscribers update their beliefs not only upon receiving news but also when no news is disclosed, since they suspect that news may have been concealed by the leader's strategic disclosure decision. When neither the leaders nor the subscribers can verify the informativeness of the news, we show that at any equilibrium anti-homophily emerges among the centrist subscribers, whereas subscribers with extreme perspectives are indifferent between the two bipartisan leaders. In contrast, when only the leaders are capable of verifying the informativeness, homophily arises among the extreme subscribers. Our results shed light on how individuals seek information when information is private or costly to obtain, taking into account strategic verification and disclosure by news providers who are partisan and have a hidden motive to persuade their followers.

Max Li
Top-Down Control of Airport Network Delays

The network-level redistribution of airport delays reflects an aggregation of microscopic, tactical actions in response to airport capacity constraints. We investigate designing delay redistribution control policies under delay-conserving constraints, reflecting the fact that incurred delays cannot be removed in the absence of mechanisms such as flight cancellations. We demonstrate our framework on historical hourly sequences of US air transportation network disruptions, comparing the optimal selective redistribution policies against actual operations. We also provide a ranking of least- to most-costly delay-absorbing airports. Our control policies could be implemented as constraints in a standard air traffic flow management problem (TFMP), encouraging solutions to the TFMP that conform to delay redistribution requirements.

Muhammed Sayin
Fictitious Play in Zero-sum Stochastic Games

We present fictitious play dynamics for the general class of stochastic games and analyze its convergence properties in zero-sum stochastic games. Our dynamics involves agents forming beliefs about the opponent's strategy and their own continuation payoff (Q-function), and playing a myopic best response using the estimated continuation payoffs. Agents update their beliefs at visited states based on observations of opponent actions. A key property of the learning dynamics is that the update of the beliefs on Q-functions occurs at a slower timescale than the update of the beliefs on strategies. We show that, in both the model-based and model-free cases (the latter without knowledge of payoff functions and state transition probabilities), the beliefs on strategies converge to a stationary mixed Nash equilibrium of the zero-sum stochastic game.
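
A toy simulation of the two-timescale structure (a small zero-sum stochastic game with empirical strategy beliefs updated fast and Q-function beliefs updated slowly; the step-size exponents and best-response rules are illustrative, and the convergence guarantees of the talk concern the exact dynamics rather than this sketch):

    import numpy as np

    def fictitious_play_zero_sum(P, R, gamma=0.9, n_steps=20000, seed=0):
        """Toy two-timescale fictitious play for a zero-sum stochastic game.

        P: transition tensor, P[s, a, b] a probability vector over next states.
        R: reward tensor R[s, a, b] to player 1 (player 2 receives -R). Each player
        keeps an empirical belief about the opponent's per-state strategy (fast
        timescale) and a belief about the Q-function (slow timescale), and plays a
        myopic best response against those beliefs.
        """
        nS, nA, nB = R.shape
        rng = np.random.default_rng(seed)
        Q = np.zeros((nS, nA, nB))                 # believed payoffs to player 1
        pi1 = np.full((nS, nA), 1.0 / nA)          # player 2's belief about player 1
        pi2 = np.full((nS, nB), 1.0 / nB)          # player 1's belief about player 2
        visits = np.zeros(nS)
        s = 0
        for _ in range(n_steps):
            visits[s] += 1
            alpha = 1.0 / visits[s] ** 0.6         # fast step size: strategy beliefs
            beta = 1.0 / visits[s] ** 0.9          # slow step size: Q-function beliefs
            a = int(np.argmax(Q[s] @ pi2[s]))      # player 1 best response to beliefs
            b = int(np.argmin(pi1[s] @ Q[s]))      # player 2 best response to beliefs
            s_next = rng.choice(nS, p=P[s, a, b])
            pi1[s] += alpha * (np.eye(nA)[a] - pi1[s])     # update strategy beliefs
            pi2[s] += alpha * (np.eye(nB)[b] - pi2[s])
            cont = np.max(Q[s_next] @ pi2[s_next])         # continuation value estimate
            Q[s, a, b] += beta * (R[s, a, b] + gamma * cont - Q[s, a, b])
            s = s_next
        return pi1, pi2, Q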

Anish Agarwal
Synthetic Interventions

We develop a method to help quantify the impact that different levels of mobility restrictions could have had on COVID-19 related deaths across nations. Synthetic control (SC) has emerged as a standard tool in such scenarios to produce counterfactual estimates of what would have happened if a particular intervention had not occurred, using just observational data. However, it remains an important open problem how to extend SC to obtain counterfactual estimates if a particular intervention had occurred -- which is exactly the question about the impact of mobility restrictions stated above. As our main contribution, we introduce synthetic interventions (SI), which helps resolve this open problem by allowing one to produce counterfactual estimates when there are multiple interventions of interest. In addition to the COVID-19 case study, we show how SI can be used to run data-efficient, personalized randomized control trials using real data from a large e-commerce website and a large development economics study.
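
A stripped-down sketch of an SI-style estimate (linear weights learned from donor units under a common regime and reapplied under the intervention of interest; the variable names are hypothetical, and the rank truncation stands in for the principal component regression typically used in practice):

    import numpy as np

    def synthetic_intervention(pre_target, pre_donors, post_donors, rank=None):
        """Sketch of a synthetic-interventions-style counterfactual estimate.

        pre_target:  (T_pre,) outcomes of the target unit under the common regime.
        pre_donors:  (T_pre, N) outcomes of donor units under the same regime.
        post_donors: (T_post, N) outcomes of the same donors under the intervention
        of interest. Learns weights expressing the target as a combination of the
        donors, then applies them to the donors' under-intervention outcomes.
        """
        X = pre_donors
        if rank is not None:                       # optional low-rank denoising of donors
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            X = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        w, *_ = np.linalg.lstsq(X, pre_target, rcond=None)
        return post_donors @ w                     # counterfactual trajectory for the target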

Chandler Squires
Causal Imputation via Synthetic Interventions

Consider the problem of determining the effect of a drug on a specific cell type. To answer this question, researchers traditionally need to run an experiment applying the drug of interest to that cell type. This approach is not scalable: given a large number of different actions (drugs) and a large number of different contexts (cell types), it is infeasible to run an experiment for every action-context pair. In such cases, one would ideally like to predict the result for every pair while only having to perform experiments on a small subset of pairs. This task, which we label causal imputation, is a generalization of the causal transportability problem. In this paper, we provide two main contributions. First, we demonstrate the efficacy of the recently introduced synthetic interventions estimator on the task of causal imputation when applied to the prominent CMAP dataset. Second, we explain the demonstrated success of this estimator by introducing a generic linear structural causal model which accounts for the interaction between cell type and drug.

Session 2B: Optimization, Algorithms and Theory

Chenyang Yuan
Semidefinite Relaxations of Product of PSD Forms

We study the problem of maximizing the geometric mean of d low-degree non-negative forms on the real or complex sphere in n variables. We show that this highly non-convex problem is NP-hard even when the forms are quadratic, and that it is equivalent to optimizing a homogeneous polynomial of degree O(d) on the sphere. Previous Sum-of-Squares based convex relaxations require solving semidefinite programs (SDPs) of size n^O(d), with multiplicative approximation factors of 1/n. We exploit the compact representation of this polynomial to introduce an SDP relaxation of size polynomial in n and d, and prove that it achieves a constant-factor multiplicative approximation when maximizing the geometric mean of non-negative quadratic forms. We also show that this analysis is asymptotically tight, with a sequence of instances where the gap between the relaxation and the true optimum approaches this constant factor as d -> infinity. Next we propose a series of intermediate relaxations of increasing complexity that interpolate to the full Sum-of-Squares relaxation, as well as a rounding algorithm that finds an approximate solution from the solution of any intermediate relaxation. Finally we show that this approach can be generalized to find relaxations of products of non-negative forms of any degree, as well as sparse sums of such products.
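
For concreteness, here is the straightforward SDP relaxation for the quadratic case, written with CVXPY (this lifts x x^T to a PSD matrix X and only gives an upper bound; it is not the compact polynomial-size relaxation, the tight constant-factor analysis, or the rounding algorithm of the talk):

    import numpy as np
    import cvxpy as cp

    def geometric_mean_sdp(As):
        """Sketch: relax max over ||x||=1 of prod_i x^T A_i x (A_i PSD) via X = x x^T.

        Maximizing sum_i log <A_i, X> over {X PSD, tr X = 1} upper-bounds the log of
        the true optimum, since every feasible x gives a feasible X = x x^T.
        """
        n = As[0].shape[0]
        X = cp.Variable((n, n), PSD=True)
        objective = cp.Maximize(sum(cp.log(cp.trace(A @ X)) for A in As))
        prob = cp.Problem(objective, [cp.trace(X) == 1])
        prob.solve()
        return X.value, np.exp(prob.value / len(As))   # relaxed geometric-mean value

    rng = np.random.default_rng(0)
    mats = []
    for _ in range(3):
        B = rng.standard_normal((5, 5))
        mats.append(B @ B.T)                           # random PSD quadratic forms
    X_opt, gm_bound = geometric_mean_sdp(mats)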

Tobia Marcucci
Shortest Paths in Graphs of Convex Sets

Given a graph, the shortest-path problem requires finding a sequence of edges with minimum cumulative length that connects a source to a target vertex. We consider a generalization of this classical problem in which the position of each vertex in the graph is a continuous decision variable, constrained to lie in a corresponding convex set. The length of an edge is then defined as a convex function of the positions of the vertices it connects. Problems of this form arise naturally in road networks, robot navigation, and even optimal control of hybrid dynamical systems. The price for such wide applicability is the complexity of this problem, which is easily seen to be NP-hard. Our main contribution is a strong mixed-integer convex formulation based on perspective functions. This formulation has a very tight convex relaxation and allows us to efficiently find globally-optimal paths in large graphs and in high-dimensional spaces. Joint work with Jack Umenberger, Pablo Parrilo, and Russ Tedrake.
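
To make the setup concrete, here is the convex subproblem obtained once a discrete path is fixed: vertex positions are decision variables constrained to convex sets (boxes, in this toy CVXPY example), and the total Euclidean edge length is minimized. The talk's contribution, the mixed-integer perspective formulation that also optimizes over which path to take, is not reproduced here.

    import numpy as np
    import cvxpy as cp

    def path_length_fixed_sequence(boxes):
        """Sketch: minimize total edge length along a fixed sequence of vertices.

        boxes: list of (lower, upper) bound pairs; vertex k must lie in box k
        (a stand-in for a general convex set). Returns positions and length.
        """
        dim = len(boxes[0][0])
        xs = [cp.Variable(dim) for _ in boxes]
        constraints = []
        for x, (lo, hi) in zip(xs, boxes):
            constraints += [x >= np.array(lo), x <= np.array(hi)]
        length = sum(cp.norm(xs[k + 1] - xs[k], 2) for k in range(len(xs) - 1))
        prob = cp.Problem(cp.Minimize(length), constraints)
        prob.solve()
        return [x.value for x in xs], prob.value

    positions, total_len = path_length_fixed_sequence(
        [([0, 0], [0, 0]), ([1, 0], [2, 1]), ([3, 2], [4, 3]), ([5, 5], [5, 5])])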

Shuvomoy Das Gupta
Exterior-point Operator Splitting for Nonconvex Learning

We present the nonconvex exterior-point operator splitting algorithm (NExOS), a novel linearly convergent first-order algorithm tailored for constrained nonconvex learning problems. We consider the problem of minimizing a convex cost function over a nonconvex constraint set, where projection onto the constraint set is single-valued around points of interest. A wide range of nonconvex learning problems have this structure, including (but not limited to) sparse and low-rank optimization problems. By exploiting the underlying geometry of the constraint set, NExOS finds a locally optimal point by solving a sequence of penalized problems with strictly decreasing penalty parameters. In each outer iteration, NExOS solves the penalized problem by applying an operator splitting algorithm, which converges linearly to a local minimum of the corresponding penalized formulation under regularity conditions. Furthermore, the local minima of the penalized problems converge to a local minimum of the original problem as the penalty parameter goes to zero. We implement NExOS in the open-source Julia package NExOS.jl, which has been extensively tested on many instances from a wide variety of learning problems. We study several examples of well-known nonconvex learning problems, and show that, despite being general purpose, NExOS is able to compute high-quality solutions very quickly and is competitive with specialized algorithms.
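
A simplified sketch of the exterior-point/penalty-continuation idea on a sparse least-squares instance (a plain proximal-gradient inner loop is used here in place of the operator splitting algorithm, and the penalty schedule and step sizes are illustrative; NExOS.jl is the reference implementation):

    import numpy as np

    def hard_threshold(x, k):
        """Projection onto the set of k-sparse vectors (keep the k largest magnitudes)."""
        z = np.zeros_like(x)
        idx = np.argsort(np.abs(x))[-k:]
        z[idx] = x[idx]
        return z

    def penalty_continuation_sparse_ls(A, b, k, mus=(1.0, 0.1, 0.01, 0.001),
                                       inner_iters=300):
        """Sketch: minimize 0.5||Ax - b||^2 over k-sparse x by penalty continuation.

        Each outer stage approximately minimizes f(x) + dist(x, C)^2 / (2 mu) by
        gradient steps, warm-starting the next stage; shrinking mu pushes the
        iterates onto the sparse set C.
        """
        x = np.zeros(A.shape[1])
        L = np.linalg.norm(A, 2) ** 2                # Lipschitz constant of grad f
        for mu in mus:
            step = 1.0 / (L + 1.0 / mu)
            for _ in range(inner_iters):
                grad = A.T @ (A @ x - b) + (x - hard_threshold(x, k)) / mu
                x = x - step * grad
        return hard_threshold(x, k)                  # final projection onto the set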

Yunzong Xu
On Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning

In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. Are similar guarantees possible for contextual bandits? While positive results are known for certain special cases, there is no general theory characterizing when and how instance-dependent regret bounds for contextual bandits can be achieved for rich, general classes of policies. We introduce a family of complexity measures that are both sufficient and necessary to obtain instance-dependent regret bounds. We then introduce new oracle-efficient algorithms that adapt to the gap whenever possible, while also attaining the minimax rate in the worst case. We provide structural results that tie together a number of complexity measures previously proposed throughout contextual bandits, reinforcement learning, and active learning, and elucidate their role in determining the optimal instance-dependent regret. Turning our focus to reinforcement learning with function approximation, we develop new oracle-efficient algorithms for reinforcement learning with rich observations that obtain optimal gap-dependent sample complexity. Joint work with Dylan Foster, Sasha Rakhlin, and David Simchi-Levi.

Eren C. Kizildag
The Landscape of the Number Partitioning Problem and the Overlap Gap Property

We consider the problem of finding near-optimal solutions to the number partitioning problem, a problem of central importance in the study of randomized controlled experiments, statistics, theoretical computer science, and statistical mechanics. This problem exhibits what is known as a "statistical-computational gap": there is a significant gap between what information-theoretic (i.e., existential) methods guarantee and what is achievable by efficient (i.e., polynomial-time) algorithms. We establish that the landscape of this problem exhibits a certain intricate geometric property, dubbed the Overlap Gap Property, which essentially states that any two "near ground-state" configurations are either too close or too far apart. As a consequence, we establish that stable algorithms (appropriately defined) fail to find near ground-state solutions with respect to an appropriate "energy level". At a technical level, our proof uses ideas from extremal graph theory and Ramsey theory. Our result applies to any algorithm so long as it is stable, with no assumptions on its runtime. As a complementary direction, we also study the landscape of the same problem in the regime above the aforementioned energy level. We establish that in this regime the overlap gap property is, perhaps surprisingly, absent; this is shown using the second moment method and the probabilistic method. Finally, we compute the expected number of local optima of this model and show that it has exponentially many. As a consequence, greedy algorithms fail to find near ground-state solutions. Joint work with my advisor, Prof. David Gamarnik.
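
To make the gap tangible, one can compare the exhaustive optimum with a simple polynomial-time heuristic on Gaussian instances; the brute-force discrepancy is typically many orders of magnitude smaller than what the heuristic finds, which is exactly the kind of gap discussed above (illustrative code only):

    import numpy as np
    from itertools import product

    def best_discrepancy(x):
        """Exhaustive search: min over sign patterns of |sum_i eps_i x_i| (2^n work)."""
        return min(abs(np.dot(signs, x)) for signs in product([-1, 1], repeat=len(x)))

    def greedy_discrepancy(x):
        """Simple heuristic: assign items, largest first, to the currently lighter side."""
        s = 0.0
        for v in sorted(np.abs(x), reverse=True):
            s += -v if s > 0 else v
        return abs(s)

    rng = np.random.default_rng(0)
    x = rng.standard_normal(18)
    print(best_discrepancy(x), greedy_discrepancy(x))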

Jason Altschuler
Settling the Computational Complexity of Wasserstein Barycenters

Computing Wasserstein barycenters is a fundamental geometric problem with widespread applications in machine learning, statistics, and computer graphics. However, despite considerable recent attention, it is an open problem whether Wasserstein barycenters can be computed in polynomial time. In this talk, I’ll describe two recent results which resolve this question: 1) Wasserstein barycenters are NP-hard to compute (even approximately). 2) However, in any fixed dimension, there are polynomial-time algorithms. Based on joint work with Enric Boix-Adsera.