LIDS Student Conference 2017

Student Session on Networks

Arsalan Sharifnassab Bounded perturbations in hybrid dynamical systems with applications to network scheduling

We begin by a math question and show in a class dynamical systems that a bounded exogenous perturbation can only cause a bounded deviation in the solutions. Moreover, this deviation bound is proportional to the incurred perturbation. Such results are of utmost importance in the analysis of perturbed control systems, as one can approximate the dynamics by the dynamics of an unperturbed system. Then, use the deviation bounds to get an approximate solution for the perturbed system.

In the second part of the talk, we will discuss two applications in network scheduling. First, we present a result on exponential state space collapse (with e^{O(r)} scaling), which is a considerable improvement over the existing diffusion (r^2) scaling collapse results. Second, we give a sufficient condition for delay stability in networks with heavy tailed traffics.

Martin Zubeldia Low-Delay medium access for large-scale IoT networks

Scalable low-delay medium access is of critical importance in enabling the Internet-of-Things (IoT), providing agile connectivity to unprecedented numbers of sensors, machine-type devices and control gadgets. While the number of IoT nodes will be massive, the activity of individual IoT flows is expected to be highly intermittent, with low average traffic rates, but possibly stringent latency constraints.

Motivated by these challenges, we examine a canonical scenario where several sources generate sporadic delay-sensitive messages that need to be transmitted to a common access point. The access point can instruct the various sources with what probabilities to transmit a message, if they have any. When several sources transmit simultaneously, the access point can detect a collision, but is unable infer the identities of the sources involved. While the access point can use the channel activity observations to obtain estimates of the queue states at the various sources, it does not have any explicit queue length information otherwise.

We explore the achievable delay performance in a regime where the number of sources (n) grows large while the total traffic load remains fixed. This scaling captures that the key challenge in IoT scenarios is not so much to handle a high aggregate traffic load, but rather to maintain low delay when traffic is dispersed across immense numbers of highly intermittent sources. We prove that bounded delay can be achieved in stationary conditions when the total load is below some threshold A>0, even as the number of sources increases without bound. In addition, we establish that the delay must grow at least linearly with n when the total load is above some threshold B<1. This demonstrates that bounded delay can only be achieved if a positive fraction of the system capacity is sacrificed. Various simulation experiments are conducted to illustrate the asymptotic scaling results and associated performance tradeoffs.

Joint work with Sem Borst (Nokia Bell Labs).

Igor Kadota Minimizing the age of information in wireless networks

We consider a wireless broadcast network with a base station sending time-sensitive information to a number of clients. The Age of Information (AoI), namely the amount of time that elapsed since the most recently delivered packet was generated, captures the freshness of the information. We formulate a discrete-time decision problem to find a scheduling policy that minimizes the expected weighted sum AoI of the clients in the network. To the best of our knowledge, this is the first work to provide a scheduling policy that optimizes AoI in a wireless network with unreliable channels.
The results are twofold: first, we show that a Greedy Policy, which transmits the packet with highest current age, is optimal for the case of symmetric networks. Then, for the general network case, we establish that the problem is indexable and obtain the Whittle Index in closed-form. Numerical results are presented to demonstrate the performance of the policies

Abhishek Sinha Throughput optimal algorithms for generalized flow problems

We consider the problem of throughput-optimal packet dissemination, in the presence of an arbitrary mix of unicast, broadcast, multicast and anycast traffic, in a general wireless network. We propose an online dynamic policy, called Universal Max-Weight (UMW), which solves the problem efficiently. To the best of our knowledge, UMW is the first throughput-optimal algorithm of such versatility in the context of generalized network flow problems. Conceptually, the UMW policy is derived by relaxing the precedence constraints associated with multi-hop routing, and then solving a min-cost routing and max-weight scheduling problem on a virtual network of queues. When specialized to the unicast setting, the UMW policy yields a throughput-optimal cycle-free routing and link scheduling policy. This is in contrast to the well-known throughput-optimal Back-Pressure (BP) policy which allows for packet cycling, resulting in excessive latency. Extensive simulation results show that the proposed UMW policy incurs a substantially smaller delay as compared to the BP policy. The proof of throughput-optimality of the UMW policy combines ideas from stochastic Lyapunov theory with a sample path argument from adversarial queueing theory and may be of independent theoretical interest.

Student Session on Inference/Statistics I

Amelia Perry Message passing algorithms for synchronization problems

Synchronization is the learning task of aligning rotated objects based on noisy observations of their pairwise relative rotations, or more generally given relative 'alignments' drawn from some symmetry group. A central application is the estimation of molecule rotations in cryo-electron microscopy. We present a new algorithm following the framework of approximate message passing, which statistical physics suggests may yield the optimal efficient reconstruction. Our approach leverages the representation theory of compact groups to give a unified, general approach.

Joint work with Alex Wein, Afonso Bandeira, and Ankur Moitra.

Alex Wein Optimality and sub-optimality of PCA for spiked random matrix models

Various data-processing tasks can be described by spiked matrix models where a random noise matrix (Wigner or Wishart) is perturbed by a planted rank-1 signal. Classical results from random matrix theory state that PCA (top eigenvalue/eigenvector) succeeds at detecting and estimating the signal if and only if the signal-to-noise ratio exceeds a particular threshold. We consider the question of whether PCA is optimal, or whether there is some other statistical test that can succeed below the PCA threshold. We answer these types of questions using a simple and widely-applicable second moment method associated with the statistical notion of "contiguity." We allow various priors on the planted signal, capturing structured problems such as sparse PCA.

Joint work with Amelia Perry, Afonso Bandeira, and Ankur Moitra.

Amin Rahimian Bayesian group decisions

Many important real-world applications involve group interactions among individuals with purely informational externalities, such situations arise for example in jury deliberations, expert committees, medical diagnosis and so on. We model the purely informational interactions of rational agents in a group, where they receive private information and act based upon that information while also observing other people's recommendations. In recent works, recursive techniques have been applied to analyze Bayesian decision problems with relative success. In this note, we will use the framework of iterative eliminations to model the decision problem as well as the thinking process of a Bayesian agent in a group decision scenario. As the Bayesian agent attempts to infer the true state of the world from her sequence of observations about actions of others as well as her own private signal, she recursively refines her belief on the signals that other players could have observed and actions that they could have taken given the assumption that other players are also rational. We study the asymptotic properties of Bayesian group decisions, using the Markov Bayesian equilibrium as a solution concept. We further analyze the computational complexity of the Bayesian group decision problem and provide algorithms for Bayesian decision making in general networks. We next show how these computations simplify in networks with a causal order of observations. We finally give insights about the rate of leaning and efficiency of the group decision outcomes.

Jonathan Weed Optimal rate of estimation for the multi-reference alignment problem

This paper describes optimal rates of adaptive estimation of a vectors in the multi-reference alignment model, a problem with important applications in signal processing, image processing, and computer vision, among others. We describe how this model can be viewed as a multivariate Gaussian mixture model under the constraint that the centers belong to the orbit of a finite group. This enables us to derive matching upper and lower bounds that feature an interesting dependence on the signal-to-noise ratio of the model. Both upper and lower bounds are articulated around a tight local control of Kullback-Leibler divergences that showcases the central role of moment tensors in this problem.

Joint work with Afonso Bandeira and Philippe Rigollet.

Student Session on Probability/Statistics II

Christina Lee Unifying Framework for Crowd-sourcing via Graphon Estimation

We consider the question of inferring true answers associated with tasks based on potentially noisy answers obtained through a micro-task crowd-sourcing platform such as Amazon Mechanical Turk. We propose a generic, non-parametric model for this setting: for a given task i, 1\leq i \leq T, the response of worker j, 1\leq j\leq W for this task is correct with probability F_{ij}, where matrix F = [F_{ij}]_{i\leq T, j\leq W} may satisfy one of a collection of regularity conditions including low rank, which can express the popular Dawid-Skene model; piecewise constant, which occurs when there is finitely many worker and task types; monotonic under permutation, when there is some ordering of worker skills and task difficulties; or Lipschitz with respect to an associated latent non-parametric function. This model, contains most, if not all, of the previously proposed models to the best of our knowledge.

We show that the question of estimating the true answers to tasks can be reduced to solving the Graphon estimation problem, for which there has been much recent progress. By leveraging these techniques, for a large class of regularity conditions under which there exists performance bounds for Graphon estimation, we can equivalently provide bounds on the fraction of incorrectly estimated tasks of the resulting crowdsourcing algorithm. Subsequently, we have a solution for inferring the true answers for tasks using noisy answers collected from crowd-sourcing platform under a significantly larger class of models. Concretely, we establish that if the (i,j)-th element of F, F_{ij}, is equal to a Lipschitz continuous function over latent features associated with the task i and worker j for all i, j, then all task answers can be inferred correctly with high probability by soliciting O(ln(T)^{3/2}) responses per task even without any knowledge of the Lipschitz function, task and worker features, or the matrix F.

Ilias Zadik Sparse high dimensional regression: maximum likelihood and phase transitions

In this talk we concentrate on the classical problem of sparse high dimensional regression $Y=X\beta^{*}+W$ where $X$ is $n\times p$ matrix with i.i.d. standard normal entries, $W$ is $n\times 1$ vector with i.i.d. $N(0,\sigma^{2})$ entries and $\beta^{*}$ is $p\times 1$ binary vector with $k$ entries equal to unity ($k$-sparse) under the assumptions that $\log k=o(\log p)$ and $\sigma^2=o(k)$. The goal is recovering with high probability (w.h.p) the vector $\beta^{*}$ from observed $X$ and $Y$.

In the literature some remarkable papers by Wainwright, Donoho, Candes and Tao show that LASSO ($l_1$-constrained quadratic programming) and Compressed Sensing techniques can recover $\beta^*$ as long as $n \geq 2k \log p$.

In this talk, we will discuss some new results for the performance of the maximum likelihood estimator (M.L.E.) of the problem. We show that the performance of the maximum likelihood estimator obtains a sharp phase transition around $n^*=\frac{2k}{ \log (1+\frac{2k}{\sigma^{2}})}\log p$, that is if for some $\epsilon>0$, $n>(1+\epsilon)n^*$ then the M.L.E. w.h.p. is recovering almost all the support of $\beta^*$, but if $n<(1-\epsilon)n^*$ w.h.p. it fails to recover any positive constant fraction of the true support. Notice that as $n^* \ll 2k \log p$, the above results imply that the M.L.E., in the possible cost of efficiency, can recover $\beta^*$ with much fewer samples than the known algorithms.

The above analysis reveals a certain Overlap Gap Property (O.G.P.) on the space of all binary vectors $\beta$ when $n\le ck\log p$ for sufficiently small constant $c>0$. By drawing a connection with a similar O.G.P. exhibited by many random graphs and statistical mechanics models, we conjecture that O.G.P. is the source of algorithmic hardness of solving the regression problem in the regime $n \ll 2k \log p$.

Joint work with David Gamarnik.

Yuhao Wang A new framework for joint estimation of multiple high-dimensional directed acyclic graphs

Learning graphical structure of Bayesian networks or directed acyclic graphs using observational data has been a very challenging problem. Conventional methods for learning graphical structure of a specific Bayesian network usually only uses observational data from this network. However, in many real world applications, such as learning gene regulatory networks of different tissues or cell types, we could improve the estimation performance by incorporating data from multiple Bayesian networks sharing similar graphical structures. In this talk, we present a new framework, jointGES, for causal inference using data from multiple Bayesian networks. Compared to separate estimation, this framework could accelerate convergence rate to true causal structure as number of observations goes to infinity. We also used simulation study to validate our theoretical results.

Joint work with Caroline Uhler.

Anuran Makur Comparison of Channels: Criteria for Domination by a Symmetric Channel

Consider the family of all q-ary symmetric channels with capacities decreasing from log(q) to 0. In this talk, we study the following basic question: what is the member of this family with the smallest capacity that dominates a given channel V in the "less noisy" sense? This question turns out to be a natural extension of the concept of "strong data processing inequalities." We develop a simple criterion for domination by a symmetric channel in terms of the minimum entry of the stochastic matrix defining the channel V. The criterion is then strengthened for the special case of additive noise channels over finite Abelian groups. Finally, it is shown that domination by a symmetric channel implies (via comparison of Dirichlet forms) a logarithmic Sobolev inequality for the original channel.

Joint work with Yury Polyanskiy.

Student Session on Optimization

Nikita Korolko The K-server problem via modern optimization lens

We reconsider the well-known K-server problem from the perspective of mixed-integer, robust and adaptive optimization. Existing online methods typically exploit only information from the past in order to make the next decision. We propose a new tractable mixed-integer linear formulation of the K-server problem that includes both information from the past and available assumptions about the future.
Combining ideas behind an existing online method called the Work Function Algorithm and adjustable robust optimization, we design a new method that inherits positive properties of both approaches. It is computationally tractable, gradually outperforms pure online algorithms and classical adaptive optimization methods when increasing information about the future becomes available, and is shown to be stable with respect to potential errors in the assumptions about that future.

Denizcan Vanli Global convergence rate of proximal incremental aggregated gradient methods

We focus on the problem of minimizing the sum of smooth component functions (where the sum is strongly convex) and a non-smooth convex function, which arises in regularized empirical risk minimization in machine learning and distributed constrained optimization in wireless sensor networks and smart grids. We consider solving this problem using the proximal incremental aggregated gradient (PIAG) method, which at each iteration moves along an aggregated gradient (formed by incrementally updating gradients of component functions according to a deterministic order) and taking a proximal step with respect to the non-smooth function. While the convergence properties of this method with randomized orders (in updating gradients of component functions) have been investigated, this paper, to the best of our knowledge, is the first study that establishes the convergence rate properties of the PIAG method for any deterministic order. In particular, we show that the PIAG algorithm is globally convergent with a linear rate provided that the step size is sufficiently small. We explicitly identify the rate of convergence and the corresponding step size to achieve this convergence rate. Our results improve upon the best known condition number and aggregation history dependence of the convergence rate of the incremental aggregated gradient methods used for minimizing a sum of smooth functions.

Matthew Staib Robust budget allocation via continuous submodular functions

The optimal allocation of resources for maximizing the influence, the spread of information, or coverage, has gained attention in the past years, in particular in machine learning and data mining. However, in applications, the exact parameters of the problem are rarely known. We hence revisit a continuous version of the Budget Allocation or Bipartite Influence Maximization problem introduced by Alon et al. from a robust optimization perspective, where an adversary may choose the least favorable parameters within a confidence set. The resulting problem is a nonconvex-concave saddle point problem. We show that this nonconvex problem can be solved by leveraging connections to continuous submodular optimization, and by solving a constrained submodular minimization problem. Although constrained submodular minimization is hard in general, here, we establish conditions under which a continuous submodular optimization problem can be solved to arbitrary (additive) precision epsilon.

Arthur Flajolet Logarithmic regret bounds for multi-armed bandits with knapsack constraints

Multi-Armed Bandit (MAB) is a benchmark model for repeated decision making in stochastic environments with very limited, but immediate, feedback on the outcomes of alternatives. The Bandits with Knapsacks (BwK) model, a recent extension to the framework allowing to capture resource consumption, offers promising applications in many areas and particularly in dynamic pricing and real-time bidding, where advertisers arguably face a problem of this nature when bidding on advertising inventory.
In this talk, after motivating this model in the context of display and search auctions, we will present recent results we obtained on the characterization of optimal regret bounds for this model extension. More specifically, while asymptotically optimal regret bounds with respect to the time horizon T are now well documented for the original MAB problem, with ln{T} (resp. sqrt{T}) distribution-dependent (resp. distribution-free) bounds, the BwK model lacks this clear-cut distinction. We partially bridge the gap by designing algorithms achieving logarithmic growth rates under a non-degeneracy assumption.

Joint work with Patrick Jaiilet, MIT EECS

Diego Cifuentes Sampling varieties for Sum-of-Squares optimization

We study sum of squares (SOS) relaxations to a polynomial optimization problem over an algebraic variety V. We propose a new methodology that, rather than relying on some algebraic description, represents V with a generic set of samples. This approach depends only on the geometry of V, avoiding representation issues such as multiplicity and choice of generators. It also takes advantage of the coordinate ring structure to reduce the size of the corresponding semidefinite program (SDP). Our methods are particularly appealing for varieties which are easy to sample from but for which the defining equations are complicated, such as SO(n), Grassmannians or rank k tensors. For arbitrary varieties we can obtain the samples with the tools of numerical algebraic geometry.

Student Session on Applications

Bennett Cyphers AnonML

We propose AnonML, a system for privacy-preserving classifier generation over a network of anonymous peers. Our goal is to allow a group of users to combine enough data to generate useful machine learning models without revealing private information. All data-holders in the system are treated equally, and anyone can generate a model to meet their specific needs at any time. AnonML functions by moving as much computation as possible to its end users, away from a central authority. AnonML users compute features on their own data, thereby limiting the amount of information they need to share. To generate a model, a group of data-holding peers first agree on a model definition, a set of feature functions, and an aggregator, a peer who temporarily acts as a central authority. Each peer anonymously sends several small packets of labeled feature data to the aggregator. In exchange, the aggregator generates a classifier and shares it with the group. In this way, AnonML data holders control exactly what information they share on a feature-by-feature and model-by-model basis, and peers are able to disassociate features from their digital identities. We show that for many machine learning problems, AnonML is able to construct classifiers that are nearly as performant as traditional, centrally-generated models.

Zhi Xu Secret goals, private desires

In modern era, it is often the case that achieving one's goals involve interacting with other external entities. In the meantime, privacy is also desired. For example, you want to consult a database for data relevant to the optimization of your private objectives, but you do not want the "big brother" to find out about your objectives. We discuss formulations and some results on problems of this type. (Joint work with John Tsitsiklis and Kuang Xu).

Shreya Saxena Fundamental Limitations in the Generation of High Bandwidth Movements

The generation of fast muscle movements by the sensorimotor system is fundamentally limited by the number of neurons devoted to the task, their ability to transfer information about the intended movements, and the dynamics of muscles involved. Yet, the tradeoffs between the multiple factors have not been quantified rigorously. We identify limitations in the ability of the sensorimotor control system to track intended fast movements. With a closed-loop model in which an integrate-and-fire neuron actuates a muscle under visual and proprioceptive feedback, we identify conditions under which undesirable subharmonic oscillations may occur in the muscle output, based on the bandwidth of the movement. Our theoretical work based on feedback control principles identifies how number of neurons, neural dynamics, muscles and feedback may interact during the generation of fast movements.