# 2019 IEEE North American School of Information Theory Poster Abstracts

Application of Cybernetic Control Variables in the Modeling of Lipid Metabolism in Mammalian Systems
Lina Aboulmouna and Rubesh Raja (Purdue University)

Abstract: Cybernetic theory builds on the perspective that metabolic regulation is organized toward achieving system goals. While cybernetic models have previously been established in bacterial systems, we show that cybernetic modeling can also be applied to complex biological systems with a predefined goal. Our cybernetic model of prostaglandin (PG) metabolism in mouse bone marrow derived macrophage (BMDM) cells captures the complex regulation of PG metabolism and provides a reliable description of PG formation. While cybernetic approaches rely on optimizing regulatory mechanisms for a given goal, one limitation is having to define the goal a priori. Developing a method that does not simply assume an a priori objective function, but instead optimizes for the best objective function for the system, will provide insight into the metabolic goals of multicellular systems and allow a more complete understanding of how the states of these cells evolve over time.

Some properties of Renyi Mutual Informations
Gautam Aishwarya (University of Delaware)

Abstract: Classical Shannon quantities in information theory such as entropy, divergence, conditional entropy and mutual information have been generalized to the corresponding Renyi quantities. While there is a general consensus on the definitions of Renyi entropy and divergence, there are several candidates for conditional Renyi entropy and α-mutual information. We formulate Arimoto's conditional Renyi entropy for abstract alphabets with a reference measure. This quantity is then used to compare several Renyi notions of mutual information. We observe that despite the lack of symmetry in some of these expressions, they behave in a symmetric manner for the purpose of capacity calculations.

Mutual Information as a Function of Moments
Wael Alghamdi (Harvard University)

Abstract: We introduce a mutual information estimator based on the connection between estimation theory and information theory. By combining a polynomial approximation to the minimum mean-squared error estimator with the I-MMSE relationship, we derive a new formula for the mutual information $$I(X;Y)$$ that is a function of only the marginal distribution of $$X$$, the moments of $$Y$$, and the conditional moments of $$Y$$ given $$X$$. The proposed estimator captures desirable properties that the mutual information satisfies, such as being invariant under affine transformations.
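The I-MMSE relationship underlying the estimator above can be checked numerically in the Gaussian special case, where both sides have closed forms. The sketch below is illustrative only (it does not implement the poster's polynomial estimator); function names are ours.

```python
import numpy as np

def mmse_gaussian(snr):
    # MMSE of estimating X ~ N(0, 1) from Y = sqrt(snr) * X + N(0, 1)
    return 1.0 / (1.0 + snr)

def mutual_info_via_immse(snr, n_grid=10_000):
    # I-MMSE identity: I(snr) = (1/2) * integral_0^snr mmse(g) dg  (in nats)
    grid = np.linspace(0.0, snr, n_grid)
    vals = mmse_gaussian(grid)
    # trapezoidal rule over the SNR grid
    return 0.5 * float(np.sum((vals[1:] + vals[:-1]) / 2 * np.diff(grid)))

snr = 3.0
i_numeric = mutual_info_via_immse(snr)
i_closed = 0.5 * np.log(1.0 + snr)  # Gaussian-input closed form (nats)
```

With a Gaussian input the two quantities agree to numerical precision, which is the sanity check one would run before replacing the exact MMSE with a polynomial approximation.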

Distributed and Secure Coded Matrix Computation with Flexible Communication Load
Malihe Aliasgari (New Jersey Institute of Technology)

Abstract: In this work, we impose an additional security constraint on the data matrices and assume that workers can collude to eavesdrop on the content of these data matrices. Specifically, we introduce a novel class of secure codes, referred to as secure generalized PolyDot codes, that generalizes previously published non-secure versions of these codes for matrix multiplication. These codes extend the state-of-the-art by allowing a flexible trade-off between recovery threshold and communication load for a fixed maximum number of colluding workers.

Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets

Abstract: We derive generalization and excess risk bounds for neural nets using a family of complexity measures based on a multilevel relative entropy. The bounds are obtained by introducing the notion of generated hierarchical coverings of neural nets and by using the technique of chaining mutual information introduced in Asadi et al. NeurIPS’18. The resulting bounds are algorithm-dependent and exploit the multilevel structure of neural nets. This, in turn, leads to an empirical risk minimization problem with a multilevel entropic regularization. The minimization problem is resolved by introducing a multi-scale generalization of the celebrated Gibbs posterior distribution, proving that the derived distribution achieves the unique minimum. This leads to a new training procedure for neural nets with performance guarantees, which exploits the chain rule of relative entropy rather than the chain rule of derivatives (as in backpropagation). To obtain an efficient implementation of the latter, we further develop a multilevel Metropolis algorithm simulating the multi-scale Gibbs distribution, with an experiment for a two-layer neural net on the MNIST data set.

Accelerating Computation Problems using Multi-Armed Bandits
Tavor Baharav (Stanford University)

Abstract: Many modern optimization problems involve heavy computation on large datasets. However, in most of these problems, average-case instances are much easier than the worst-case instances that current algorithms are designed for. By turning these computational problems into statistical estimation problems, we can exploit this gap. As a concrete example, we focus on the medoid computation problem presented in Bagaria et al. 2017, and show that by preserving the structure of the computational problem when converting it into a statistical estimation one, we realize order-of-magnitude gains over previous work, both theoretically and empirically.
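The conversion from computation to estimation can be sketched as follows: instead of computing every candidate's average distance exactly, estimate it from a random subset of reference points. This is a toy sub-sampling sketch, not the adaptive bandit algorithm of the poster; names are illustrative.

```python
import numpy as np

def estimate_medoid(points, n_ref, seed=0):
    # Estimate the medoid (point minimizing average distance to the rest)
    # from distances to only n_ref randomly sampled reference points.
    rng = np.random.default_rng(seed)
    n = len(points)
    refs = rng.choice(n, size=n_ref, replace=False)
    # average distance from each candidate to the sampled references
    d = np.linalg.norm(points[:, None, :] - points[refs][None, :, :], axis=2)
    return int(np.argmin(d.mean(axis=1)))

rng = np.random.default_rng(1)
pts = rng.normal(size=(60, 2))
exact = estimate_medoid(pts, n_ref=60)   # full sampling recovers the exact medoid
approx = estimate_medoid(pts, n_ref=20)  # cheaper statistical estimate
```

A bandit algorithm refines this by sampling adaptively, spending more distance evaluations on candidates that remain plausible medoids.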

Polar Codes for Simultaneous Information and Energy Transmission
Sourya Basu (University of Illinois at Urbana-Champaign)

Abstract: We develop practical codes for achieving the capacity-energy function in simultaneously transmitting information and energy over a single noisy channel. In particular, two polar coding techniques are developed for discrete memoryless channels. One technique involves concatenating nonlinear mappings with linear polar codes, whereas the other involves using randomized rounding to achieve the capacity-energy function. Further, capacity-energy achieving polar coding techniques for the additive white Gaussian noise (AWGN) channel with peak power constraint are also provided.

Stochastic Gradient Coding for Flexible Straggler Mitigation in Distributed Learning

Abstract: We consider distributed gradient descent in the presence of stragglers. Recent work on gradient coding and approximate gradient coding has shown how to add redundancy in distributed gradient descent to guarantee convergence even if some workers are slow or non-responsive. In this work we propose a new type of approximate gradient coding which we call Stochastic Gradient Coding (SGC). The idea of SGC is very simple: we distribute data points redundantly to workers according to a good combinatorial design. We prove that the convergence rate of SGC mirrors that of batched Stochastic Gradient Descent (SGD) for the $$\ell_2$$ loss function, and show how the convergence rate can improve with the redundancy. We also provide bounds for more general convex loss functions. We show empirically that SGC requires a small amount of redundancy to handle a large number of stragglers and that it can outperform existing approximate gradient codes when the number of stragglers is large.
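The redundant-distribution idea can be illustrated with a toy replication design (a stand-in for the combinatorial designs of the poster; all names are ours): replicate each point on $$d$$ workers, have each worker return its partial gradient sum scaled by $$1/d$$, and sum whatever arrives. With no stragglers the aggregate equals the full gradient exactly.

```python
import numpy as np

def assign_cyclic(n_points, n_workers, d):
    # Replicate point i on d consecutive workers (a simple balanced design;
    # the paper uses more refined combinatorial designs).
    assignment = [[] for _ in range(n_workers)]
    for i in range(n_points):
        for j in range(d):
            assignment[(i + j) % n_workers].append(i)
    return assignment

def aggregate(grads, assignment, responsive, d):
    # Each responsive worker contributes the sum of its points' gradients / d.
    total = np.zeros_like(grads[0])
    for w in responsive:
        for i in assignment[w]:
            total += grads[i] / d
    return total

rng = np.random.default_rng(0)
grads = [rng.normal(size=3) for _ in range(8)]
assignment = assign_cyclic(8, 4, d=2)
full = aggregate(grads, assignment, responsive=range(4), d=2)
```

When workers straggle independently, each point's gradient still appears with the same expected weight, which is what makes the aggregate behave like a batched SGD step.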

Tensorization of the strong data processing inequality for quantum chi-square divergences
Yu Cao (Duke University)

Abstract: Quantifying the concentration of classical and quantum states under noisy channels is an important topic in information theory. Among various techniques, the strong data processing inequality, as a refinement of the well-known data processing inequality, has lately received much attention for classical noisy channels. In this work, we apply the strong data processing inequality to study quantum noisy channels and, under certain assumptions, prove the tensorization of the strong data processing inequality for a family of quantum chi-square divergences. In addition, we discuss the connection between the quantum strong data processing inequality constant and the quantum maximal correlation.

Benefits of Coding on Age of Information in Broadcast Networks
Xingran Chen (University of Pennsylvania)

Abstract: Age of Information (AoI) is studied in two-user broadcast networks with feedback, and lower and upper bounds are derived on the expected weighted sum AoI of the users. In particular, a class of simple coding actions is considered and within this class, randomized and deterministic policies are devised. Explicit conditions are found for symmetric dependent channels under which coded randomized policies strictly outperform the corresponding uncoded policies. Similar behavior is shown numerically for deterministic policies.

Outer Bounds in Distributed Storage
Anna Chlopecki (University of Illinois)

Abstract: In the exact repair setting for error correcting codes over distributed storage networks, we must consider fundamental trade-offs between various code parameters. Layered codes realize optimal trade-offs between storage overhead and repair bandwidth in a special range of code parameters. Optimality for general codes remains an open problem. We test how known proofs for optimality hold up for codes obtained by concatenation of layered codes.

Secure Distributed Cloud Storage
Alejandro Cohen (MIT)

Abstract: The principal mission of multi-source multi-cast (MSM) is to disseminate all messages from all sources in a network to all destinations. In many applications, securing the messages disseminated is critical. A common secure model is to consider a network where there is an eavesdropper that is able to observe a subset of the network links, and to seek a code which keeps the eavesdropper ignorant regarding all the messages. While this is solved when all messages are located at a single source, secure MSM (SMSM) is an open problem, and the rates required are hard to characterize in general. In this paper, we consider individual security, which promises that the eavesdropper has zero mutual information with subsets of messages. We completely characterize the rate region for SMSM under individual security, and show that such a security level is achievable at the full capacity of the network, that is, the cut-set bound is the matching converse, similar to non-secure MSM.

Iterative Threshold Decoding of Braided Convolutional Codes
Andrew Cummins (NMSU)

Abstract: Braided convolutional codes (BCCs) are a class of spatially coupled, turbo-like codes that utilize short constraint length convolutional component codes and can be decoded by iterative methods based on the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm. Previous work has employed this technique to demonstrate both a sliding-window and pipeline decoding scheme which offer near capacity performance and reasonable decoding complexity. We now present preliminary work on an iterative sliding-window type BCC decoder based on APP soft-decision component decoders with the aim of greatly reducing decoding complexity.

GASP Codes for Secure Distributed Matrix Multiplication
Rafael D'Oliveira (Rutgers University)

Abstract: We consider the problem of secure distributed matrix multiplication (SDMM) in which a user wishes to compute the product of two matrices with the assistance of honest but curious servers. We construct polynomial codes for SDMM by studying a combinatorial problem on a special type of addition table, which we call the degree table. The codes are based on arithmetic progressions, and are thus named GASP (Gap Additive Secure Polynomial) Codes. GASP Codes are shown to outperform all previously known polynomial codes for secure distributed matrix multiplication in terms of download rate.

Distributed Matrix-Vector Multiplication: A Convolutional Coding Approach
Anindya Bijoy Das (Iowa State University)

Abstract: Distributed computing systems are well-known to suffer from the problem of slow or failed nodes; these are referred to as stragglers. Straggler mitigation (for distributed matrix computations) has recently been investigated from the standpoint of erasure coding in several works. We develop a strategy for distributed matrix-vector multiplication based on convolutional coding. Our scheme can be decoded using a low-complexity peeling decoder. The recovery process enjoys excellent numerical stability compared to Reed-Solomon coding based approaches (which exhibit significant problems owing to their ill-conditioned decoding matrices). Finally, our schemes are better matched to the practically important case of sparse matrix-vector multiplication than many previous schemes. Extensive simulation results corroborate our findings.

Sequential Controlled Sensing for Composite Multihypothesis Testing
Aditya Deshmukh (University of Illinois at Urbana-Champaign)

Abstract: The problem of multi-hypothesis testing with controlled sensing of observations is considered. The distribution of observations collected under each control is assumed to follow a single-parameter exponential family distribution. The goal is to design a policy to find the true hypothesis with minimum expected delay while ensuring that probability of error is below a given constraint. The decision maker can control the delay by intelligently choosing the control for observation collection in each time slot. We derive a policy that satisfies the given constraint on the error probability. We also show that the policy is asymptotically optimal in the sense that it asymptotically achieves an information-theoretic lower bound on the expected delay.

Optimizing Data Shuffling for Distributed Machine Learning

Abstract: In distributed machine learning algorithms, data shuffling is performed before the training step to enhance the convergence performance of stochastic gradient descent. The benefits of data shuffling, however, come at a price, since the entire dataset is communicated to all worker nodes. We explore, from an information-theoretic perspective, how coding theory can be incorporated into distributed machine learning to alleviate this communication overhead while ensuring the statistical efficiency of the learning algorithms. We characterize the rate-memory trade-off and propose a novel deterministic linear coded shuffling scheme, which improves the state of the art by exploiting the cache memories to create coded functions that can be simultaneously decoded by several worker nodes. Simulation results corroborate the merits of the proposed scheme over existing ones.
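The coded-function idea is easiest to see in its smallest instance (a textbook coded-multicast example, not the poster's general scheme): when two workers each cache the block the other one needs, a single XOR broadcast serves both simultaneously.

```python
def xor(a: bytes, b: bytes) -> bytes:
    # bitwise XOR of two equal-length byte strings
    return bytes(x ^ y for x, y in zip(a, b))

# Worker 1 caches block B but needs A; worker 2 caches A but needs B.
block_a = b"block-A!"
block_b = b"block-B!"

broadcast = xor(block_a, block_b)       # one coded transmission serves both
decoded_by_1 = xor(broadcast, block_b)  # worker 1 cancels its cached B -> A
decoded_by_2 = xor(broadcast, block_a)  # worker 2 cancels its cached A -> B
```

One coded transmission replaces two uncoded ones; the rate-memory trade-off generalizes this gain to many workers and larger caches.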

Numerically Stable Polynomially Coded Computing

Abstract: We study the numerical stability of polynomial-based encoding methods, which have emerged as a powerful class of techniques for providing straggler and fault tolerance in the area of distributed coded computing. We present new numerically stable techniques for coded matrix multiplication and empirically demonstrate that our constructions have significantly lower numerical errors compared to previous approaches that involve inversion of Vandermonde matrices.

Channel Coding at Low Capacity

Abstract: Low-capacity scenarios have become increasingly important in the technology of Internet of Things (IoT) and the next generation of mobile networks. Such scenarios require efficient and reliable transmission of information over channels with an extremely small capacity. Within these constraints, the performance of state-of-the-art coding techniques is far from optimal in terms of either rate or complexity. Moreover, the current non-asymptotic laws of optimal channel coding provide inaccurate predictions for coding in the low-capacity regime. In this paper, we provide the first comprehensive study of channel coding in the low-capacity regime. We will investigate the fundamental non-asymptotic limits for channel coding as well as challenges that must be overcome for efficient code design in low-capacity scenarios.

Testing Communities in Stochastic Block Models

Abstract: We study the minimax theory of goodness-of-fit and two-sample testing of $$s$$ changes in symmetric, 2-community stochastic block models (SBMs) on $$n$$ vertices in the commonly studied regime where the intra- and inter-community expected degrees are $$O(1)$$ and are comparable, and the communities are equally sized. Of particular interest is the regime where the number of changes to be tested, $$s$$, is $$o(n)$$. Given the mature theory of recovery of communities in an SBM, the key question is whether these testing problems can be solved in regimes where recovery up to $$O(s)$$ errors is impossible. We find that the setting decomposes into ‘small’ ($$s \ll \sqrt{n}$$) and ‘large’ ($$s \gg \sqrt{n}$$) changes. Testing of large changes can be reliably done even when natural notions of SNR are $$O(1)$$, which renders recovery at sublinear distortion impossible. For small changes, we show that the naive scheme of reconstructing communities and comparing them is minimax optimal up to constants in its SNR requirements.

Achievable Information-Theoretic Secrecy in the Presence of a Radar
Bo Guan (University of Massachusetts Amherst)

Cross-iteration coded computing

Abstract: We introduce the idea of cross-iteration coded computing, an approach to reducing communication costs for a large class of distributed iterative algorithms involving linear operations, including gradient descent and accelerated gradient descent for quadratic loss functions. The state-of-the-art approach for these iterative algorithms involves performing one iteration of the algorithm per round of communication among the nodes. In contrast, our approach performs multiple iterations of the underlying algorithm in a single round of communication by incorporating redundant storage and computation. Our algorithm works in the master-worker setting, with the workers storing carefully constructed linear transformations of the input matrices and using these matrices in an iterative algorithm, and the master node inverting the effect of these linear transformations.

The Role of Side Information in Single-Server Private Information Retrieval

Abstract: This work summarizes our recent results on information-theoretic single-server Private Information Retrieval (PIR) in the presence of side information. In this problem, a single server stores a database of messages, and a user has side information about the messages in the database, in uncoded or coded form. The user wishes to privately retrieve one or more uncoded messages from the server. The problem is to minimize the user's download cost while hiding from the server the identities of the demanded messages (jointly or individually), or the identities of both the demanded messages (jointly) and the messages in the user's side information. For each setting of the PIR with side information problem, we have shown how much the download cost can be reduced in the presence of side information, compared to the case with no side information.
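The smallest instance of the download saving can be written out directly (a toy two-message example, not the paper's general scheme; message contents are hypothetical): a user who already knows $$W_2$$ can ask for the single coded symbol $$W_1 \oplus W_2$$ instead of $$W_1$$ itself, leaving the server unable to tell which of the two messages is demanded.

```python
def xor(a: bytes, b: bytes) -> bytes:
    # bitwise XOR of two equal-length byte strings
    return bytes(x ^ y for x, y in zip(a, b))

database = {1: b"message1", 2: b"message2"}
side_info = database[2]                 # user already knows W2
answer = xor(database[1], database[2])  # server sends W1 xor W2
recovered = xor(answer, side_info)      # user cancels W2 to recover W1
```

Either demanded message would produce the same query, which is the sense in which the identity is protected individually.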

Robust Beam-Alignment for Millimeter Wave Networks
Muddassar Hussain (Purdue University)

Abstract: Millimeter-wave communications rely on narrow-beam transmissions to cope with the strong signal attenuation at these frequencies, thus demanding precise alignment between transmitter and receiver. However, the beam-alignment procedure may entail a huge overhead and its performance may be degraded by detection errors. This paper proposes a coded energy-efficient beam-alignment scheme that is robust against detection errors. Specifically, the beam-alignment sequence is designed such that the error-free feedback sequences are generated from a codebook with the desired error correction capabilities. Therefore, in the presence of detection errors, the error-free feedback sequences can be recovered with high probability. The assignment of beams to codewords is designed to optimize energy efficiency, and a water-filling solution is derived. Numerical results with analog beams show gains of up to 4dB and 8dB over exhaustive and uncoded beam-alignment schemes, respectively.

Coding for Deletion and Insertion Channels
Md Shoriful Islam (BYU)

Abstract: Deletion and insertion channels are a special class of channels that account for synchronization errors. These errors may be caused by a mismatch between transmitter and receiver during communication, which may cause a symbol to be removed from, or added at, any location in the data stream. Significant work has been done on deletion and insertion channels in the last few years, but many open problems remain. Most of the existing literature in this area focuses on bounding the capacity, and no efficient code construction that can correct more than one bit deletion or insertion is known today. In this poster, we review existing work and potential directions for coding over deletion and insertion channels. We offer a thorough discussion of the state of the art, and consider new ideas for coding over the sticky channel, a special case of the insertion channel.

Rank Aggregation via Heterogeneous Thurstone Models
Tao Jin (University of Virginia)

Abstract: Aggregating ranked data has a wide range of applications in recommender systems, social choice, and bioinformatics. In this problem, the input data commonly has the form of pairwise comparisons or partial rankings of subsets of items, and the goal is to aggregate these into a single ranking of all items. Such data is usually collected from a group of users or annotators with differing levels of expertise. Conventional data models, however, either assume a single user or a homogeneous population of users with similar quality. In this paper, we propose a Heterogeneous Thurstone Model (HTM), which can take the accuracy levels of different users into account. By allowing different noise distributions, the proposed HTM maintains the generality of Thurstone's original framework, and as such, also extends the Bradley-Terry-Luce (BTL) model for pairwise comparisons to heterogeneous populations of users. Under this framework, we also propose a rank aggregation algorithm based on stochastic gradient descent (SGD).

User Localization in MmWave Cells: A Non-Adaptive Quantitative Group Testing Approach based on Sparse Graph Codes
Esmaeil Karimi (Texas A&M)

Abstract: This poster considers the problem of quantitative group testing with a non-adaptive testing strategy. There are two models for the defective items: (i) deterministic, and (ii) randomized. In the deterministic model, the exact number of the defective items, $$K$$, is known, whereas in the randomized model each item is defective with probability $$\frac{K}{N}$$, independent of the other items, where $$N$$ is the total number of items. In this work, we propose a non-adaptive quantitative group testing algorithm using sparse graph codes over bi-regular bipartite graphs with left-degree $$\ell$$ and right degree $$r$$ and binary $$t$$-error-correcting BCH codes. We show that for both the deterministic and randomized models, our algorithm requires at most $${m=c(t)K(t\log(\frac{\ell N}{c(t)K}+1)+1)+1}$$ tests to recover all the defective items with probability approaching one (as $$N$$ and $$K$$ grow unbounded), where $$c(t)$$ is a constant that depends only on $$t$$.
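A minimal peeling decoder conveys the flavor of graph-based quantitative group testing (this toy omits the BCH component of the poster's construction; the instance is ours): a test whose count is zero clears all of its items, and a test whose residual count equals its number of unresolved items declares them all defective.

```python
def peel_qgt(num_items, pools, counts):
    # status[i]: None = unknown, True = defective, False = non-defective
    status = [None] * num_items
    changed = True
    while changed:
        changed = False
        for pool, c in zip(pools, counts):
            unknown = [i for i in pool if status[i] is None]
            if not unknown:
                continue
            residual = c - sum(1 for i in pool if status[i] is True)
            if residual == 0:            # all remaining items must be clean
                for i in unknown:
                    status[i] = False
                changed = True
            elif residual == len(unknown):  # all remaining items defective
                for i in unknown:
                    status[i] = True
                changed = True
    return status

# 6 items, defective set {1, 4}; each test reports its defective count
pools = [[0, 1, 2], [2, 3], [3, 4, 5], [0, 5], [1, 4]]
counts = [1, 0, 1, 0, 2]
flags = peel_qgt(6, pools, counts)
```

On sparse random graphs this peeling succeeds with high probability once the number of tests crosses the threshold quoted above.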

Single-Server Single-Message Online Private Information Retrieval with Side Information

Abstract: In many practical settings, the user needs to retrieve information from a server in a periodic manner, over multiple rounds of communication. In this work, we discuss the setting in which this information needs to be retrieved privately, such that the identities of all the information retrieved up to the current round are protected. We refer to this setting as *online private information retrieval*, as the user does not know the identities of the future items that will need to be retrieved from the server.

We assume that the user knows a random subset of $$M$$ messages in the database as side information, which is unknown to the server. Focusing on scalar-linear settings, we characterize the *per-round capacity*, i.e., the maximum achievable download rate at each round, and present a coding scheme that achieves this capacity. The key idea of our scheme is to utilize the data downloaded during the current round as side information for subsequent rounds.

Leveraging Coding Techniques for Speeding up Distributed Computing
Konstantinos Konstantinidis (Iowa State University)

Abstract: Many big data algorithms executed on MapReduce-like systems have a shuffle phase that often dominates the overall job execution time. Recent work has demonstrated schemes where the communication load in the shuffle phase can be traded off for the computation load in the map phase. In this work, we focus on a class of distributed algorithms, broadly used in deep learning, where intermediate computations of the same task can be combined. Even though prior techniques reduce the communication load significantly, they require a number of jobs that grows exponentially in the system parameters. This limitation is crucial and may diminish the load gains as the algorithm scales. We propose a new scheme which achieves the same load as the state of the art while ensuring that the number of jobs, as well as the number of subfiles that the data set needs to be split into, remains small. We obtain over a 4.69X speedup over the uncoded approach and more than 2.6X over the current state of the art.

A Family of Bayesian Cramér-Rao Bounds
Kuan-Yun Lee (University of California, Berkeley)

Abstract: Under minimal regularity assumptions, we establish a family of information-theoretic Bayesian Cramér-Rao bounds, indexed by probability measures that satisfy a logarithmic Sobolev inequality. This family includes as a special case the known Bayesian Cramér-Rao bound (or van Trees inequality), and its less widely known entropic improvement due to Efroimovich. For the setting of a log-concave prior, we obtain a Bayesian Cramér-Rao bound which holds for any (possibly biased) estimator and, unlike the van Trees inequality, does not depend on the Fisher information of the prior.
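For reference, the classical scalar van Trees inequality that this family generalizes can be stated as follows (under standard regularity conditions on the prior $$\pi$$ and the observation model; the notation here is ours):

```latex
% van Trees (Bayesian Cramér-Rao) inequality, scalar case
\mathbb{E}\left[(\hat\theta(X)-\theta)^2\right]
  \;\ge\; \frac{1}{\,\mathbb{E}_{\pi}[I(\theta)] + I(\pi)\,},
\qquad
I(\pi) \;=\; \int \frac{\pi'(\theta)^2}{\pi(\theta)}\,\mathrm{d}\theta,
```

where $$I(\theta)$$ is the Fisher information of the observation model, $$I(\pi)$$ is the Fisher information of the prior, and the expectation on the left is over both $$\theta \sim \pi$$ and $$X$$; the bound holds for any, possibly biased, estimator $$\hat\theta$$.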

Covert Communications on Continuous-Time Channels
Ke Li (University of Massachusetts Amherst)

Abstract: Covert communication is the transmission of information from a transmitter (Alice) to a legitimate receiver (Bob) where the presence of the message is hidden from an attentive warden (Willie). Recent work has been establishing the limits of such covert communication, including our group's work showing that a jammer can significantly improve the covert throughput. However, our work also demonstrates that over continuous-time channels, different timing offsets between Alice's signal and that of the jammer allow for the application of co-channel interference mitigation techniques at Willie's detector and thus lead to a result similar to that of the case without a jammer. In this paper, we show that the timing difference can enhance the noise and hence thwart Willie's detection. We relate the timing difference to the exact scaling constant of the limits of the covert communication system.

Data collection on Johnson graphs
Xiao Li (University of Illinois)

Abstract: We construct codes on Johnson graphs that, in combination with layered codes, produce regenerating codes with efficient data collection from a small number of nodes. Our concatenated layered code construction recovers known codes, including improved layered codes and cascade codes, and produces new codes with low sub-packetization levels.

Scalable Automated Proving of Information Theoretic Inequalities with Proximal Algorithms
Lin Ling (City University of Hong Kong)

Abstract: Proving or disproving linear information theoretic inequalities is a fundamental task in information theory, but for inequalities involving more than a few random variables, manual proving can often be tedious or even intractable. In 2014, a framework was proposed by S.W. Ho et al. to construct analytical proofs and disproofs of these inequalities. In this paper, we further extend the framework by reformulating the LPs and applying the Alternating Direction Method of Multipliers (ADMM), where all the subproblems have closed-form solutions and thus can be solved efficiently. The proposed algorithm is also parallelizable, so the performance can be further improved by running it on a GPU. An online web service has been developed to allow users to prove or disprove their inequalities without installing any software package or dependency.

Recursive inference on tree with limited memory: a Wasserstein approach
Jingbo Liu (MIT)

Abstract: Reconstruction on trees is a deceptively simple mathematical model that underlies the study of various objects including community detection, belief propagation algorithms, deep networks, and bioinformatics. We consider the case where the message is recursively reconstructed by processing units with limited memory, and show that as the noise tends to the classical Kesten-Stigum threshold, the memory in a recursive reconstruction algorithm must tend to infinity. This proves a conjecture in the Kenyon, Peres, Schulman 2000 paper. The proof follows from a new strategy of tracking the evolution of the likelihood function in Wasserstein space and appropriately applying the Wasserstein central limit theorem. (With Jain, Koehler, and Mossel.)

Evolution of $$N$$-gram Frequencies under Duplication and Substitution Mutations
Hao Lou (University of Virginia)

Abstract: The driving forces behind the generation of biological sequences are genomic mutations that shape these sequences throughout their evolutionary history. An understanding of the statistical properties that result from mutation processes is of value in a variety of tasks related to biological sequence data, e.g., estimation of model parameters and compression. At the same time, due to the complexity of these processes, designing tractable stochastic models and analyzing them are challenging. In this paper, we study two types of mutations, tandem duplication and substitution. These play a critical role in forming tandem repeat regions, which are common features of the genome of many organisms. We provide a stochastic model and, via stochastic approximation, study the behavior of the frequencies of $$N$$-grams in resulting sequences. Specifically, we show that $$N$$-gram frequencies converge almost surely to a set which we identify as a function of model parameters. From these frequencies, other statistics can be derived. In particular, we present a method for finding upper bounds on entropy.
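A single tandem duplication step, and the $$N$$-gram counts it changes, can be written out directly (an illustrative toy, not the poster's stochastic model; the example sequence is ours):

```python
from collections import Counter

def tandem_duplicate(s: str, i: int, k: int) -> str:
    # Duplicate the length-k substring starting at position i, in place:
    # ...xyz[abc]def... -> ...xyz[abc][abc]def...
    return s[:i + k] + s[i:i + k] + s[i + k:]

def ngram_freqs(s: str, n: int) -> Counter:
    # counts of all overlapping n-grams in s
    return Counter(s[j:j + n] for j in range(len(s) - n + 1))

seq = tandem_duplicate("ACGT", 1, 2)  # "ACGT" -> "ACGCGT"
freqs = ngram_freqs(seq, 2)
```

Iterating random steps of this kind and tracking how `freqs` (normalized) evolves is the empirical counterpart of the stochastic-approximation analysis in the abstract.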

Decentralized Multi-Player Multi-Armed Bandits for Uncoordinated Spectrum Access
Akshayaa Magesh (University of Illinois at Urbana-Champaign)

Abstract: Multi-player multi-armed bandits have emerged as a good model for solving spectrum access problems in uncoordinated networks. In this work, we consider a particularly challenging case where users cannot communicate with one another and the mean reward of the same channel differs across users. We provide a policy that achieves logarithmic regret in this setting.
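The logarithmic-regret behavior can be illustrated with standard single-player UCB1, a much simplified stand-in for the decentralized multi-player policy of the poster (deterministic rewards are used here purely for illustration):

```python
import math

def ucb1(arm_means, horizon):
    # UCB1 index policy; rewards are deterministic and equal to the means,
    # so the run below is fully reproducible.
    n_arms = len(arm_means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1  # initialization: play each arm once
        else:
            # pick the arm with the largest optimistic index
            a = max(range(n_arms),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2.0 * math.log(t) / counts[i]))
        counts[a] += 1
        sums[a] += arm_means[a]
    return counts

counts = ucb1([0.9, 0.1], horizon=1000)
```

The suboptimal arm is pulled only $$O(\log T)$$ times, which is exactly the scaling that must be preserved in the harder setting where several non-communicating users contend for the same channels.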

Rateless Codes for Distributed Computations with Sparse Compressed Matrices
Ankur Mallick (Carnegie Mellon University)

Abstract: Unpredictable slowdown, or straggling, of worker nodes is a major bottleneck when performing large matrix-vector multiplication in a distributed fashion. The problem is compounded if the matrix is sparse, due to irregularities in the distribution of non-zero elements, which lead to load imbalance across nodes. To mitigate this, we encode the matrix rows using rateless codes and distribute them across workers so that partial work done by worker nodes can be utilized and tail latency is eliminated. We also propose a balanced row-allocation strategy to ensure that an equal number of non-zero matrix entries is assigned to each worker. The entire scheme is designed to work with compressed, memory-efficient formats like CSR/CSC that are used to store sparse matrices in practice. Theoretical analysis and simulations show that our balanced rateless coding strategy achieves significantly lower overall latency than conventional sparse matrix-vector multiplication strategies.
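One simple way to realize balanced row allocation is a longest-processing-time greedy: assign the heaviest rows first, each to the currently least-loaded worker. This is an illustrative stand-in for the poster's allocation strategy, with a made-up nonzero profile:

```python
import heapq

def balance_rows(row_nnz, n_workers):
    # LPT-style greedy: process rows in decreasing nnz order, always
    # assigning to the worker with the smallest current nnz load.
    order = sorted(range(len(row_nnz)), key=lambda r: -row_nnz[r])
    heap = [(0, w, []) for w in range(n_workers)]  # (load, worker, rows)
    heapq.heapify(heap)
    for r in order:
        load, w, rows = heapq.heappop(heap)
        rows.append(r)
        heapq.heappush(heap, (load + row_nnz[r], w, rows))
    return sorted(heap)

# rows with 5, 3, 3, 2, 2, 1 nonzeros split across 2 workers
alloc = balance_rows([5, 3, 3, 2, 2, 1], n_workers=2)
```

Balancing by nonzero count rather than row count is what matters for sparse matrices, since per-row work in CSR/CSC formats is proportional to the number of stored entries.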

Service Rate Regions in Distributed Storage Systems
Carolyn Mayer (Worcester Polytechnic Institute)

Abstract: In a distributed storage system, files may be stored across several servers or nodes. Redundancy is useful both to increase tolerance to server failures and for quick content retrieval. We consider a storage system in which K files are stored across N>K nodes. In our model, a node may either be systematic or coded. A file can be accessed either through a single systematic node or through a combination of coded and systematic nodes. Given a specified rate at which each node can resolve requests for a file, we investigate the region of requests that can be served by the storage system.

The Geometry of Community Detection
Vaishakhi Mayya (Duke University)

Abstract: The problem of community detection is to partition a network into clusters of nodes (communities) with similar connection patterns. Specific examples include finding like-minded people in a social network and discovering the hierarchical relationships in organizations from observed behavior.

The contribution of this work is to study a broad class of network models in which there can be high variability in the sizes and behaviors of the different communities. Our analysis shows that the performance in these models can be described in terms of a matrix of the effective signal-to-noise ratios (SNRs) that provides a geometrical representation of relationships between the communities. This analysis motivates new methodology for a variety of state-of-the-art algorithms, including spectral clustering, belief propagation, and approximate message passing.

Entropy concavity deficits
James Melbourne (University of Minnesota)

Abstract: Motivated by an application in the thermodynamic limits of computational efficiency, we investigate the concavity deficit of the entropy functional. Some properties of the skew-divergence are developed alongside a "skew" $$\chi^2$$ divergence and connected with a classical version of the Holevo information to derive upper bounds on the concavity deficit. Complementary lower bounds are derived with special attention paid to the case that corresponds to independent summation. Some discussion of this deficit's connection to mutual information and additive channels is given as well.

Context Aware Laplacian Mechanism for Local Information Privacy
Mohamed Mohamed (University of Arizona)

Abstract: In this poster, we discuss the approximate case of local information privacy and its relationship to approximate local differential privacy. Also, we devise additive noise mechanisms for data aggregation which satisfy local information privacy constraints. There has been significant prior work on devising additive noise mechanisms which satisfy differential privacy (e.g., the Laplacian mechanism and the Gaussian mechanism). However, for the notion of information privacy, which accounts for prior knowledge about the data, there are no existing general-purpose additive noise mechanisms. To this end, we devise a prior-aware Laplacian noise mechanism, which satisfies local information privacy. We show that adding context awareness (i.e., via knowledge of the prior of the data) improves the tradeoff between utility and privacy when compared to context-unaware mechanisms.
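For reference, the classical context-unaware Laplace mechanism that prior-aware designs improve upon can be sketched as follows; the function name and the inverse-CDF sampling are illustrative choices here, and the poster's prior-aware mechanism shapes the noise using the data prior rather than a fixed scale.

```python
import math
import random

def laplace_mechanism(value, sensitivity, epsilon, rng=random):
    """Release `value` plus Laplace(b) noise with b = sensitivity / epsilon,
    the standard epsilon-differentially-private additive mechanism.
    (Context-unaware baseline; not the poster's prior-aware variant.)"""
    b = sensitivity / epsilon
    u = rng.random() - 0.5  # uniform on [-0.5, 0.5)
    # Inverse-CDF sampling of a zero-mean Laplace(b) variable.
    noise = -b * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return value + noise
```

Averaged over many releases of the same value, the noise has zero mean, so utility degrades only through the per-release variance $$2b^2$$.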

On-Off Privacy with Correlated Requests
Carolina Naim (Rutgers University)

Abstract: We introduce the ON-OFF privacy problem. At each time, the user is interested in the latest message of one of $$N$$ sources chosen at random, and his privacy status can be ON or OFF for each request. Only when privacy is ON does the user want to hide the source he is interested in. The problem is to design ON-OFF privacy schemes with maximum download rate that allow the user to obtain privately his requested messages. In many realistic scenarios, the user's requests are correlated since they depend on his personal attributes such as age, gender, political views, or geographical location. Hence, even when privacy is OFF, he cannot simply reveal his request since this will leak information about his requests when privacy was ON. We study the case when the user's requests can be modeled by a Markov chain and $$N = 2$$ sources. In this case, we propose an ON-OFF privacy scheme and prove its optimality.

Ternary Quantized Polar Code Decoders: Analysis and Design
Joachim Neu (Stanford University)

On the Optimal Delay Amplification Factor of Multi-Hop Relay Channels
Dennis Ogbe (Purdue University)

Abstract: The abstract model of the multi-hop relay channel is fundamental to a vast variety of modern communication systems. This fact, coupled with the demand for ultra-reliable-low-latency communication (URLLC), motivates a new investigation of relay channels from a delay-vs-throughput perspective. This work seeks to analyze this tradeoff in the regime of asymptotically large, yet still finite delay. A new metric called the “Delay Amplification Factor” (DAF) is introduced, which allows analytic comparison of the asymptotic delay across different relay solutions, e.g. decode-&-forward (DF), compress-&-forward, etc. The optimal DAF (over all possible existing/future designs) is then characterized for two special settings, one with fixed-length coding and one with variable-length coding and 1-bit stop feedback.

Analysis of Variance Models through Information Theory
Chathurangi Pathiravasan (Southern Illinois University)

Abstract: Information theory is a mathematical study of coding along with the quantification, storage, and communication of information. It is widely used in many scientific fields, such as statistics, artificial intelligence, and statistical physics. It is a vital part of probability theory, with deep connections to statistical inference in terms of relative entropy, or Kullback-Leibler (KL) divergence, which measures the difference between two probability distributions. In this study, we show that a minimization problem with KL divergence plays a key role in one-way Analysis of Variance (ANOVA) when comparing means of different populations. A new semi-parametric approach is introduced which relaxes the assumptions of ANOVA. The method is demonstrated on meteorological radar data, credit limit data, and simulated data. Asymptotic properties of the proposed estimators are established in order to test the hypothesis for equality of distributions.
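As a minimal illustration of the KL divergence at the heart of this approach (the helper name is an assumption, and distributions are given as probability vectors over a common finite support):

```python
import math

def kl_divergence(p, q):
    """D(p || q) in nats for discrete distributions given as lists.
    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] = 0 vanish."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

For example, `kl_divergence([1.0, 0.0], [0.5, 0.5])` equals `log 2`, while the divergence of a distribution from itself is zero.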

Embedded Index Coding
Alexandra Porter (Stanford University)

Abstract: Motivated by applications in distributed storage and distributed computation, we introduce embedded index coding (EIC). EIC is a type of distributed index coding in which nodes in a distributed system act as both senders and receivers of information. We show how embedded index coding is related to index coding in general, and give characterizations and bounds on the communication costs of optimal embedded index codes. We also define task-based EIC, in which each sending node encodes and sends data blocks independently of the other nodes. Task-based EIC is more computationally tractable and has advantages in applications such as distributed storage, in which senders may complete their broadcasts at different times. Finally, we give heuristic algorithms for approximating optimal embedded index codes, and demonstrate empirically that these algorithms perform well.

CodedReduce: A Fast and Robust Framework for Gradient Aggregation in Distributed Learning

Abstract: We focus on the synchronous Gradient Descent paradigm for large-scale distributed learning, for which there has been a growing interest in developing efficient and robust gradient aggregation strategies that overcome two key bottlenecks: communication bandwidth and stragglers' delays. In particular, Ring-AllReduce (RAR) and Gradient Coding (GC) designs have been proposed to respectively avoid the bandwidth bottleneck and mitigate stragglers. We propose a joint communication topology design and data set allocation strategy, named CodedReduce (CR), that combines the best of both RAR and GC. That is, it parallelizes the communications over a tree topology and carefully designs a redundant data set allocation and coding strategy at the nodes. In addition to theoretical optimality guarantees, we empirically evaluate the performance of our proposed CR design over Amazon EC2 and demonstrate that it achieves speedups of up to 18.9x and 7.9x, respectively, over the benchmarks GC and RAR.

On the Bias of Directed Information Estimators
Gabe Schamberg (UC San Diego)

Abstract: When estimating the directed information between two jointly stationary Markov processes, it is typically assumed that the recipient of the directed information is itself Markov of the same order as the joint process. While this assumption is often made explicit in the presentation of such estimators, a characterization of when we can expect the assumption to hold is lacking. Using the concept of d-separation from Bayesian networks, we present sufficient conditions for which this assumption holds. We further show that the set of parameters for which the condition is not also necessary has Lebesgue measure zero. Given the strictness of these conditions, we introduce a notion of partial directed information, which can be used to bound the bias of directed information estimates when the directed information recipient is not itself Markov. Lastly we estimate this bound on simulations in a variety of settings to assess the extent to which the bias should be cause for concern.

Secure Coding for Caching Channels
Morteza Shoushtari (Brigham Young University)

Abstract: Internet data traffic has increased dramatically in the last decade. Supporting this massive volume of data traffic will require an exorbitant amount of bandwidth and better network service quality. Caching is an efficient solution: by caching content close to users in off-peak times, traffic can be decreased during peak times. Due to the broadcast nature of wireless channels, transmitted data are vulnerable to eavesdropping. Likewise, storing sensitive data in caches presents additional security issues. Given these many security challenges, there is a need for comprehensive data security solutions. In this research, we address this problem through coded caching over a wiretap variant of the caching channel. We seek specific coding constructions that can (1) deliver un-cached data reliably to network users in an efficient manner, and (2) keep data information-theoretically secure against eavesdroppers and other nodes in the network.

Distributed Multi-User Secret Sharing
Mahdi Soleymani (University of Michigan)

Abstract: We consider a distributed secret sharing system that consists of a dealer, $$n$$ storage nodes, and $$m$$ users. Each user is given access to a certain subset of storage nodes, where it can download the stored data. The dealer wants to securely convey a specific secret $$s_j$$ to user $$j$$ via storage nodes, for $$j=1,2,\dots,m$$, in such a way that no user gets any information about other users' secrets in an information-theoretic sense. Given a certain number of storage nodes, we find the maximum number of users that can be served in such systems and construct schemes that achieve this. Lower bounds on minimum communication complexity and storage overhead are characterized given any $$n$$ and $$m$$. Furthermore, we construct distributed secret sharing protocols, under certain conditions on the system parameters, that simultaneously attain these lower bounds, thereby providing schemes that are optimal in terms of both the communication complexity and the storage overhead.
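For background, the classical $$(n,n)$$ additive secret-sharing primitive that such systems build on can be sketched as follows; the modulus and function names are illustrative assumptions, and the poster's multi-user distributed schemes are more elaborate constructions.

```python
import random

P = 2**61 - 1  # a prime modulus (illustrative choice)

def share(secret, n, rng=random):
    """Split `secret` into n additive shares modulo P. All n shares are
    needed to reconstruct; any n-1 of them are jointly uniform and so
    reveal nothing about the secret, in an information-theoretic sense."""
    shares = [rng.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    """Recover the secret as the sum of all shares modulo P."""
    return sum(shares) % P
```

Summing any strict subset of the shares yields a uniformly random value, which is what makes the hiding guarantee information-theoretic rather than computational.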

Spike-Based Winner-Take-All Computation: Fundamental Limits and Order-Optimal Circuits
Lili Su (MIT)

Abstract: Winner-Take-All (WTA) refers to the neural operation that selects a (typically small) group of neurons from a large neuron pool. It is conjectured to underlie many of the brain's fundamental computational abilities. In this work, we consider a spike-based $$k$$–WTA model wherein $$n$$ randomly generated input spike trains compete with each other based on their underlying statistics, and $$k$$ winners are supposed to be selected. We focus on analytically characterizing the trade-off between decision accuracy and decision time. We first derive an information-theoretic lower bound on the decision time. We show that to have a (minimax) decision error $$\le \delta$$ (where $$\delta \in (0,1)$$), the computation time of any WTA circuit is at least $$((1-\delta) \log(k(n -k)+1) -1)T_{\mathcal{R}}$$, where $$T_{\mathcal{R}}$$ is a difficulty parameter of a WTA task that is independent of $$\delta$$, $$n$$, and $$k$$. We then design a simple WTA circuit whose decision time is order-optimal.
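The stated lower bound on decision time can be evaluated numerically. A minimal sketch, assuming base-2 logarithms (the abstract does not specify the base) and an illustrative function name:

```python
import math

def wta_time_lower_bound(n, k, delta, T_R):
    """Lower bound ((1 - delta) * log2(k*(n-k) + 1) - 1) * T_R on the
    decision time of any k-WTA circuit over n input spike trains with
    minimax decision error at most delta; T_R is the task-difficulty
    parameter from the abstract."""
    return ((1 - delta) * math.log2(k * (n - k) + 1) - 1) * T_R
```

As expected, the bound grows with the pool size $$n$$ at fixed $$k$$ and $$\delta$$, reflecting the larger number of candidate winner sets to distinguish.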

A Tunable Loss Function for Binary Classification
Tyler Sypherd (Arizona State University)

Abstract: We present $$\alpha$$-loss, a tunable loss function for binary classification that bridges log-loss and $$0$$-$$1$$ loss. We prove that $$\alpha$$-loss has an equivalent margin-based form and is classification-calibrated, two desirable properties for a good surrogate loss function for the ideal yet intractable $$0$$-$$1$$ loss. For logistic regression-based classification, we provide an upper bound on the difference between the empirical and expected risk for $$\alpha$$-loss at the critical points of the empirical risk by exploiting its Lipschitzianity along with recent results on the landscape features of empirical risk functions. Finally, we show that $$\alpha$$-loss with $$\alpha = 2$$ performs better than log-loss on MNIST for logistic regression.
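A minimal sketch of one common closed form for $$\alpha$$-loss from the literature; this parameterization is an assumption here, not quoted from the poster. It recovers log-loss as $$\alpha \to 1$$ and the shifted $$0$$-$$1$$-type loss $$1 - p$$ as $$\alpha \to \infty$$.

```python
import math

def alpha_loss(p, alpha):
    """alpha-loss of the probability p assigned to the true label.
    Assumed closed form (alpha/(alpha-1)) * (1 - p^(1 - 1/alpha)):
    tends to -log(p) as alpha -> 1 and to 1 - p as alpha -> infinity."""
    if alpha == 1.0:
        return -math.log(p)  # log-loss limit
    return (alpha / (alpha - 1.0)) * (1.0 - p ** (1.0 - 1.0 / alpha))
```

Checking the two limits numerically at `p = 0.8` confirms the interpolation: values near `alpha = 1` approach `-log(0.8)`, and very large `alpha` approaches `0.2`.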

Leveraging Prior Knowledge Asymmetries in the Design of IoT Privacy-Preserving Mechanisms
Nazanin Takbiri (University of Massachusetts)

Abstract: The Internet of Things (IoT) promises to improve user utility by tuning applications to user behaviour, but the revealing of the characteristics of a user's behaviour presents a significant privacy risk. Recently, the technique of remapping has been introduced in the privacy literature. Remapping exploits asymmetries in knowledge and/or sophistication between the intended application and the adversary. Here, after introducing the system model, we first demonstrate the mechanism behind the remapping technique. Next, we characterize the loss in privacy when the user lacks knowledge of the accuracy of the adversary's statistical model; this loss in privacy occurs both because the adversary obtains a more accurate view of the user data than expected and because the adversary can exploit the remapping to improve their statistical model more than would have been possible when remapping is not employed. Finally, we introduce a random remapping approach as a countermeasure.

Single-Error Detection and Correction for Duplication and Substitution Channels
Yuanyuan Tang (University of Virginia)

Abstract: Motivated by mutation processes occurring in in-vivo DNA-storage applications, a channel that mutates stored strings by duplicating substrings as well as substituting symbols is studied. Two models of such a channel are considered: one in which the substitutions occur only within the duplicated substrings, and one in which the location of substitutions is unrestricted. Both error-detecting and error-correcting codes are constructed, which can handle correctly any number of tandem duplications of a fixed length $$k$$, and at most a single substitution.

Model-Agnostic Private Learning
Om Dipakbhai Thakkar (Boston University)

Abstract: In this work, we design differentially private learning algorithms that are agnostic to the learning model assuming access to a limited amount of unlabeled public data. First, we provide a new differentially private algorithm for answering a sequence of online classification queries based on a private training set. We show that our algorithm makes a conservative use of the privacy budget. Next, we apply the knowledge transfer technique to construct a private learner that outputs a classifier, which can be used to answer an unlimited number of queries. In the (agnostic) PAC model, we analyze our construction and prove upper bounds on the sample complexity for both the realizable and the non-realizable cases. Similar to non-private sample complexity, our bounds are completely characterized by the VC dimension of the concept class.

How Should we Define Information Flow in Neuroscience?
Praveen Venkatesh (Carnegie Mellon University)

Abstract: We propose a formal, systematic framework for examining information flow in the brain. We develop a graphical model of the underlying computational circuit, comprising nodes that represent neurons or groups of neurons, which are interconnected to reflect anatomy. Using this model, we provide a definition for information flow based on conditional mutual information between the stimulus and the transmissions of neurons. Our definition organically emphasizes what the information is about: typically, the stimulus or response of a specific neuroscientific task. The framework we develop provides guarantees that were hitherto unattainable using tools such as Granger Causality and Directed Information, partly because they lacked a theoretical foundation grounded in neuroscience. Specifically, if the “output” of the computational system shows stimulus-dependence, then there exists an “information path” leading from the input to the output, along which stimulus-dependent information flows.

Decentralized Equalizer Construction for Large Intelligent Surfaces
Juan Vidal Alegría (Lund University)

Abstract: In this poster we present fully decentralized methods for calculating an approximate zero-forcing (ZF) equalizer in a large intelligent surface (LIS). A LIS is intended for wireless communication and facilitates unprecedented MU-MIMO performance, far superior to that of Massive MIMO. Antenna modules in the grid connect to their neighbors to exchange the messages of information needed for interference cancellation in a fully decentralized fashion, making the system scalable. By a careful design of how the messages are routed, we show that the proposed method is able to cancel inter-user interference sufficiently well without any centralized coordination, opening the door for the realization of this type of structure.

Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions
Hao Wang (Harvard University)

Abstract: When the average performance of a prediction model varies significantly with respect to a sensitive attribute, the performance disparity can be expressed in terms of the probability distributions of input and output variables for each sensitive group. We exploit this fact to explain and repair the performance disparity of a fixed classification model over a population of interest. Given a black-box classifier that performs unevenly across sensitive groups, we aim to eliminate the performance gap by perturbing the distribution of input features for the disadvantaged group. We refer to the perturbed distribution as a counterfactual distribution, and characterize its properties for popular fairness criteria. We design a descent algorithm to efficiently learn a counterfactual distribution. We use the counterfactual distribution to build a preprocessor that reduces disparate impact without training a new model. We illustrate these use cases through experiments on real-world datasets.

Two Transmission Problems over Two-Way Channels
Jian-Jia Weng (Queen's University)

Abstract: Shannon's two-way channels (TWCs) allow two users to exchange information in a full-duplex manner. While such channels are more efficient than their one-way counterparts, how each user can optimize its individual transmission over the shared channel and concurrently provide sufficient feedback to help the other user's transmission is a quite challenging problem. We present progress results for the following two fundamental TWC transmission problems: the characterization of channel capacity and the derivation of coding theorems for the lossy transmission of correlated sources over TWCs. In summary, we exploit various channel symmetry properties to determine the capacity region of TWCs. Moreover, we construct separate and joint source-channel coding schemes for transmitting correlated sources over TWCs. We also derive distortion bounds and investigate the performance of uncoded transmission. Two complete joint source-channel coding theorems are established in some scenarios.

Outer bounds in distributed storage
Weili Wu (University of Illinois)

Abstract: In the exact repair setting for error correcting codes over distributed storage networks, we must consider fundamental trade-offs between various parameters. Layered codes realize optimal trade-offs between storage overhead and repair bandwidth in a special range of code parameters. Optimality for general codes remains an open problem. We test how known proofs for optimality hold up for codes obtained by concatenation of layered codes.

Bounding the Number of Mass Points of the Capacity-Achieving Input for the Amplitude and Power Constrained Additive Gaussian Channel
Semih Yagli (Princeton University)

Abstract: We study the real and complex Additive Gaussian Channels (AGCs) with input amplitude constraints. For the real AGC model, it is well known that the capacity-achieving input distribution is discrete with finitely many mass points. Similarly, for the complex AGC model, it is well known that the amplitude of the capacity-achieving input has a distribution that is discrete with finitely many mass points. However, due to the previous proof technique, neither the exact numbers of mass points of the optimal input distributions in these settings nor bounds on them were available. We provide an alternative proof of the discreteness of the capacity-achieving input distributions and produce the first firm upper bounds on the number of mass points, paving an alternative way for approaching many such problems.

On the Most Informative Boolean Functions of the Very Noisy Channel
Hengjie Yang (UCLA)

Abstract: Let $$X^n$$ be a uniformly distributed $$n$$-dimensional binary vector, and let $$Y^n$$ be the result of passing $$X^n$$ through a binary symmetric channel (BSC) with crossover probability $$\alpha$$. A recent conjecture postulated by Courtade and Kumar states that $$I(f(X^n);Y^n)\le 1-H(\alpha)$$ for any Boolean function $$f$$. Although the conjecture has been proved true in a dimension-free high-noise regime by Samorodnitsky, here we present a calculus-based approach to show a dimension-dependent result by examining the second derivative of $$H(\alpha)-H(f(X^n)|Y^n)$$ at $$\alpha=1/2$$. In this way, one can clearly see that the dictator function is the most informative function in the high-noise regime.
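The fact that the dictator function $$f(x) = x_1$$ meets the conjectured bound $$1 - H(\alpha)$$ with equality can be checked by brute force for small $$n$$; a sketch, with illustrative function names:

```python
import itertools
import math

def h2(a):
    """Binary entropy in bits."""
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)

def dictator_mutual_info(n, alpha):
    """Brute-force I(f(X^n); Y^n) in bits for the dictator f(x) = x_1,
    with X^n uniform on {0,1}^n and Y^n the output of a BSC(alpha)."""
    joint = {}  # (f(x), y) -> probability
    for x in itertools.product((0, 1), repeat=n):
        px = 2.0 ** -n
        for y in itertools.product((0, 1), repeat=n):
            flips = sum(xi != yi for xi, yi in zip(x, y))
            p = px * alpha ** flips * (1 - alpha) ** (n - flips)
            key = (x[0], y)
            joint[key] = joint.get(key, 0.0) + p
    pf = {f: sum(v for (g, _), v in joint.items() if g == f) for f in (0, 1)}
    py = {}
    for (_, y), v in joint.items():
        py[y] = py.get(y, 0.0) + v
    return sum(v * math.log2(v / (pf[f] * py[y]))
               for (f, y), v in joint.items() if v > 0)
```

Since $$Y_2, \dots, Y_n$$ are independent of $$X_1$$, the computed value collapses to $$I(X_1; Y_1) = 1 - H(\alpha)$$, matching the conjectured bound exactly.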

Effects of Semantic Diversity on Majority Judgement
Alan Yang (University of Illinois at Urbana-Champaign)

Abstract: Majority judgement is a social ranking method that was proposed to overcome shortcomings of traditional voting systems. Polling voters for an ordered ranking of a set of candidates leads to potentially inconsistent results. Instead, judges assign discrete grades to candidates, and median grades are used to produce a final ordering. We treat the method as a parameter estimation problem of the underlying candidate ordering (assumed to exist). Grades are assigned following a distribution parameterized by the true ordering and are interpreted as quantized observations of a candidate’s utility. Semantic diversity among judges is modeled by treating quantization thresholds as random; interpretations of grading criteria vary between judges. This setup is used to understand when diversity can improve the performance of majority judgement. Indeed, semantic diversity is unavoidable in practice, and has even been experimentally shown to be beneficial in simulations.
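The median-grade ordering at the core of majority judgement can be sketched as follows; tie-breaking between equal medians, which the full method specifies, is omitted in this illustrative snippet.

```python
import statistics

def majority_judgement_order(grades):
    """Rank candidates by the median of the discrete grades assigned by
    the judges (higher grade = better). `grades` maps each candidate to
    a list of integer grades. Candidates with equal medians would need
    majority judgement's full tie-breaking rule; this sketch sorts on
    the median alone."""
    return sorted(grades, key=lambda c: statistics.median(grades[c]),
                  reverse=True)
```

For instance, with grades `{"A": [3, 4, 5], "B": [1, 2, 5], "C": [2, 3, 3]}`, the medians are 4, 2, and 3, giving the final ordering A, C, B.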

Efficiency of k-Nearest Neighbor Entropy Estimation in Stationary Time Series
Alex Young (Harvard University)

Abstract: Entropy, particularly due to its connection with mutual information, has received considerable use in the study of time series data, including information flow and causality detection. One common strategy for estimating entropy from observed data is the k-nearest neighbor (or generalized Kozachenko-Leonenko) entropy estimator, which has been thoroughly investigated in the case of independent data, including results on its asymptotic bias, variance, and normality. However, less is known in the case of time series data, wherein the inherent dependence precludes standard approaches to analysis. In this poster, we present theoretical results on the bias of this estimator when applied to a stationary time series which satisfies a mixing condition and admits a stationary density which adheres to regularity and moment requirements. Numerical results on the variance are also included.

Correct the Output of Social Network De-anonymization Algorithm
Liren Yu (Purdue University)

Abstract: Social network de-anonymization refers to the problem of de-anonymizing users' identities by mapping them to correlated cross-domain networks. The output of a social network de-anonymization algorithm can be interpreted as a set of partially correctly matched seeds. Taking the partially correct seeds as input, prior literature proposes various algorithms to correct the mapping based on one-hop "witnesses". In this paper, we adapt the idea of multi-hop "witnesses" to develop a new algorithm, which corrects the errors in a set of given seeds. We theoretically prove that our algorithm needs a smaller proportion of correct seeds than prior algorithms to correct all errors on sparse Erdős–Rényi model graphs with high probability. We also numerically confirm the effectiveness of our algorithm under different configurations.

Cache-aided interference management using hypercube combinatorial cache design
Xiang Zhang (University of Utah)

Abstract: This poster presents our results on a new cache placement approach called the hypercube design and its application to the wireless cache-aided interference network. It is shown that we can achieve an exponentially smaller number of subpacketizations while still preserving the optimal one-shot linear sum DoF.