Penn Arts & Sciences

AMCS Colloquium

Friday, March 17, 2023 - 1:45pm

Jiashun Jin

Carnegie Mellon University, Department of Statistics & Data Science

Location

University of Pennsylvania

DRL - A4

We discover an interesting self-normalizing cycle count (SCC) statistic for network data. As the network size grows, the SCC converges in distribution to the standard normal, even though the network model may have numerous unknown parameters. The SCC can be used to tackle several interesting problems in network analysis. For example, we show that the SCC gives rise to an optimal approach to network global testing. In particular, using Sinkhorn's theorem, we develop a degree-matching technique to characterize the minimax lower bound and derive an interesting phase transition for network global testing. The SCC also gives rise to a metric for network goodness-of-fit. Applying this metric to 11 network data sets, we show that the popular Degree-Corrected Block Model (DCBM) is frequently inadequate, whereas the Degree-Corrected Mixed-Membership (DCMM) model fits much better. We also illustrate the breadth of the DCMM through a recent Non-negative Matrix Factorization (NMF) result. The SCC is especially useful in analyzing the MADStat data set (a large-scale, high-quality data set on statisticians that we collected and cleaned ourselves): we use it to develop a metric for the research diversity of statisticians and to help build a 33-leaf community tree for the co-authorship network of statisticians.
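The abstract does not give the SCC formula. As a hedged illustration of the raw ingredient a cycle count statistic starts from, the sketch below counts 4-cycles (quadrilaterals) in an undirected simple graph. Restricting to cycles of length 4 and the function name `count_four_cycles` are our own assumptions. The centering and normalization that make the SCC asymptotically standard normal are not shown. The count uses the closed-walk identity tr(A^4) = 8·C4 + 2·Σ d_i² − Σ d_i, where A is the adjacency matrix, d_i the degrees and C4 the number of 4-cycles.

```python
def count_four_cycles(adj):
    """Count 4-cycles in a simple undirected graph.

    Hypothetical helper for illustration; `adj` is a symmetric 0/1
    adjacency matrix (list of lists) with a zero diagonal.
    """
    n = len(adj)
    # (A^2)_{ij} = number of common neighbours of i and j
    a2 = [[sum(adj[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    # A^2 is symmetric, so tr(A^4) = sum_{i,j} (A^2)_{ij}^2
    trace_a4 = sum(a2[i][j] ** 2 for i in range(n) for j in range(n))
    degrees = [sum(row) for row in adj]
    # Closed 4-walks that backtrack along edges instead of tracing a 4-cycle
    backtracking_walks = 2 * sum(d * d for d in degrees) - sum(degrees)
    # Each 4-cycle yields 8 closed walks (4 starting points x 2 directions)
    return (trace_a4 - backtracking_walks) // 8


if __name__ == "__main__":
    # Complete graph on 4 nodes: 3 distinct 4-cycles
    k4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
    print(count_four_cycles(k4))  # 3
```

The trace identity avoids enumerating all 4-node subsets explicitly. This pure-Python version is O(n³) and only meant for small examples; a real analysis would use sparse matrix products.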