Experimental scientists face an increasingly difficult challenge: while technological advances allow for the collection of larger and higher quality datasets, computational methods to better understand and make new discoveries in the data lag behind. Existing explainable AI and interpretability methods for machine learning focus on better understanding model decisions, rather than understanding the data itself. In this work, we tackle a specific task that can aid experimental scientists in the era of big data: given a large dataset of annotated samples divided into different classes, how can we best teach human researchers what is the difference between the classes? To accomplish this, we develop a new framework combining machine teaching and generative models that generates a small set of synthetic teaching examples for each class. This set will aim to contain all the information necessary to distinguish between the classes. To validate our framework, we perform a human study in which human subjects learn how to classify various datasets using a small teaching set generated by our framework as well as several subset selection algorithms. We show that while generated samples succeed in teaching humans better than chance, subset selection methods (such as k-centers or forgettable events) succeed better in this task, suggesting that real samples might be better suited than realistic generative samples. We suggest several ideas for improving human teaching using machine learning.
Publications by Year: 2021
A cell’s phenotype is the culmination of several cellular processes through a complex network of molecular interactions that ultimately result in a unique morphological signature. Visual cell phenotyping is the characterization and quantification of these observable cellular traits in images. Recently, cellular phenotyping has undergone a massive overhaul in terms of scale, resolution, and throughput, which is attributable to advances across electronic, optical, and chemical technologies for imaging cells. Coupled with the rapid acceleration of deep learning–based computational tools, these advances have opened up new avenues for innovation across a wide variety of high-throughput cell biology applications. Here, we review applications wherein deep learning is powering the recognition, profiling, and prediction of visual phenotypes to answer important biological questions. As the complexity and scale of imaging assays increase, deep learning offers computational solutions to elucidate the details of previously unexplored cellular phenotypes.