I am an engineer, scientist, and entrepreneur working on AI, neuroscience, and robotics. My goals are to understand how the brain works and to build Artificial General Intelligence (AGI). I'm now at DeepMind via acquisition of Vicarious.
I like building teams to pursue challenging goals. I co-founded two companies – Vicarious AI & Numenta. Vicarious was recently acquired by Alphabet: our AI+robotics business merged with Intrinsic, an Alphabet company, and our research team joined DeepMind to accelerate progress toward AGI.
Check out my work on a detailed theoretical model for thalamic and cortical microcircuits, and on place cells as sequence learners, for examples of recent progress on understanding the brain.
I did my PhD at Stanford University with Dr. Bernard Widrow, a pioneer of neural networks, and co-inventor of LMS gradient descent. During my PhD, I co-founded Numenta with Jeff Hawkins and Donna Dubinsky, and co-developed the ideas behind Hierarchical Temporal Memory.
PhD in Electrical Engineering
Stanford University
MS in Electrical Engineering
Stanford University
BTech in Electrical Engineering
IIT Bombay
Fascinating and puzzling phenomena, such as landmark vector cells, splitter cells, and event-specific representations to name a few, are regularly discovered in the hippocampus. Without a unifying principle that can explain these divergent observations, each experiment seemingly discovers a new anomaly or coding type. Here, we provide a unifying principle that the mental representation of space is an emergent property of latent higher-order sequence learning. Treating space as a sequence resolves myriad phenomena, and suggests that the place-field mapping methodology where sequential neuron responses are interpreted in spatial and Euclidean terms might itself be a source of anomalies. Our model, called Clone-structured Causal Graph (CSCG), uses a specific higher-order graph scaffolding to learn latent representations by mapping sensory inputs to unique contexts. Learning to compress sequential and episodic experiences using CSCGs results in the emergence of cognitive maps - mental representations of spatial and conceptual relationships in an environment that are suited for planning, introspection, consolidation, and abstraction. We demonstrate that over a dozen different hippocampal phenomena, ranging from those reported in classic experiments to the most recent ones, are succinctly and mechanistically explained by our model.
Cognitive maps are mental representations of spatial and conceptual relationships in an environment, and are critical for flexible behavior. To form these abstract maps, the hippocampus has to learn to separate or merge aliased observations appropriately in different contexts in a manner that enables generalization and efficient planning. Here we propose a specific higher-order graph structure, clone-structured cognitive graph (CSCG), which forms clones of an observation for different contexts as a representation that addresses these problems. CSCGs can be learned efficiently using a probabilistic sequence model that is inherently robust to uncertainty. We show that CSCGs can explain a variety of cognitive map phenomena such as discovering spatial relations from aliased sensations, transitive inference between disjoint episodes, and formation of transferable schemas. Learning different clones for different contexts explains the emergence of splitter cells observed in maze navigation and event-specific responses in lap-running experiments. Moreover, learning and inference dynamics of CSCGs offer a coherent explanation for disparate place cell remapping phenomena. By lifting aliased observations into a hidden space, CSCGs reveal latent modularity useful for hierarchical abstraction and planning. Altogether, CSCG provides a simple unifying framework for understanding hippocampal function, and could be a pathway for forming relational abstractions in artificial intelligence.
Understanding the information processing roles of cortical circuits is an outstanding problem in neuroscience and artificial intelligence. Theory-driven efforts will be required to tease apart the functional logic of cortical circuits from the vast amounts of experimental data on cortical connectivity and physiology. Although the theoretical setting of Bayesian inference has been suggested as a framework for understanding cortical computation, making precise and falsifiable biological mappings needs models that tackle the challenge of real-world tasks. Based on a recent generative model, Recursive Cortical Networks, that demonstrated excellent performance on visual task benchmarks, we derive a family of anatomically instantiated and functional cortical circuit models. Efficient inference and generalization guided the representational choices in the original computational model. The cortical circuit model is derived by systematically comparing the computational requirements of this model with known anatomical constraints. The derived model suggests precise functional roles for the feed-forward, feedback, and lateral connections observed in different laminae and columns, assigns a computational role for the path through the thalamus, predicts the interactions between blobs and inter-blobs, and offers an algorithmic explanation for the innate inter-laminar connectivity between clonal neurons within a cortical column. The model also explains several visual phenomena, including the subjective contour effect and the neon-color spreading effect, with circuit-level precision. Our work paves a new path forward in understanding the logic of cortical and thalamic circuits.
The ability of humans to quickly identify general concepts from a handful of images has proven difficult to emulate with robots. Recently, a computer architecture was developed that allows robots to mimic some aspects of this human ability by modeling concepts as cognitive programs using an instruction set of primitive cognitive functions. This allowed a robot to emulate human imagination by simulating candidate programs in a world model before generalizing to the physical world. However, this model used a naive search algorithm that required 30 minutes to discover a single concept, and became intractable for programs with more than 20 instructions. To circumvent this bottleneck, we present an algorithm that emulates the human cognitive heuristics of object factorization and sub-goaling, allowing human-level inference speed, improving accuracy, and making the output more explainable.
Cognitive maps enable us to learn the layout of environments, encode and retrieve episodic memories, and navigate vicariously for mental evaluation of options. A unifying model of cognitive maps will need to explain how the maps can be learned scalably with sensory observations that are non-unique over multiple spatial locations (aliased), retrieved efficiently in the face of uncertainty, and form the fabric of efficient hierarchical planning. We propose learning higher-order graphs – structured in a specific way that allows efficient learning, hierarchy formation, and inference – as the general principle that connects these different desiderata. We show that these graphs can be learned efficiently from experienced sequences using a cloned Hidden Markov Model (CHMM), and uncertainty-aware planning can be achieved using message-passing inference. Using diverse experimental settings, we show that CHMMs can be used to explain the emergence of context-specific representations, formation of transferable structural knowledge, transitive inference, shortcut finding in novel spaces, remapping of place cells, and hierarchical planning. Structured higher-order graph learning and probabilistic inference might provide a simple unifying framework for understanding hippocampal function, and a pathway for relational abstractions in artificial intelligence.
Query training is a technique that lets you train graphical models using ideas from deep learning.
Hippocampus encodes cognitive maps that support episodic memories, navigation, and planning. Understanding the commonality among those maps as well as how those maps are structured, learned from experience, and used for inference and planning is an interesting but unsolved problem. We propose higher-order graphs as the general principle and present, as a plausible model, a cloned hidden Markov model (HMM) that can learn these graphs efficiently from experienced sequences. In our experiments, we use the cloned HMM for learning spatial and abstract representations. We show that inference and planning in the learned CHMM encapsulate many of the key properties of hippocampal cells observed in rodents and humans. Cloned HMM thus provides a new framework for understanding hippocampal function.
Sequence learning is a vital cognitive function and has been observed in numerous brain areas. Discovering the algorithms underlying sequence learning has been a major endeavour in both neuroscience and machine learning. In earlier work we showed that by constraining the sparsity of the emission matrix of a Hidden Markov Model (HMM) in a biologically-plausible manner we are able to efficiently learn higher-order temporal dependencies and recognize contexts in noisy signals. The central basis of our model, referred to as the Cloned HMM (CHMM), is the observation that cortical neurons sharing the same receptive field properties can learn to represent unique incidences of bottom-up information within different temporal contexts. CHMMs can efficiently learn higher-order temporal dependencies, recognize long-range contexts and, unlike recurrent neural networks, are able to natively handle uncertainty. In this paper we introduce a biologically plausible CHMM learning algorithm, memorize-generalize, that can rapidly memorize sequences as they are encountered, and gradually generalize as more data is accumulated. We demonstrate that CHMMs trained with the memorize-generalize algorithm can model long-range structure in bird songs with only a slight degradation in performance compared to expectation-maximization, while still outperforming other representations.
Variable order sequence modeling is an important problem in artificial and natural intelligence. While overcomplete Hidden Markov Models (HMMs), in theory, have the capacity to represent long-term temporal structure, they often fail to learn and converge to local minima. We show that by constraining HMMs with a simple sparsity structure inspired by biology, we can make them learn variable order sequences efficiently. We call this model cloned HMM (CHMM) because the sparsity structure enforces that many hidden states map deterministically to the same emission state. CHMMs with over 1 billion parameters can be efficiently trained on GPUs without being severely affected by the credit diffusion problem of standard HMMs. Unlike n-grams and sequence memoizers, CHMMs can model temporal dependencies at arbitrarily long distances and recognize contexts with “holes” in them. Compared to Recurrent Neural Networks and their Long Short-Term Memory extensions (LSTMs), CHMMs are generative models that can natively deal with uncertainty. Moreover, CHMMs return a higher-order graph that represents the temporal structure of the data which can be useful for community detection, and for building hierarchical models. Our experiments show that CHMMs can beat n-grams, sequence memoizers, and LSTMs on character-level language modeling tasks. CHMMs can be a viable alternative to these methods in some tasks that require variable order sequence modeling and the handling of uncertainty.
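The clone structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, purely for illustration, that every symbol gets the same number of clones, that hidden state `s` emits symbol `s // n_clones`, and that the helper names (`make_random_chmm`, `clone_slice`, `log_likelihood`) are hypothetical. Because emissions are deterministic, the forward pass only needs the block of the transition matrix linking the clones of the previous symbol to the clones of the current one.

```python
import numpy as np


def make_random_chmm(n_symbols, n_clones, seed=0):
    """Build a random CHMM (illustrative assumption, not a trained model).

    Returns a row-stochastic transition matrix over
    n_symbols * n_clones hidden states and a uniform initial distribution.
    """
    rng = np.random.default_rng(seed)
    n_states = n_symbols * n_clones
    T = rng.random((n_states, n_states))
    T /= T.sum(axis=1, keepdims=True)  # each row sums to 1
    pi = np.full(n_states, 1.0 / n_states)
    return T, pi


def clone_slice(symbol, n_clones):
    """Indices of the hidden states (clones) that emit `symbol`."""
    return slice(symbol * n_clones, (symbol + 1) * n_clones)


def log_likelihood(obs, T, pi, n_clones):
    """Forward algorithm restricted to the clones of each observed symbol."""
    prev = clone_slice(obs[0], n_clones)
    alpha = pi[prev].copy()
    norm = alpha.sum()
    log_lik = np.log(norm)
    alpha /= norm
    for x in obs[1:]:
        cur = clone_slice(x, n_clones)
        # Only the (n_clones x n_clones) block between the two symbols matters.
        alpha = alpha @ T[prev, cur]
        norm = alpha.sum()
        log_lik += np.log(norm)
        alpha /= norm
        prev = cur
    return log_lik


if __name__ == "__main__":
    T, pi = make_random_chmm(n_symbols=3, n_clones=2, seed=1)
    print(log_likelihood([0, 2, 1, 1], T, pi, n_clones=2))
```

In this sketch each forward step costs on the order of `n_clones ** 2` operations rather than `n_states ** 2`, which gives a rough sense of why the sparsity structure can make large models tractable.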
Concepts are formalized as programs on a special computer architecture called the Visual Cognitive Computer (VCC). By learning programs on VCC, concepts transfer from schematic inputs to real-world robots.
A hierarchical vision model that emphasizes the role of lateral and feedback connections and treats classification, segmentation, generation, and occlusion-reasoning in a unified framework.
AI has seen remarkable progress in recent years, due to a switch from hand-designed shallow representations, to learned deep representations. While these methods excel with plentiful training data, they are still far from the human ability to learn concepts from just a few examples by reusing previously learned conceptual knowledge in new contexts. We argue that this gap might come from a fundamental misalignment between human and typical AI representations: while the former are grounded in rich sensorimotor experience, the latter are typically passive and limited to a few modalities such as vision and text. We take a step towards closing this gap by proposing an interactive, behavior-based model that represents concepts using sensorimotor contingencies grounded in an agent’s experience. On a novel conceptual learning and benchmark suite, we demonstrate that conceptually meaningful behaviors can be learned, given supervision via training curricula.
Learning from a few examples and generalizing to markedly different situations are capabilities of human visual intelligence that are yet to be matched by leading machine learning models. By drawing inspiration from systems neuroscience, we introduce a probabilistic generative model for vision in which message-passing–based inference handles recognition, segmentation, and reasoning in a unified way. The model demonstrates excellent generalization and occlusion-reasoning capabilities and outperforms deep neural networks on a challenging scene text recognition benchmark while being 300-fold more data efficient. In addition, the model fundamentally breaks the defense of modern text-based CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) by generatively segmenting characters without CAPTCHA-specific heuristics. Our model emphasizes aspects such as data efficiency and compositionality that may be important in the path toward general artificial intelligence.
The recent adaptation of deep neural network-based methods to reinforcement learning and planning domains has yielded remarkable progress on individual tasks. Nonetheless, progress on task-to-task transfer remains limited. In pursuit of efficient and robust generalization, we introduce the Schema Network, an object-oriented generative physics simulator capable of disentangling multiple causes of events and reasoning backward through causes to achieve goals. The richly structured architecture of the Schema Network can learn the dynamics of an environment directly from data. We compare Schema Networks with Asynchronous Advantage Actor-Critic and Progressive Networks on a suite of Breakout variations, reporting results on training efficiency and zero-shot generalization, consistently demonstrating faster, more robust learning and better transfer. We argue that generalizing from limited data and learning causal relationships are essential abilities on the path toward generally intelligent systems.
We introduce the hierarchical compositional network (HCN), a directed generative model able to discover and disentangle, without supervision, the building blocks of a set of binary images. The building blocks are binary features defined hierarchically as a composition of some of the features in the layer immediately below, arranged in a particular manner. At a high level, HCN is similar to a sigmoid belief network with pooling. Inference and learning in HCN are very challenging and existing variational approximations do not work satisfactorily. A main contribution of this work is to show that both can be addressed using max-product message passing (MPMP) with a particular schedule (no EM required). Also, using MPMP as an inference engine for HCN makes new tasks simple: adding supervision information, classifying images, or performing inpainting all correspond to clamping some variables of the model to their known values and running MPMP on the rest. When used for classification, fast inference with HCN has exactly the same functional form as a convolutional neural network (CNN) with linear activations and binary weights. However, HCN's features are qualitatively very different.
Convolutional neural networks (CNNs) have shown great success in computer vision, approaching human-level performance when trained for specific tasks via application-specific loss functions. In this paper, we propose a method for augmenting and training CNNs so that their learned features are compositional. It encourages networks to form representations that disentangle objects from their surroundings and from each other, thereby promoting better generalization. Our method is agnostic to the specific details of the underlying CNN to which it is applied and can in principle be used with any CNN. As we show in our experiments, the learned representations lead to feature activations that are more localized and improve performance over non-compositional baselines in object recognition tasks.
This paper is an invited commentary on Lake et al.'s Behavioral and Brain Sciences article titled “Building machines that learn and think like people”. Lake et al.'s paper offers a timely critique on the recent accomplishments in artificial intelligence from the vantage point of human intelligence, and provides insightful suggestions about research directions for building more human-like intelligence. Since we agree with most of the points raised in that paper, we will offer a few points that are complementary.
The theoretical setting of hierarchical Bayesian inference is gaining acceptance as a framework for understanding cortical computation. In this paper, we describe how Bayesian belief propagation in a spatio-temporal hierarchical model, called Hierarchical Temporal Memory (HTM), can lead to a mathematical model for cortical circuits. An HTM node is abstracted using a coincidence detector and a mixture of Markov chains. Bayesian belief propagation equations for such an HTM node define a set of functional constraints for a neuronal implementation. Anatomical data provide a contrasting set of organizational constraints. The combination of these two constraints suggests a theoretically derived interpretation for many anatomical and physiological features and predicts several others. We describe the pattern recognition capabilities of HTM networks and demonstrate the application of the derived circuits for modeling the subjective contour effect. We also discuss how the theory and the circuit can be extended to explain cortical features that are not explained by the current model and describe testable predictions that can be derived from the model.
In this paper, we propose a mechanism which the neocortex may use to store sequences of patterns. Storing and recalling sequences are necessary for making predictions, recognizing time-based patterns and generating behaviour. Since these tasks are major functions of the neocortex, the ability to store and recall time-based sequences is probably a key attribute of many, if not all, cortical areas. Previously, we have proposed that the neocortex can be modelled as a hierarchy of memory regions, each of which learns and recalls sequences. This paper proposes how each region of neocortex might learn the sequences necessary for this theory. The basis of the proposal is that all the cells in a cortical column share bottom-up receptive field properties, but individual cells in a column learn to represent unique incidences of the bottom-up receptive field property within different sequences. We discuss the proposal, the biological constraints that led to it and some results modelling it.
We describe a hierarchical model of invariant visual pattern recognition in the visual cortex. In this model, the knowledge of how patterns change when objects move is learned and encapsulated in terms of high probability sequences at each level of the hierarchy. Configuration of object parts is captured by the patterns of coincident high probability sequences. This knowledge is then encoded in a highly efficient Bayesian Network structure. The learning algorithm uses a temporal stability criterion to discover object concepts and movement patterns. We show that the architecture and algorithms are biologically plausible. The large scale architecture of the system matches the large scale organization of the cortex and the micro-circuits derived from the local computations match the anatomical data on cortical circuits. The system exhibits invariance across a wide variety of transformations and is robust in the presence of noise. Moreover, the model also offers alternative explanations for various known cortical phenomena.
In this paper, we revisit the problem of inducing a process model from time-series data. We illustrate this task with a realistic ecosystem model, review an initial method for its induction, then identify three challenges that require extension of this method. These include dealing with unobservable variables, finding numeric conditions on processes, and preventing the creation of models that overfit the training data. We describe responses to these challenges and present experimental evidence that they have the desired effects. After this, we show that this extended approach to inductive process modeling can explain and predict time-series data from batteries on the International Space Station. In closing, we discuss related work and consider directions for future research.