Ravi Teja Mullapudi

I am a final-year Ph.D. student working with Kayvon Fatahalian and Deva Ramanan. I am broadly interested in computer vision and high-performance computing. My current work focuses on techniques and models for enabling efficient visual understanding.

I did my master's at the Indian Institute of Science, where I was advised by Uday Bondhugula. Before my master's, I worked at NVIDIA and did my bachelor's at IIIT Hyderabad.

Email  /  CV  /  Google Scholar  

Learning to Move with Affordance Maps

The ability to autonomously explore and navigate a physical space is a fundamental requirement for virtually any mobile autonomous agent, from household robotic vacuums to autonomous vehicles. Traditional SLAM-based approaches for exploration and navigation largely focus on leveraging scene geometry, but fail to model dynamic objects (such as other agents) or semantic constraints (such as wet floors or doorways). Learning-based RL agents are an attractive alternative because they can incorporate both semantic and geometric information, but they are notoriously sample inefficient, difficult to generalize to novel settings, and difficult to interpret. In this paper, we combine the best of both worlds with a modular approach that learns a spatial representation of a scene that is trained to be effective when coupled with traditional geometric planners. Specifically, we design an agent that learns to predict a spatial affordance map that elucidates what parts of a scene are navigable through active self-supervised experience gathering. In contrast to most simulation environments that assume a static world, we evaluate our approach in the VizDoom simulator, using large-scale randomly-generated maps containing a variety of dynamic actors and hazards. We show that learned affordance maps can be used to augment traditional approaches for both exploration and navigation, providing significant improvements in performance.

Learning to Move with Affordance Maps
William Qi, Ravi Teja Mullapudi, Saurabh Gupta, Deva Ramanan
ICLR 2020

Online model distillation for efficient video inference

High-quality computer vision models typically address the problem of understanding the general distribution of real-world images. However, most cameras observe only a very small fraction of this distribution. This offers the possibility of achieving more efficient inference by specializing compact, low-cost models to the specific distribution of frames observed by a single camera. In this paper, we employ the technique of model distillation (supervising a low-cost student model using the output of a high-cost teacher) to specialize accurate, low-cost semantic segmentation models to a target video stream. Rather than learn a specialized student model on offline data from the video stream, we train the student in an online fashion on the live video, intermittently running the teacher to provide a target for learning. Online model distillation yields semantic segmentation models that closely approximate their Mask R-CNN teacher with 7 to 17x lower inference runtime cost (11 to 26x in FLOPs), even when the target video's distribution is non-stationary. Our method requires no offline pretraining on the target video stream, achieves higher accuracy and lower cost than solutions based on flow or video object segmentation, and can exhibit better temporal stability than the original teacher. We also provide a new video dataset for evaluating the efficiency of inference over long-running video streams.

Online model distillation for efficient video inference
Ravi Teja Mullapudi, Steven Chen, Keyi Zhang, Deva Ramanan, Kayvon Fatahalian
ICCV 2019
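
The online training loop described in the abstract above can be sketched in a few lines. This is only a toy illustration under loud assumptions, not the method from the paper: the student is a two-parameter linear model on scalar "frames", the teacher is any callable, and the names `online_distill`, `stride`, `lr`, and `steps` are hypothetical. The teacher is queried on a fixed stride here purely for simplicity.

```python
def online_distill(frames, teacher, stride=8, lr=0.5, steps=4):
    """Toy online distillation: cheap student on every frame, teacher only every `stride` frames."""
    w, b = 0.0, 0.0                      # student parameters; no offline pretraining
    outputs, teacher_calls = [], 0
    for t, x in enumerate(frames):
        if t % stride == 0:              # intermittent, expensive teacher call
            target = teacher(x)
            teacher_calls += 1
            for _ in range(steps):       # a few SGD steps on squared error (assumed update rule)
                err = (w * x + b) - target
                w -= lr * err * x
                b -= lr * err
        outputs.append(w * x + b)        # student inference runs on every frame
    return outputs, teacher_calls


# Hypothetical stream: scalar frames cycling through 0.0 .. 0.9.
frames = [(t % 10) / 10 for t in range(800)]
outputs, teacher_calls = online_distill(frames, teacher=lambda x: 2 * x + 1)
```

In this sketch the teacher runs on one frame in eight, while the student produces an output for every frame and keeps adapting as the stream goes on.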

HydraNets: Specialized Dynamic Architectures for Efficient Inference

HydraNets explore semantic specialization as a mechanism for improving the computational efficiency (accuracy-per-unit-cost) of inference in the context of image classification. Specifically, we propose a network architecture template called HydraNet, which enables state-of-the-art architectures for image classification to be transformed into dynamic architectures which exploit conditional execution for efficient inference. HydraNets are wide networks containing distinct components specialized to compute features for visually similar classes, but they retain efficiency by dynamically selecting only a small number of components to evaluate for any one input image. On CIFAR, applying the HydraNet template to the ResNet and DenseNet families of models reduces inference cost by 2-4x while retaining the accuracy of the baseline architectures. On ImageNet, applying the HydraNet template improves accuracy by up to 2.5% when compared to an efficient baseline architecture with similar inference cost.

HydraNets: Specialized Dynamic Architectures for Efficient Inference
Ravi Teja Mullapudi, William R. Mark, Noam Shazeer, Kayvon Fatahalian
CVPR 2018

Automatic scheduling of Halide programs

The Halide image processing language has proven to be an effective system for authoring high-performance image processing code. Halide programmers need only provide a high-level strategy for mapping an image processing pipeline to a parallel machine (a schedule), and the Halide compiler carries out the mechanical task of generating platform-specific code that implements the schedule. Unfortunately, designing high-performance schedules for complex image processing pipelines requires substantial knowledge of modern hardware architecture and code-optimization techniques. In this paper we provide an algorithm for automatically generating high-performance schedules for Halide programs. Our solution extends the function bounds analysis already present in the Halide compiler to automatically perform locality and parallelism-enhancing global program transformations typical of those employed by expert Halide developers. The algorithm does not require costly (and often impractical) auto-tuning, and, in seconds, generates schedules for a broad set of image processing benchmarks that are performance-competitive with, and often better than, schedules manually authored by expert Halide developers on server and mobile CPUs, as well as GPUs.

Automatically Scheduling Halide Image Processing Pipelines
Ravi Teja Mullapudi, Andrew Adams, Dillon Sharlet, Jonathan Ragan-Kelley, Kayvon Fatahalian
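
The split between an algorithm and its schedule, which the abstract above relies on, can be illustrated with a small sketch. This is plain Python rather than Halide, and the function names (`stage1`, `blur_breadth_first`, `blur_tiled`) and the `tile` parameter are hypothetical. It computes the same two-stage 1D blur under two execution orders: one materializes the whole intermediate stage first, the other works tile by tile and recomputes a two-element halo at each tile boundary in exchange for a small working set.

```python
def stage1(src, i):
    """First blur stage: average of three neighboring inputs."""
    return (src[i] + src[i + 1] + src[i + 2]) / 3.0


def blur_breadth_first(src):
    """Schedule A: compute the whole intermediate stage, then the output."""
    n_out = len(src) - 4
    bx = [stage1(src, i) for i in range(n_out + 2)]
    return [(bx[i] + bx[i + 1] + bx[i + 2]) / 3.0 for i in range(n_out)]


def blur_tiled(src, tile=4):
    """Schedule B: per output tile, compute only the intermediate values it needs."""
    n_out = len(src) - 4
    out = []
    for start in range(0, n_out, tile):
        end = min(start + tile, n_out)
        # Tile-local intermediate buffer: the tile plus a two-element halo.
        bx = [stage1(src, i) for i in range(start, end + 2)]
        out.extend((bx[i] + bx[i + 1] + bx[i + 2]) / 3.0 for i in range(end - start))
    return out


src = [float((7 * i) % 11) for i in range(23)]
same = blur_tiled(src, tile=4) == blur_breadth_first(src)
```

Both functions return identical values; only the order of computation, the size of the intermediate buffer, and the amount of redundant work differ. Trade-offs of this kind are what a schedule expresses.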

Automatic Optimization for Image Processing Pipelines

Image processing pipelines are ubiquitous and demand high-performance implementations on modern architectures. Manually implementing high-performance pipelines is tedious, error-prone, and not portable. For my master's thesis, I focused on the problem of automatically generating efficient multi-core implementations of image processing pipelines from a high-level description of the pipeline algorithm. I leveraged polyhedral representation and code generation techniques to achieve this goal. PolyMage is a domain-specific system built for evaluating and experimenting with techniques developed during the course of my master's.

PolyMage: Automatic Optimization for Image Processing Pipelines
Ravi Teja Mullapudi, Vinay Vasista, Uday Bondhugula
Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2015

Compiling Affine Loop Nests for Dataflow Runtimes

I designed and evaluated a compiler and runtime to automatically extract coarse-grained dataflow parallelism in affine loop nests to target shared and distributed memory systems. As part of the evaluation, we implemented a set of benchmarks using the CnC (Intel Concurrent Collections) programming model to serve as a comparison to our system. The implementation of the Floyd-Warshall all-pairs shortest-paths algorithm used in the evaluation is now part of the Intel CnC samples.
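
For reference, the Floyd-Warshall algorithm mentioned above is a compact affine loop nest. The sketch below is a plain sequential Python version written for illustration; it is not the CnC implementation, and the function name and matrix layout are assumptions. It assumes a graph with no negative cycles.

```python
import math


def floyd_warshall(weights):
    """All-pairs shortest-path distances for a dense weight matrix.

    weights[i][j] is the weight of edge i -> j, or math.inf if there is no edge.
    """
    n = len(weights)
    dist = [row[:] for row in weights]        # copy so the input is left untouched
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0)       # a vertex reaches itself at cost 0
    for k in range(n):                        # allow vertex k as an intermediate stop
        dist_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:
                continue                      # no path i -> k, nothing to relax
            row = dist[i]
            for j in range(n):
                cand = d_ik + dist_k[j]
                if cand < row[j]:
                    row[j] = cand
    return dist


INF = math.inf
graph = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
shortest = floyd_warshall(graph)
```

The outer `k` loop carries a dependence from one iteration to the next, while for a fixed `k` the updates to individual `(i, j)` entries do not depend on one another, so blocks of them can run as independent tasks.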
