Big Data

I'm very interested in algorithms and computer systems for efficient big data analytics, and their applications to online social and media-sharing networks. To that end, I'm working full-time at Tubular Labs to index and offer analytics on all video content ever posted online, empowering artists, studios, multi-channel networks, as well as popular brands and tech giants, to make intelligent decisions on the rough and ever-changing terrain of online video. In my spare time, and prior to joining Tubular, I've been collaborating with Prof. Victor Preciado and his group at the University of Pennsylvania on a few research projects in the same spirit.

At Tubular, where I am supported by and working with a wonderful team of people, my main contributions pertain to the backend processes that populate the dashboard shown in the accompanying pictures. Users can specify and track a list of up to ten million videos, and they then expect to see how this group evolves over time. When I joined, I created standalone tools to help debug existing issues. Over the following months I cleaned up and re-architected the main job to be nicely abstracted and easy to modify, extended it to handle multiple platforms (initially it supported YouTube only), and optimized it to process over 80% of the customer load in three to four hours through a careful analysis of use cases and a reduction of the underlying algorithmic problem to bin packing. Here, I've become familiar with the very interesting technologies that comprise our stack, such as Elasticsearch, Impala, HDFS, Cassandra, Tornado, and Redis. Not to mention the joy of working with such rich datasets!
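
To give a flavor of that reduction, here is a hedged sketch in Python, not Tubular's actual code; the list sizes, the batch capacity, and the function name are made up for illustration. The idea is that each customer list needs a certain number of videos refreshed per run, a backend job has a fixed processing capacity, and a classic first-fit-decreasing heuristic packs the lists into as few job batches as possible.

    # Hedged sketch, not production code: packs customer video lists into
    # fixed-capacity processing batches with first-fit decreasing.

    def pack_into_batches(list_sizes, batch_capacity):
        """Greedy first-fit-decreasing bin packing.

        list_sizes: number of videos each customer list needs refreshed.
        batch_capacity: how many videos one backend job can handle per run.
        Returns a list of batches, each a list of (customer_index, size) pairs.
        """
        batches = []      # each batch is a list of (customer_index, size)
        remaining = []    # free capacity left in each open batch
        # Sort largest-first so big lists claim space early.
        order = sorted(range(len(list_sizes)), key=lambda i: -list_sizes[i])
        for i in order:
            size = list_sizes[i]
            for b, free in enumerate(remaining):
                if size <= free:              # first batch with room wins
                    batches[b].append((i, size))
                    remaining[b] -= size
                    break
            else:                             # no batch fits: open a new one
                batches.append([(i, size)])
                remaining.append(batch_capacity - size)
        return batches

    # Example: five hypothetical customer lists, batches of 10M videos each.
    print(pack_into_batches([9_000_000, 4_000_000, 3_500_000, 2_000_000, 500_000],
                            batch_capacity=10_000_000))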

Before joining Tubular, I worked with Prof. Victor Preciado on evolving social networks, as part of my Ph.D. studies and my research fellowship at the University of Pennsylvania. Our main focus was on mapping global spectral properties of large-scale social networks that evolve over time, such as Facebook or the Wikipedia interaction network, to local structural properties, and then predicting evolution based on this mapping. Among other responsibilities, I researched, designed and implemented a fast, scalable tool in C/C++ for analyzing and tracking structural properties, and developed simulations based on evolution models proposed in the literature, comparing their synthetic data to real data to identify points of failure. We then came up with a novel evolution model, as well as user and community similarity measures. A preliminary version of this research, enriched with more algorithmic directions by Prof. Sanjeev Khanna, resulted in NSF BIGDATA grant #1447470 ($500,000). I am now collaborating with Victor and his group on a few cool new projects that are still in stealth mode!
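
As a rough illustration of the kinds of quantities involved, here is a minimal Python sketch, not the C/C++ tool itself; networkx and the toy random-graph snapshot are assumptions made purely for the example. For one snapshot of an evolving network it computes a global spectral property, the largest adjacency eigenvalue, alongside local structural properties such as degrees and triangle counts, which is the kind of pairing such a mapping would relate.

    # Hedged sketch: spectral and local structural features of one snapshot.
    import networkx as nx
    import numpy as np

    def snapshot_features(G):
        """Return (spectral_radius, per-node degrees, per-node triangle counts)."""
        A = nx.to_numpy_array(G)
        spectral_radius = max(abs(np.linalg.eigvals(A)))   # global property
        degrees = dict(G.degree())                         # local property
        triangles = nx.triangles(G)                        # local property
        return spectral_radius, degrees, triangles

    # Toy snapshot: a small random graph standing in for one time step.
    G = nx.erdos_renyi_graph(n=100, p=0.05, seed=42)
    rho, deg, tri = snapshot_features(G)
    print(f"spectral radius ~ {rho:.2f}, max degree = {max(deg.values())}, "
          f"total triangles = {sum(tri.values()) // 3}")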