Research/Projects

@Microsoft:
I have been working as a Data Scientist on the Windows Core Data Science team since July 2014. Core Data Science is a new team in Microsoft's Windows & Devices Group that uses data to better inform engineering decisions. I can't really give more details without possibly violating my NDA :-)

The good thing about life @ Microsoft is that, apart from enjoying the numerous hikes in WA, I also get the opportunity to work on fun side projects/hacks. Here are two things I worked on recently:
1. An automatic slouch detector (for catching bad posture), built for the Garage Science Fair, which won the Golden Volcano Award. It uses a Kinect to detect bad posture and automatically tints your screen from mild to dark red (depending on how much your posture sucks) to warn you. Here is a video of it in action and here is what we got for working on it! :D
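The core of the hack is just a severity score mapped onto a screen tint. Here's a minimal sketch of that mapping in Python; the angle thresholds are made up for illustration, and the real hack read joint positions from the Kinect SDK rather than taking an angle directly:

```python
def slouch_tint(neck_angle_deg, ok_angle=10.0, worst_angle=40.0):
    """Map a forward-lean angle (derived from Kinect skeleton joints) to a
    screen-tint color: pure white (no tint) when sitting upright, shading
    to dark red as the slouch gets worse. Thresholds are illustrative."""
    severity = (neck_angle_deg - ok_angle) / (worst_angle - ok_angle)
    severity = max(0.0, min(1.0, severity))  # clamp to [0, 1]
    # Blend white -> red: keep the red channel, fade green and blue.
    fade = int(255 * (1.0 - severity))
    return (255, fade, fade)
```

Sitting upright returns plain white, and anything past the worst-case angle saturates to full red, so the warning can't be tuned out by slouching harder.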
2. Over an exceptionally gloomy long weekend (one of the few times Seattle weather becomes super wet and depressing), I hacked together an imitation of Yelp's "review highlights" using publicly available Windows app reviews and a few open-source tools (D3.js, Twitter Bootstrap and NLTK). This was more an exercise to forestall boredom than anything else. You can check out the results here: https://dl.dropboxusercontent.com/u/5360659/Sites/index.html


Publications:
1. Our paper "Generating Natural-Language Video Descriptions Using Text-Mined Knowledge" was accepted to the Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI-2013), July 2013.
Video about the approach & with examples: http://www.youtube.com/watch?v=hsna5vGyXYI
Link: http://www.cs.utexas.edu/users/ai-lab/pub-view.php?PubID=127273

2. Niveda Krishnamoorthy, Girish Malkarnenkar, Raymond J. Mooney, Kate Saenko, Sergio Guadarrama. "Generating Natural-Language Video Descriptions Using Text-Mined Knowledge." Proceedings of the NAACL HLT Workshop on Vision and Language (WVL '13), 2013, pp. 10-19.

3. Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, Kate Saenko. "YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition." Proceedings of the 14th International Conference on Computer Vision (ICCV-2013), Sydney, Australia, December 2013, pp. 2712-2719.


FALL 2012


1) Generating natural language descriptions for YouTube videos
We proposed a holistic, data-driven technique that generates natural language descriptions for videos. We combined the scores from state-of-the-art object and activity detection algorithms with "real-world" knowledge to select the most probable subject-verb-object triplet to describe each video. We showed that this knowledge, mined from large web-scale corpora, can enhance the triplet selection algorithm by providing contextual information, which lets us generate more relevant descriptions for videos.
Skills: Java, MATLAB, Bash scripting, Perl scripting
Tools: WordNet::Similarity package, Weka, LIBSVM, Stanford dependency parser, SRI LM, Berkeley LM, SimpleNLG, Amazon MTurk
Course: LIN 386M Learning Grounded Models of Meaning
Instructors: Prof. Katrin Erk and Prof. Jason Baldridge
Link to Project Report (PDF)
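The triplet-selection step above can be sketched compactly: score every candidate subject-verb-object combination by blending the vision detectors' confidences with a text-mined plausibility score, then keep the best one. This is a simplified illustration, not the paper's exact formulation; `alpha` and the `lm_logprob` callback (a stand-in for an n-gram model over mined SVO triples) are assumptions:

```python
import math

def best_triplet(vision_scores, lm_logprob, alpha=0.6):
    """Pick the most probable (subject, verb, object) triplet by combining
    vision-detection confidences with web-mined plausibility.
    vision_scores: {"subjects": {...}, "verbs": {...}, "objects": {...}},
    each mapping a word to a detector confidence in (0, 1].
    lm_logprob: callable giving a log-plausibility for a triplet."""
    def score(triplet):
        s, v, o = triplet
        vis = (math.log(vision_scores["subjects"][s])
               + math.log(vision_scores["verbs"][v])
               + math.log(vision_scores["objects"][o]))
        return alpha * vis + (1 - alpha) * lm_logprob(triplet)
    candidates = [(s, v, o)
                  for s in vision_scores["subjects"]
                  for v in vision_scores["verbs"]
                  for o in vision_scores["objects"]]
    return max(candidates, key=score)
```

The point of the blend is that a triplet the detectors like but the corpus never attests (say, "dog rides car") gets pulled down, while a contextually sensible triplet survives middling detector scores.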

2) Identifying Subject-Verb-Object triplets in videos
We focused on improving the accuracy of subject-verb-object triplets identified in realistic YouTube videos by using "common sense" knowledge mined from web-scale text corpora. We also showed that expanding the top activity detections with similar verbs can further improve the identification of the right triplet; this also lets us detect activities for which there is little or no training data (similar to zero-shot learning). We achieved a dramatic improvement in verb accuracy, from 8% to 36%, while also substantially improving the subject and object accuracies.
Skills: Java, MATLAB, Bash scripting, Perl scripting
Tools: WordNet::Similarity package, Weka, LIBSVM, Stanford dependency parser, SRI LM, Berkeley LM, SimpleNLG, Amazon MTurk
Course: CS395T Visual Recognition
Instructor: Prof. Kristen Grauman
Link to Project Report (PDF)
Link to Project Presentation: PDF(without videos) & PPT (with videos)
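The verb-expansion idea above can be sketched in a few lines: each verb the detector knows passes part of its confidence to semantically similar verbs, so verbs outside the training vocabulary can still be ranked. The `verb_similarity` table here is a hypothetical stand-in for scores from the WordNet::Similarity package, and `decay` is an illustrative weight:

```python
def expand_verbs(detected_scores, verb_similarity, decay=0.5):
    """Expand top activity detections with similar verbs.
    detected_scores: {verb: detector confidence} over the small trained
    vocabulary. verb_similarity: {verb: {similar_verb: similarity}}.
    Each similar verb inherits a discounted share of the detection score,
    enabling zero-shot-style ranking of unseen verbs."""
    expanded = dict(detected_scores)
    for verb, score in detected_scores.items():
        for similar, sim in verb_similarity.get(verb, {}).items():
            boost = decay * score * sim
            expanded[similar] = max(expanded.get(similar, 0.0), boost)
    return sorted(expanded.items(), key=lambda kv: -kv[1])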

3) Research done as GRA with Prof. Mooney
I am working on the DARPA-funded Mind's Eye project in collaboration with researchers from UT Austin, UC Berkeley and the University of Massachusetts Lowell. The project deals with describing activities occurring in surveillance videos.


SUMMER 2012

During the summer of 2012, I interned at Oblong, a startup in Los Angeles. Oblong's founder, John Underkoffler, designed the computer interfaces in the film Minority Report, and the people at Oblong actually work on creating similar spatial, networked, multi-user, multi-screen, multi-device computing environments!
Project Goal: Developing a real-time hand pose estimation system based on 3D depth sensors
Mentor: Dr. David Minnen, who was the Director of Computer Vision and head of the R&D team at Oblong at the time.
Methodology: After spending a month on a literature survey and trying out different ideas, Dr. Minnen and I settled on supervised binary hashing, framing pose estimation as database retrieval via simple nearest-neighbor search. Reaching accurate real-time performance meant overcoming many challenges, and the final solution implemented cutting-edge ideas from three different papers from CVPR 2012, which took place a few weeks after my internship started! These ideas ranged from state-of-the-art methods for locality-sensitive hashing to fast nearest-neighbor search by non-linear embedding and fast search in Hamming space with multi-index hashing. The whole experience of surveying a previously unknown topic, identifying the best ideas from the latest publications in the field, and combining them to deliver a commercial product, all in the span of 12 weeks, was mind-blowing.
Skills: C++, C, Octave, bash scripting
Link to Project Presentation: PDF
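The retrieval step at the heart of this kind of system is easy to sketch: hash each database pose to a binary code, hash the query hand the same way, and return the pose whose code is closest in Hamming distance. This brute-force version only shows the distance computation; the actual system needed multi-index hashing and the other CVPR 2012 ideas to make the search fast enough for real time, and the database entries here are made up:

```python
def hamming_nn(query_code, database):
    """Return the database entry whose binary code (stored as an int) is
    nearest to the query code in Hamming distance. Each entry is a
    (code, pose_label) pair; XOR exposes the differing bits, and the
    popcount of the XOR is the Hamming distance."""
    def hamming(a, b):
        return bin(a ^ b).count("1")
    return min(database, key=lambda entry: hamming(query_code, entry[0]))
```

With millions of stored poses, scanning the whole list like this is too slow, which is exactly why the multi-index hashing paper mattered: it buckets code substrings so only a small candidate set is ever compared.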


SPRING 2012

1) Sequence Labeling Using Parallel CRFs
By exploiting the hierarchical information contained in part-of-speech (POS) tags, we developed a faster approach to the POS sequence labeling problem that runs several Conditional Random Fields (CRFs) in parallel. We greatly reduced execution time while still obtaining better accuracy than the baseline CRF approach.
Course: CS 388: Natural Language Processing
Instructor: Prof. Raymond Mooney
Link to project report: PDF
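The decomposition can be sketched as a two-level pipeline: a coarse model first assigns each token a broad class (noun-like, verb-like, ...), then one specialised model per class refines its tokens to fine-grained tags, with the specialised models running in parallel. The tagger callables below are stand-ins for trained CRFs; this illustrates only the split-and-recombine structure, not our exact model:

```python
from concurrent.futures import ThreadPoolExecutor

def hierarchical_pos_tag(tokens, coarse_tagger, fine_taggers):
    """Tag a sentence in two levels: coarse_tagger maps tokens to broad
    classes, then the per-class taggers in fine_taggers (a dict keyed by
    coarse tag) refine their share of the tokens concurrently."""
    coarse = coarse_tagger(tokens)              # e.g. ["N", "V", ...]
    groups = {}                                 # coarse tag -> token indices
    for i, c in enumerate(coarse):
        groups.setdefault(c, []).append(i)
    fine = [None] * len(tokens)
    def refine(c):
        idxs = groups[c]
        tags = fine_taggers[c]([tokens[i] for i in idxs])
        return list(zip(idxs, tags))
    with ThreadPoolExecutor() as pool:
        for result in pool.map(refine, groups):
            for i, tag in result:
                fine[i] = tag
    return fine
```

Each fine tagger only has to discriminate among the handful of tags inside its coarse class, which is what makes the per-model inference cheaper than one monolithic CRF over the full tagset.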

2) Natural Language Specifications for Temporal Logic
Formal verification is a well-recognized technique in software and hardware system design. A major barrier to its adoption, however, is the need to learn and use a very precise logical formalism. Hardware and software engineers who could benefit from verification techniques such as model checking or theorem proving may be unable to express the properties they want checked in the formal language that model checking software requires. This motivates an interface that accepts specifications expressed in natural language and converts them into temporal logic formulae, since such an interface would greatly increase the usability of model checking tools. In this paper, I discuss the various approaches and advances in converting natural language specifications to and from temporal logic formulae.
Course: CS388S Formal Verification and Semantics
Instructor: Prof. Emerson
Link to project report: PDF


FALL 2011

1) Ping Pong using Artificial Intelligence
Designed 2D and 3D implementations of a ping-pong game using AI techniques from neural networks and genetic programming, in which neural-network bots learn and improve their skills while playing against each other or against a human player.
Check out the cool performance videos here!
Course: CS 394 Neural Networks
Instructor: Dr. Miikkulainen (risto@cs.utexas.edu)
Project Report link

2) PENSIEVE: Using tag clouds and graphs for visualizing large text corpora
Designed a visualization tool, which creates a graph-based summary of textual data such as novels by using NLP techniques, and depicts trends in the data via sentiment analysis.
Course: CS 395T Concepts of Information Retrieval
Instructor: Dr. Lease (ml@ischool.utexas.edu)
Project Report link
Project PPT link

Work done as GRA in TACC during FALL 2011 and SPRING 2012
Mentors: Brandt Westing and Karla Vega
I worked as a GRA in the visualization lab of the Texas Advanced Computing Center, where I got the opportunity to play around with the world's highest-resolution display & touchscreen. Over the two semesters, I worked on the projects below.
1) Visualization of academic paper keywords using Processing
2) Twitter data visualization based on professional groups (with Dr. George Veletsianos)
3) Visualizing objects at different scales (with Dr. Cesar Delgado)
4) Gesture recognition using Kinect


ORACLE India August 2010-August 2011

I worked as a software developer at Oracle India Private Limited, Bangalore, from August 2010 to August 2011. During that year, I worked on improving the parallelization of various operations for the 12c release of the Oracle Database.
Manager: Huyen Nguyen (huyen.nguyen@oracle.com)


Projects done during my undergrad in BITS Pilani (2006-2010)

Course-work projects
1) Studied & implemented a robust automatic speech recognition (speech-to-text) system that works in the presence of noise, using Hidden Markov Models for pattern recognition and modifying the HTK toolkit in C and Perl.
2) Designed a restricted-vocabulary speech synthesis system in MATLAB to demonstrate the various constraints involved in speech synthesis.
3) Implemented an automatic cartoon generator and animator which accepts a digital image as input, produces a caricature, and animates the output using OpenGL/Java.
 
Practice School-2 station & project: January 2010-June 2010
INSEAD Business School, Fontainebleau, France
I worked with Prof. Hilke Plassmann in the Marketing Department at INSEAD. My work involved creating a website and designing psychological experiments in E-Prime, as well as scripting in MATLAB and VBA for data analysis.
Areas involved: Neuroeconomics, MATLAB, E-Prime, SPM, Joomla

DAAD WISE 2009 Research Fellowship: Summer of 2009
I did a project on the production of an animated talking head at the Technical University of Berlin from May 2009 to August 2009 under the German Academic Exchange (DAAD) WISE program; the end result was the conversion of input text into a video of the speaker with synchronized facial features. Click here to read about my internship experience, and check out my mentor Dr. Sascha Fagel's site here.

Practice School-1 station & project: Summer of 2008
Dlink India Limited, Goa (http://www.dlink.co.in/)
I worked in the R&D department on a project dealing with the configuration & implementation of a version control system on the Dlink computer network, along with an analysis of network security and vulnerabilities.
Areas involved: CVS, network security, vulnerability analysis and exploitation