    Bootstrapped Graph Diffusions: Exposing the Power of Nonlinearity

    Eliav Buchnik, Edith Cohen

    ACM SIGMETRICS (2018) (to appear)


    Designing A/B tests in a collaboration network

    Sangho Yoon

    Google Data Science blog (2018)


    HARP: Hierarchical Representation Learning for Networks

    Haochen Chen, Bryan Perozzi, Yifan Hu, Steven Skiena

    AAAI'18 (2018) (to appear)


    Hidden in Plain Sight: Classifying Emails Using Embedded Image Contents

    Navneet Potti, James B. Wendt, Qi Zhao, Sandeep Tata, Marc Najork

    The Web Conference (2018) (to appear)


    Optimal Dynamic Strings

    Adam Karczmarz, Jakub Łącki, Paweł Gawrychowski, Piotr Sankowski, Tomasz Kociumaka

    SODA 2018 (to appear)


    Orienteering Algorithms for Generating Travel Itineraries

    Zachary Friggstad, Sreenivas Gollapudi, Kostas Kollias, Tamas Sarlos, Chaitanya Swamy, Andrew Tomkins

    International Conference on Web Search and Data Mining (WSDM), ACM (2018)


    A Generic Coordinate Descent Framework for Learning from Implicit Feedback

    Immanuel Bayer, Xiangnan He, Bhargav Kanagal, Steffen Rendle

    Proceedings of the 26th International Conference on World Wide Web (2017), pp. 1341-1350


    A Neural Architecture for Dialectal Arabic Segmentation

    Younes Samih, Mohammed Attia, Mohamed Eldesouki, Hamdy Mubarak, Ahmed Abdelali, Laura Kallmeyer, Kareem Darwish

    The Third Arabic Natural Language Processing Workshop (WANLP), Valencia, Spain (2017), pp. 46-54


    Beyond Globally Optimal: Focused Learning for Improved Recommendations

    Alex Beutel, Ed H. Chi, Zhiyuan Cheng, Hubert Pham, John Anderson

    Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, April 3-7, 2017


    Crafting a lexicon of referential expressions for NLG applications

    Ariel Gutman, Alexandros Chaaraoui, Pascal Fleury

    The 2017 Israeli Seminar of Computational Linguistics, Rachel and Selim Benin School of Computer Science and Engineering, Edmond J. Safra Campus, Jerusalem (2017)


    Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters

    Alessandro Epasto, Silvio Lattanzi, Renato Paes Leme

    KDD '17 (2017)


    Email Category Prediction

    Aston Zhang, Luis Garcia Pueyo, James B. Wendt, Marc Najork, Andrei Broder

    Companion Proc. of the 26th International World Wide Web Conference (2017), pp. 495-503


    HyperLogLog Hyper Extended: Sketches for Concave Sublinear Frequency Statistics

    Edith Cohen

    KDD (2017) (to appear)


    Instance-Level Label Propagation with Multi-Instance Learning

    Qifan Wang, Gal Chechik, Chen Sun, Bin Shen

    IJCAI (2017) (to appear)


    Latent LSTM Allocation: Joint clustering and non-linear dynamic modeling of sequence data

    Manzil Zaheer, Amr Ahmed, Alexander Smola

    WSDM, ACM (2017) (to appear)


    Local Topic Discovery via Boosted Ensemble of Nonnegative Matrix Factorization

    Sangho Suh, Jaegul Choo, Joonseok Lee, Chandan K. Reddy

    Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Sister conferences track (2017)


    Related Event Discovery

    Cheng Li, Mike Bendersky, Sujith Ravi, Vijay Garg

    Proceedings of WSDM (2017)


    SHRec: Scalable Holistic Recommendation

    Ahmed Aly, Amr Ahmed, Moustafa Hammad

    International Conference on Scientific and Statistical Database Management (2017)


    Submodular Optimization Over Sliding Windows

    Alessandro Epasto, Morteza Zadimoghaddam, Sergei Vassilvitskii, Silvio Lattanzi

    Proceedings of the 26th International World Wide Web Conference, WWW (2017)


    Template Induction over Unstructured Email Corpora

    Julia Proskurnia, Marc-Allen Cartright, Lluís Garcia-Pueyo, Ivo Krka, James B. Wendt, Tobias Kaufmann, Balint Miklos

    Proc. of the 26th International World Wide Web Conference (2017), pp. 1521-1530


    The Spread of Physical Activity Through Social Networks

    Alessandro Epasto

    Proceedings of the 26th International World Wide Web Conference 2017, WWW


    A New Approach to Optimal Code Formatting

    Phillip Yelland

    Google, Inc. (2016)


    A Simple and Efficient Method to Handle Sparse Preference Data Using Domination Graphs: An Application to YouTube

    Shumeet Baluja

    ICCS 2016, 2302–2311


    Deep Neural Networks for YouTube Recommendations

    Paul Covington, Jay Adams, Emre Sargin

    Proceedings of the 10th ACM Conference on Recommender Systems, ACM, New York, NY, USA (2016) (to appear)


    Discovering Structure in the Universe of Attribute Names

    Alon Halevy, Natalya Fridman Noy, Sunita Sarawagi, Steven Euijong Whang, Xiao Yu

    Proc. 25th International World Wide Web Conference (2016)


    Ego-net Community Mining Applied to Friend Suggestion

    Alessandro Epasto, Silvio Lattanzi, Vahab S. Mirrokni, Ismail Sebe, Ahmed Taei, Sunita Verma

    Proceedings of VLDB (2016)


    From Freebase to Wikidata: The Great Migration

    Thomas Pellissier Tanon, Denny Vrandečić, Sebastian Schaffert, Thomas Steiner, Lydia Pintscher

    World Wide Web Conference, ACM (2016)


    Hierarchical Label Propagation and Discovery for Machine Generated Email

    James B. Wendt, Michael Bendersky, Lluis Garcia-Pueyo, Vanja Josifovski, Balint Miklos, Ivo Krka, Amitabh Saikia, Jie Yang, Marc-Allen Cartright, Sujith Ravi

    Proceedings of the International Conference on Web Search and Data Mining (WSDM), ACM (2016), pp. 317-326


    L-EnsNMF: Boosted Local Topic Discovery via Ensemble of Nonnegative Matrix Factorization

    Sangho Suh, Jaegul Choo, Joonseok Lee, Chandan K. Reddy

    Proceedings of the IEEE International Conference on Data Mining (ICDM) (2016)


    LLORMA: Local Low-Rank Matrix Approximation

    Joonseok Lee, Seungyeon Kim, Guy Lebanon, Yoram Singer, Samy Bengio

    Journal of Machine Learning Research (JMLR), vol. 17 (2016), pp. 1-24


    Learning mobile phone battery consumptions

    Andres Munoz Medina, Ashish Sharma, Felix Yu, Paul Eastham, Sergei Vassilvitskii, Umar Syed

    Workshop on On Device Intelligence (2016)


    Linking Users Across Domains with Location Data: Theory and Validation

    Christopher Riederer, Yunsung Kim, Nitish Korula, Silvio Lattanzi, Augustin Chaintreau

    WWW (2016) (to appear)


    M3A: Model, MetaModel, and Anomaly Detection in Web Searches

    Da-Cheng Juan, Neil Shah, Mingyu Tang, Zhiliang Qian, Diana Marculescu, Christos Faloutsos

    arXiv preprint arXiv:1606.05978 (2016)


    On Sampling Nodes in a Network

    Flavio Chierichetti, Anirban Dasgupta, Ravi Kumar, Silvio Lattanzi, Tamas Sarlos

    WWW (2016) (to appear)


    Open and Closed Schema for Aligning Knowledge and Text Collections

    Matthew Kelcey

    Workshop on Exploiting Semantic Annotations for Information Retrieval (ESAIR) (2016)


    Reverse Ranking by Graph Structure: Model and Scalable Algorithms

    Edith Cohen, Eliav Buchnik

    ACM SIGMETRICS 2016 (to appear)


    TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size

    Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, Eli Upfal

    ACM SIGKDD (2016) (to appear)


    The Limits of Popularity-Based Recommendations, and the Role of Social Ties

    Marco Bressan, Stefano Leucci, Alessandro Panconesi, Prabhakar Raghavan, Erisa Terolli

    Proceedings of ACM KDD 2016, ACM


    When Recommendation Goes Wrong - Anomalous Link Discovery in Recommendation Networks

    Bryan Perozzi, Michael Schueppert, Jack Saalweachter, Mayur Thakur

    Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016)


    Active Learning in Keyword Search-based Data Integration

    Zhepeng Yan, Nan Zheng, Zachary G. Ives, Partha Pratim Talukdar, Cong Yu

    The VLDB Journal, vol. 24 (2015), pp. 611-631


    Applying WebTables in Practice

    Sreeram Balakrishnan, Alon Halevy, Boulos Harb, Hongrae Lee, Jayant Madhavan, Afshin Rostamizadeh, Warren Shen, Kenneth Wilder, Fei Wu, Cong Yu

    Conference on Innovative Data Systems Research (2015)


    Associating Locations with Healthcare Events

    Daniel V. Klein, Dean Jackson

    Defensive Publications Series, Technical Disclosure Commons (2015)


    Automatic Pronunciation Verification for Speech Recognition

    Kanishka Rao, Fuchun Peng, Françoise Beaufays

    ICASSP (2015)


    Crowdsourcing and the Semantic Web: A Research Manifesto

    Cristina Sarasua, Elena Simperl, Natasha Noy, Abraham Bernstein, Jan Marco Leimeister

    Human Computation, vol. 2 (2015)


    Discovering Subsumption Relationships for Web-Based Ontologies

    Dana Movshovitz-Attias, Steven Euijong Whang, Natalya Noy, Alon Halevy

    Proc. 18th International Workshop on the Web and Databases (WebDB) (2015)


    Distributed Graph Algorithmics: Theory and Practice

    Silvio Lattanzi, Vahab S. Mirrokni

    WSDM (2015), pp. 419-420


    Efficient Algorithms for Public-Private Social Networks

    Flavio Chierichetti, Alessandro Epasto, Ravi Kumar, Silvio Lattanzi, Vahab Mirrokni

    KDD (2015)


    Efficient Densest Subgraph Computation in Evolving Graphs

    Alessandro Epasto, Silvio Lattanzi, Mauro Sozio

    WWW (2015)


    Event Relevant Reminders

    Daniel V. Klein, Dean Jackson

    Defensive Publications Series, Technical Disclosure Commons (2015)


    Fix It Where It Fails: Pronunciation Learning by Mining Error Corrections from Speech Logs

    Zhenzhen Kou, Daisy Stanton, Fuchun Peng, Françoise Beaufays, Trevor Strohman

    ICASSP (2015)


    Focus on the Long-Term: It's better for Users and Business

    Henning Hohnhold, Deirdre O'Brien, Diane Tang

    Proceedings 21st Conference on Knowledge Discovery and Data Mining, ACM, Sydney, Australia (2015)


    Improving User Topic Interest Profiles by Behavior Factorization

    Zhe Zhao, Zhiyuan Cheng, Lichan Hong, Ed H. Chi

    Proceedings of the 24th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland (2015), pp. 1406-1416


    Linked Enterprise Data Model and Its Use in Real Time Analytics and Context-Driven Data Discovery

    Kunal Taneja, Qian Zhu, Desmond Duggan, Teresa Tung

    IEEE International Conference on Mobile Services, 1800 (2015), pp. 277-283 (to appear)


    Mining Subjective Properties on the Web

    Immanuel Trummer, Alon Halevy, Hongrae Lee, Sunita Sarawagi, Rahul Gupta

    SIGMOD (2015) (to appear)


    Multi-Objective Weighted Sampling

    Edith Cohen

    HotWeb 2015 (to appear)


    Scalable Community Discovery from Multi-Faceted Graphs

    Ahmed Metwally, Jia-Yu Pan, Minh Doan, Christos Faloutsos

    2015 IEEE International Conference on Big Data, IEEE, 445 Hoes Lane Piscataway, NJ 08854-4141 USA (to appear)


    Secrets, Lies, and Account Recovery: Lessons from the Use of Personal Knowledge Questions at Google

    Joseph Bonneau, Elie Bursztein, Ilan Caron, Rob Jackson, Mike Williamson

    WWW'15 - Proceedings of the 24th International Conference on World Wide Web, ACM (2015)


    Temporal/Spatial Calendar Events and Triggers

    Daniel V. Klein, Dean Jackson

    Defensive Publications Series, Technical Disclosure Commons (2015)


    Unified and contrasting cuts in multiple graphs: application to medical imaging segmentation

    Chia-Tung Kuo, Xiang Wang, Peter Walker, Owen Carmichael, Jieping Ye, Ian Davidson

    KDD (2015), pp. 617-626


    What can be Found on the Web and How: A Characterization of Web Browsing Patterns

    Alexey Tikhonov, Arseniy Chelnokov, Gleb Gusev, Ivan Bogatyy, Liudmila Ostroumova Prokhorenkova

    WebSci 2015, Oxford (to appear)


    Biperpedia: An Ontology for Search Applications

    Rahul Gupta, Alon Halevy, Xuezhi Wang, Steven Whang, Fei Wu

    Proc. 40th Int'l Conf. on Very Large Data Bases (PVLDB) (2014)


    Distributed Balanced Clustering via Mapping Coresets

    Mohammadhossein Bateni, Aditya Bhaskara, Silvio Lattanzi, Vahab Mirrokni

    NIPS, Neural Information Processing Systems Foundation (2014)


    Frame by Frame Language Identification in Short Utterances using Deep Neural Networks

    Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Pedro J. Moreno, Joaquin Gonzalez-Rodriguez

    Neural Networks Special Issue: Neural Network Learning in Big Data (2014)


    Great Question! Question Quality in Community Q&A

    Sujith Ravi, Bo Pang, Vibhor Rastogi, Ravi Kumar

    International AAAI Conference on Weblogs and Social Media (ICWSM) (2014)


    Handcrafted Fraud and Extortion: Manual Account Hijacking in the Wild

    Elie Bursztein, Borbala Benko, Daniel Margolis, Tadek Pietraszek, Andy Archer, Allan Aquino, Andreas Pitsillidis, Stefan Savage

    IMC '14 Proceedings of the 2014 Conference on Internet Measurement Conference, ACM, 1600 Amphitheatre Parkway, pp. 347-358


    Knowledge Base Completion via Search-Based Question Answering

    Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, Dekang Lin

    WWW (2014)


    Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion

    Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, Wei Zhang

    The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, New York, NY, USA - August 24 - 27, 2014, pp. 601-610


    Near Neighbor Join

    Herald Kllapi, Boulos Harb, Cong Yu

    ICDE (2014)


    On Estimating the Average Degree

    Anirban Dasgupta, Ravi Kumar, Tamas Sarlos

    23rd International World Wide Web Conference, WWW '14, ACM (2014) (to appear)


    Quizz: Targeted Crowdsourcing with a Billion (Potential) Users

    Panos Ipeirotis, Evgeniy Gabrilovich

    WWW (2014) (to appear)


    RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response

    Úlfar Erlingsson, Vasyl Pihur, Aleksandra Korolova

    Proceedings of the 21st ACM Conference on Computer and Communications Security, ACM, Scottsdale, Arizona (2014)


    Reducing the Sampling Complexity of Topic Models

    Aaron Li, Amr Ahmed, Sujith Ravi, Alexander J Smola

    ACM Conference on Knowledge Discovery and Data Mining (KDD) (2014)


    Scalable Hierarchical Multitask Learning Algorithms for Conversion Optimization in Display Advertising

    Amr Ahmed, Abhimanyu Das, Alexander J. Smola

    ACM International Conference on Web Search And Data Mining (WSDM) (2014)


    Taxonomy Discovery for Personalized Recommendation

    Yuchen Zhang, Amr Ahmed, Vanja Josifovski, Alexander J Smola

    ACM International Conference on Web Search And Data Mining (WSDM) (2014)


    Trust, but Verify: Predicting Contribution Quality for Knowledge Base Construction and Curation

    Chun How Tan, Eugene Agichtein, Panos Ipeirotis, Evgeniy Gabrilovich

    WSDM (2014) (to appear)


    Unsupervised Spatial Event Detection in Targeted Domains with Applications to Civil Unrest Modeling

    Liang Zhao, Feng Cheng, Jing Dai, Ting Hua, Chang-Tien Lu, Naren Ramakrishnan

    PLOS ONE, vol. 9 (2014), pp. 1-12


    A Framework for Benchmarking Entity-Annotation Systems

    Marco Cornolti, Paolo Ferragina, Massimiliano Ciaramita

    Proceedings of the International World Wide Web Conference (WWW) (Practice & Experience Track), ACM (2013)


    Classifying YouTube Channels: a Practical System

    Vincent Simonet

    Proceedings of the 2nd International Workshop on Web of Linked Entities (WOLE 2013), in Proceedings of the 22nd International conference on World Wide Web companion, ACM, pp. 1295-1304


    Compacting Large and Loose Communities

    Chandrashekhar V., Shailesh Kumar, C. V. Jawahar

    Asian Conference on Pattern Recognition (2013) (to appear)


    Crawling deep web entity pages

    Yeye He, Dong Xin, Venkatesh Ganti, Sriram Rajaraman, Nirav Shah

    WSDM (2013), pp. 355-364


    Crowd-Sourced Call Identification and Suppression

    Daniel V. Klein, Dean K. Jackson

    Federal Trade Commission Robocall Challenge (2013)


    Data Fusion: Resolving Conflicts from Multiple Sources

    Xin Luna Dong, Laure Berti-Equille, Divesh Srivastava

    WAIM (2013), pp. 64-76 (to appear)


    Dense Subgraph Maintenance under Streaming Edge Weight Updates for Real-time Story Identification

    Albert Angel, Nick Koudas, Nikos Sarkas, Divesh Srivastava, Michael Svendsen, Srikanta Tirthapura

    The VLDB Journal (2013), pp. 1-25


    Distributed Large-scale Natural Graph Factorization

    Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, Alexander J Smola

    Proceedings of the 22nd International World Wide Web Conference (WWW 2013) (to appear)


    Diversity maximization under matroid constraints

    Zeinab Abbassi, Vahab Mirrokni, Mayur Thakur

    KDD, ACM SIGKDD (2013), pp. 32-40


    Efficient and Accurate Label Propagation on Large Graphs and Label Sets

    Michele Covell, Shumeet Baluja

    Proceedings International Conference on Advances in Multimedia, IARIA (2013)


    Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction

    Wei Xu, Raphael Hoffmann, Le Zhao, Ralph Grishman

    ACL 2013


    Focused Matrix Factorization for Audience Selection in Display Advertising

    Bhargav Kanagal, Amr Ahmed, Sandeep Pandey, Vanja Josifovski, Lluis Garcia-Pueyo, Jeff Yuan

    Proceedings of the 29th International Conference on Data Engineering (ICDE) (2013)


    From Assets to Stories via the Google Cultural Institute Platform

    W. Brent Seales, Steve Crossan, Sertan Girgin, Mark Yoshitake

    IEEE BigData'13 Big Data and the Humanities (2013), pp. 6 (to appear)



    Patrick Copeland, Raquel Romano, Tom Zhang, Greg Hecht, Dan Zigmond, Christian Stefansen

    International Society of Neglected Tropical Diseases 2013, International Society of Neglected Tropical Diseases, pp. 3


    Identifying Surrogate Geographic Research Regions with Advanced Exact Test Statistics

    Steven Ellis

    American Marketing Association Advanced Research Techniques Forum (2013), Poster

    Contact information

    I am SOUMEN CHAKRABARTI, anagram for ANARCHISM OUTBREAK, a faculty member in the Department of Computer Science.

    If you are from industry looking for consultation, please read the section titled Consultative practice rules and norms (1996) herein, and my informal notes.

    If you are looking to join CSE@IITB as a PhD scholar, please read about the PhD Qualifier model being adopted by the department, and contact the department office directly. PhD admissions are centrally coordinated at the department level.

    I do not offer short term projects or summer internships to students not enrolled at IIT Bombay. Such emails will be discarded.

    If you are an IIT student looking for a project or seminar within the scope of your program (Btech, DD, Mtech) please read these guidelines first. You can check my calendar for free slots and, if you have permission, propose a meeting here or by email.

    The best way to contact me is to send mail to (please note that I am on a low-spam diet). Please use only email to initiate a conversation with me if we haven't communicated before. Only in case of an emergency, you can call me at +91-22-2576-7716 or fax me at +91-22-2572-0022. If you are visiting, here are directions to my office.

    Education and career

    • Don Bosco School, Park Circus, Calcutta, 1975--1987
    • Indian Institute of Technology, Kharagpur, 1987--1991
    • University of California, Berkeley, 1991--1996
    • IBM Almaden Research Center, 1996--1999
    • IIT Bombay, 1999--present
    • Carnegie-Mellon University, Spring 2004

    Research interests

    Searching the annotated Web with entities, types and relations
    We are building CSAW, a new search system that integrates type and role annotations with keyword matches, thereby exploiting lexical ontologies and entity taggers. Supported by Yahoo!, HP Labs, Google, Microsoft, SAP and NetApp.
    Graph conductance search
    Rich connections between random walks, graph eigensystems, and electrical networks make it attractive to apply them for ranking nodes. PageRank is a prominent example of the paradigm. In PageRank, the edge weights are fixed and we have to compute steady state probabilities of nodes. What if we have something like the opposite problem? And how to make this fast at query time? Supported by IBM and Microsoft (2007, 2008).
    Integrating IR with databases
    In the BANKS project, we proposed new paradigms of keyword search in graphs that can represent text embedded in relational or XML-like data.
    The effect of search engines on the Web graph and page popularity
    Search engines are influenced by the (in)degree of Web pages, but their ranked lists modulate page popularity and eventually their (in)degree, setting up a feedback to some degree. Might the evolution of the Web graph be influenced substantially by the existence of search engines? Is there a need to regulate monopolies? What are healthy economic objectives, and how to optimize them?
    Focused crawlers to build topic-specific portals
    A focused crawler collects a topic-specific subgraph of the Web by coupling classifiers and reinforcement learners with crawlers. An open-source focused crawler project was started at the Lab. for Intelligent Internet Research and is available.
    Mining hypertext to estimate topics and popularity
    I built a hypertext classifier that uses the text in and links around a given Web page to label it with a topic. This was an early application of Markov networks to Web analysis. As a member of the IBM Clever Project, I worked on algorithms to analyze the links around a web page and the text in pages that cite the given page to assign it a measure of popularity.
    Compiling and running parallel scientific programs
    In a previous life, my PhD thesis was on the design and implementation of compilers and runtime systems for distributed memory multiprocessors. Seems like distributed parallel computing is hot again, thanks to "Big Data"!
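
    As a rough illustration of the steady-state computation mentioned under graph conductance search above (fixed edge weights, node probabilities to be computed), here is a minimal power-iteration sketch of PageRank. The toy graph, the damping factor of 0.85, the uniform handling of dangling nodes and the function name pagerank are all assumptions made for this example; it is not code from any of the systems described on this page.

```python
def pagerank(out_edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank with fixed, uniform edge weights.

    out_edges maps each node to the list of nodes it links to.
    The damping value and the dangling-node rule are illustrative assumptions.
    """
    nodes = sorted(set(out_edges) | {v for vs in out_edges.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Probability mass sitting on nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out_edges.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = out_edges.get(u, [])
            for v in targets:
                # Each out-link carries an equal share of the source's rank.
                new_rank[v] += damping * rank[u] / len(targets)
        delta = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if delta < tol:  # stop once the distribution has stabilised
            break
    return rank


# Hypothetical four-node graph used only for this sketch.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_graph)
```

    In this toy graph, node d has no incoming links, so its score stays at the teleport share (1 - damping) / 4, while node c, which every other node links to, ends up with the largest score.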

    Professional activity

    Journal editorship
    Conference organization
    • WWW 2017, poster track co-chair with Mounia Lalmas and Wei Chen.
    • CIKM 2014, area chair for text and Web data mining.
    • EMNLP 2013, area chair for information retrieval and question answering.
    • WWW 2013, track chair for search, systems and applications.
    • SIGIR 2011, area chair for Web IR and social media search.
    • WWW 2010, program co-chair with Juliana Freire.
    • SIGIR 2010, senior PC member.
    • Web Search APIs: The Next Generation --- A panel discussion at WWW 2009. Panel slides.
    • SIGIR 2009, Area Chair, Machine Learning for IR.
    • WSDM 2008 ("wisdom"), Program Co-chair with Andrei Broder.
    • VLDB 2007, Tutorial Co-Chair.
    • ECML-PKDD 2006, Area Chair, Track for mining links, graphs, trees and high-dimensional data.
    • WWW 2006, Deputy Chair, Data Mining track.
    • COMAD 2005b, Associate Program Chair.
    • WWW 2003, Vice Chair, Searching and Mining track.
    • ICDE 2003, Vice Chair, Data, Text and Web Mining track.
    • WWW 2002, Deputy Chair, Searching, Querying and Indexing track (CFP).
    Conference committee/reviewing
    ICML 2018, NAACL 2018, WSDM 2018 (test of time awards), SIGIR 2017 (awards), SIGKDD 2017 (awards), WSDM 2017 (awards), NIPS 2017, ACL 2017; NIPS 2016, SIGIR 2016; CIKM 2014, ISWC 2014, SIGIR 2014, ACL 2014, WSDM 2014 (senior PC); SIGKDD 2013 (senior PC), WSDM 2013 (senior PC and awards committee); EMNLP 2012, SIGKDD 2012 (senior PC), WWW 2012; NIPS 2011, ICML 2011 (PC and invited applications talks committee), WWW 2011; SIGKDD 2010; NIPS 2009, WWW 2009, WSDM 2009 (senior PC); SIGKDD 2008 (senior PC), SIGIR 2008 (senior PC), WWW 2008; WWW 2007, SIGMOD 2007; SIGKDD 2006 (senior PC); EMNLP/HLT 2005, SIGKDD 2005, WWW 2005 (panel), SIGMOD 2005; SIGKDD 2004, SIGIR 2004, VLDB 2004, WWW 2004, ICDE 2004; SIGIR 2003, SIGKDD 2003, VLDB 2003 (IIS), SODA 2003; SIGIR 2002, ICDE 2002; SIGIR 2001, WWW 2001; WWW 2000; SIGKDD 1999; SIGKDD 1998.
    • Web Search and Data Mining (WSDM) steering committee member, 2008--2013.
    • ACM SIGKDD Curriculum Committee Member.


    But the power of instruction is seldom of much efficacy, except in those happy dispositions where it is almost superfluous.
    ---Edward Gibbon,
    The Decline And Fall Of The Roman Empire
    Volume 1, Chapter 4

    • Web Search and Mining has been expanded to a two-semester sequence, shorthanded WMa (Autumn) and WMb (Spring). WMa retains the old course code, but has been planned from scratch. WMb will be largely about information extraction and integration, and querying over semistructured and graphical data representations. WMa Autumn 2009, WMb Spring 2010, WMa Autumn 2010, WMb Spring 2011, WMa Autumn 2011, WMa Spring 2013, WMa Autumn 2013, WMb Spring 2014, WMa Autumn 2016, WMb Spring 2017, WMa Autumn 2017.
    • Statistical Foundations of Machine Learning: Autumn 2005, Autumn 2006, Autumn 2007, Autumn 2008.
    • Web Search and Mining (earlier called Information Retrieval and Mining for Hypertext and the Web): Spring 2001, Spring 2002, Spring 2003, Spring 2005, Spring 2006 (new improved), Spring 2007, Spring 2008, Spring 2009.
    • Undergraduate Programming Languages, Spring 2000, Autumn 2000, Autumn 2001, Autumn 2002, Autumn 2003, Autumn 2004.
    • Computer programming and utilization aka CS101, Spring 2012.
    • Graduate Software Lab: Autumn 1999, Autumn 2000.

    ... your work is to keep cranking the flywheel that turns the gears
    that spin the belt in the engine of belief that keeps you and your desk in midair
    ---Annie Dillard,
    The Writing Life

    Representative publications (DBLP, Google Scholar)

    • Generalizing Across Domains via Cross-Gradient Training. With Shiv Shankar, Vihari Piratla, Siddhartha Chaudhuri, Preethi Jyothi, and Sunita Sarawagi. ICLR 2018.
    • Task-Specific Representation Learning for Web-scale Entity Disambiguation. With Rijula Kar, Susmija Reddy, Sourangshu Bhattacharya and Anirban Dasgupta. AAAI 2018.
    • A Two-Stage Framework for Computing Entity Relatedness in Wikipedia. With Marco Ponza and Paolo Ferragina. CIKM 2017.
    • Relay-Linking Models for Prominence and Obsolescence in Evolving Networks [paper, video]. With Mayank Singh, Rajdeep Sarkar, Pawan Goyal, and Animesh Mukherjee. SIGKDD 2017.
    • Earth Mover Distance Pooling over Siamese LSTMs for Automatic Short Answer Grading. With Sachin Kumar and Shourya Roy. IJCAI 2017.
    • Collective Entity Resolution with Multi-Focal Attention. With Amir Globerson, Nevena Lazic, Amarnag Subramanya, Michael Ringgaard and Fernando Pereira. ACL 2016.
    • Discriminative Link Prediction using Local, Community, and Global Signals. With Abir De, Sourangshu Bhattacharya, Sourav Sarkar and Niloy Ganguly. IEEE TKDE Journal, 2016.
    • Learning a Linear Influence Model Between Actors from Transient Opinion Dynamics. With Abir De, Sourangshu Bhattacharya, Parantapa Bhattacharya, and Niloy Ganguly. CIKM 2014.
    • Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. With Mandar Joshi and Uma Sawant. EMNLP 2014.
    • Quantity Queries on Web Tables: Annotation, Response and Consensus Models. With Sunita Sarawagi. SIGKDD 2014.
    • Discriminative Link Prediction using Local Links, Node Features and Community Structure. With Abir De and Niloy Ganguly. ICDM 2013.
    • Joint Bootstrapping of Corpus Annotations and Entity Types. With Siddhanth Jain and Hrushikesh Mohapatra. EMNLP 2013.
    • Web-scale Entity Annotation Using MapReduce. With Shashank Gupta and Varun Chandramouli. HiPC 2013.
    • Learning Joint Query Interpretation and Response Ranking. With Uma Sawant. WWW 2013.
    • Compressed Data Structures for Annotated Web Search. With Sasidhar Kasturi, Bharath Balakrishnan, Ganesh Ramakrishnan, and Rohit Saraf. WWW 2012.
    • Diversity in ranking via resistive graph centers. With Avinava Dubey and Chiru Bhattacharyya. SIGKDD 2011. (Source code is available, contact Avinava Dubey for usage details.)
    • SCAD: Collective Discovery of Attribute Values. With Anton Bakalov, Ariel Fuxman, and Partha Talukdar. WWW 2011.
    • Index Design and Query Processing for Graph Conductance Search. With Amit Pathak and Manish Gupta. VLDB Journal, 2010.
    • Annotating and Searching Web Tables Using Entities, Types and Relationships. With Girija Limaye and Sunita Sarawagi. VLDB 2010.
    • Conditional Models for Non-smooth Ranking Loss Functions. With Avinava Dubey, Jinesh Machchhar, and Chiru Bhattacharyya. ICDM 2009, Miami.
    • Learning to rank for quantity consensus queries. With Somnath Banerjee and Ganesh Ramakrishnan. SIGIR 2009, Boston.
    • Collective annotation of Wikipedia entities in Web text. With Sayali Kulkarni, Amit Singh and Ganesh Ramakrishnan. SIGKDD 2009, Paris.
    • Text search enhanced with types and entities. Chapter in Text Mining: Theory, Application, and Visualization, Srivastava and Sahami, eds., 2008.
    • New closed form bounds on the partition function. With Dvijotham Krishnamurthy and Subhasis Chaudhuri. ECML/PKDD 2008, Antwerp. Winner of the best student paper award.
    • Structured Learning for Non-Smooth Ranking Losses. With Rajiv Khanna, Uma Sawant and Chiru Bhattacharyya. SIGKDD 2008, Las Vegas.
    • Learning to rank in vector spaces and social networks. Internet Mathematics, 2008.
    • Focused Web Crawling. Entry in the Encyclopedia of Database Systems, 2008.
    • The influence of search engines on preferential attachment. With Alan Frieze and Juan Vera. Internet Mathematics, volume 3, number 3 (2006--2007), pages 361--381. A preliminary version appeared in SODA 2005.
    • Learning Random Walks to Rank Nodes in Graphs. With Alekh Agarwal. ICML 2007, Oregon.
    • Dynamic Personalized Pagerank in Entity-Relation Graphs. WWW 2007, Banff.
    • Accelerating Newton optimization for log-linear models through feature redundancy. With Arpit Mathur. IEEE ICDM 2006, Hong Kong.
    • Learning parameters in entity-relationship graphs from ranking preferences. With Alekh Agarwal. ECML-PKDD 2006, Berlin.
    • Learning to rank networked entities. With Alekh Agarwal and Sunny Aggarwal. SIGKDD Conference 2006, Philadelphia.
    • Optimizing Scoring Functions and Indexes for Proximity Search in Type-annotated Corpora. With Kriti Puniyani and Sujatha Das. WWW 2006, Edinburgh.
    • Enhanced Answer Type Inference from Questions using Sequential Models. With Vijay Krishnan and Sujatha Das. EMNLP/HLT 2005, Vancouver.
    • Bidirectional Expansion For Keyword Search on Graph Databases. With Varun Kacholia, Shashank Pandit, S. Sudarshan, Rushi Desai and Hrishikesh Karambelkar. VLDB 2005.
    • Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results. With Sandeep Pandey, Sourashis Roy, Chris Olston, and Junghoo Cho. VLDB 2005.
    • Is question answering an acquired skill? With Ganesh Ramakrishnan, Deepa Paranjpe, and Pushpak Bhattacharyya. WWW 2004, New York City.
    • Fast and accurate text classification via multiple linear discriminant projections. With Shourya Roy and Mahesh Soundalgekar. VLDB Journal, 12(2), pages 170--185 [conference version, talk slides].
    • Cross-Training: Learning Probabilistic Mappings Between Topics. With Sunita Sarawagi and Shantanu Godbole. SIGKDD Conference 2003, Washington D.C.
    • Monitoring the Dynamic Web to respond to Continuous Queries. With Sandeep Pandey and Krithi Ramamritham. WWW 2003, Budapest, Hungary, May 2003. (talk slides.)
    • Accelerated focused crawling through online relevance feedback. With Kunal Punera and Mallela Subramanyam. WWW 2002, Hawaii. (Local copy.)
    • The structure of broad topics on the Web. With Mukul Joshi, Kunal Punera, and David M. Pennock. WWW 2002, Hawaii. (Local copy.)
    • Keyword Searching and Browsing in Databases using BANKS. With Gaurav Bhalotia, Charuta Nakhe, Arvind Hulgeri, and S. Sudarshan. In ICDE 2002. Also see the BANKS home page. Winner of the ICDE 2012 influential paper award.
    • Enhanced topic distillation using text, markup tags, and hyperlinks. With Mukul M. Joshi and Vivek B. Tawde. In SIGIR 2001 (talk slides).
    • Integrating the Document Object Model with hyperlinks for enhanced topic distillation and information extraction. In the 10th International World Wide Web Conference, Hong Kong, May 2001.
    • Memex: A browsing assistant for collaborative archiving and mining of surf trails. With Sandeep Srivastava, Mallela Subramanyam and Mitul Tiwari. Demo at VLDB 2000.
    • Data mining for hypertext: A tutorial survey. SIGKDD Explorations, 1(2), pages 1--11, 2000.
    • Using Memex to archive and mine community Web browsing experience. With Sandeep Srivastava, Mallela Subramanyam and Mitul Tiwari. In the 9th International World Wide Web Conference, Amsterdam, May 2000. Talk slides. Social bookmarking companies founded long after this paper: HistorySE, Delicious, Digg, StumbleUpon, Reddit, Furl, Simpy, Citeulike, etc.
    • Mining the Web's Link Structure. With Byron E. Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, David Gibson, and Jon Kleinberg. In IEEE Computer, vol. 32, no. 8, August 1999 (IEEE copy).
    • Distributed Hypertext Resource Discovery Through Examples. With Martin van den Berg and Byron Dom. VLDB 1999, Edinburgh, Scotland. Talk slides.
    • Hypersearching the Web. With Byron Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, Jon M. Kleinberg, and David Gibson. Invited paper in Scientific American, June 1999.
    • Surfing the Web Backwards. With D. A. Gibson and K. S. McCurley. In WWW 1999.
    • Focused crawling: A new approach to topic-specific Web resource discovery. With M. van den Berg and B. Dom. WWW 1999, Toronto, May 1999. Winner of the best paper award. Also see the project page.

    Upcoming and recent talks and travel

    • Tutorial with Partha Talukdar at CIKM 2017 on Knowledge Extraction and Inference from Text.
    • Keynote talk at CoDS 2017, Chennai, March 2017.
    • Keynote talk at CIKM 2014 Industry Track, Nov 2014.
    • Keynote talk at COMSNETS 2014, Bangalore, Jan 2014.
    • Tutorial on Query Interpretation and Representation for Searching the Web of Objects at WWW 2013, Rio de Janeiro.
    • WWW 2010 Conference, NC, April 2010.
    • Keynote talk at WSDM 2010, NYC, February 2010. [Talk slides.]
    • WWW 2010 PC meeting, Salt Lake City, Utah, January 2010.
    • WWW 2009 tutorial and panel, April 2009.
    • SIGIR 2008 PC meeting, University of Maryland, March 2008.
    • WSDM 2008, Stanford University, February 2008.
    • Tutorial on Learning to rank in vector spaces and social networks at WWW 2007, Banff.
    • Keynote talk at WAW and a short course at Banff, Nov 2006.
    • Invited talk at the International Workshop on Intelligent Information Access, Helsinki, July 2006.
    • Invited talk at the ICML 2005 workshop on Learning in Web Search.
    • Invited talk at the ICML 2005 workshop on Learning and Extending Lexical Ontologies by using Machine Learning Methods.
    • Panel discussion on exploiting dynamic networking effects in Web advertising at WWW 2005.
    • Invited talk and position paper at ECML/PKDD in Pisa, Sept. 2004.
    • Short course on machine learning for hypertext applications at ADFOCS in Saarbrücken, Sept. 2004.
    • Graph structures in data mining. A tutorial presented at SIGKDD 2004 with Christos Faloutsos.
    • Text search for fine-grained semi-structured data. A tutorial presented at VLDB 2002.
    • Beyond hubs and authorities: spreading out and zooming in. Invited talk at ICDT International Workshop on Web Dynamics, London, Jan. 2001.
    • Data Mining and Learning on the Web. NIPS Workshop, Denver, Dec. 2000. By invitation.
    • Nurturing content-based collaborative communities on the Web. Invited talk at the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC), Hong Kong, Oct. 7--8, 2000.
    • Hypertext data mining: A tutorial presented at the SIGKDD Conference, Boston, August 2000.
    • Hypertext databases and hypertext data mining. SIGMOD 1999 Tutorial.

    Patents

    • Method and system for searching unstructured textual data for quantitative answers to queries.
    • System and method for focussed web crawling.
    • Enhanced hypertext categorization using hyperlinks.
    • System and method for scheduling web servers with a quality-of-service guarantee for each user.
    • Method for interactively creating an information database including preferred information elements, such as, preferred-authority, world wide web pages.
    • Method for cataloging, filtering, and relevance ranking frame-based hierarchical information structures.
    • Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values.
    • System and method for mining surprising temporal patterns.
    • Feature diffusion across hyperlinks.

