Bootstrapped Graph Diffusions: Exposing the Power of Nonlinearity
Eliav Buchnik, Edith Cohen
ACM Sigmetrics (2018) (to appear)
Designing A/B tests in a collaboration network
Google Data Science blog (2018)
HARP: Hierarchical Representation Learning for Networks
Haochen Chen, Bryan Perozzi, Yifan Hu, Steven Skiena
AAAI'18 (2018) (to appear)
Hidden in Plain Sight: Classifying Emails Using Embedded Image Contents
Navneet Potti, James B. Wendt, Qi Zhao, Sandeep Tata, Marc Najork
The Web Conference (2018) (to appear)
Optimal Dynamic Strings
Adam Karczmarz, Jakub Łącki, Paweł Gawrychowski, Piotr Sankowski, Tomasz Kociumaka
SODA 2018 (to appear)
Orienteering Algorithms for Generating Travel Itineraries
Zachary Friggstad, Sreenivas Gollapudi, Kostas Kollias, Tamas Sarlos, Chaitanya Swamy, Andrew Tomkins
International Conference on Web Search and Data Mining (WSDM), ACM (2018)
A Generic Coordinate Descent Framework for Learning from Implicit Feedback
Immanuel Bayer, Xiangnan He, Bhargav Kanagal, Steffen Rendle
Proceedings of the 26th International Conference on World Wide Web (2017), pp. 1341-1350
A Neural Architecture for Dialectal Arabic Segmentation
Younes Samih, Mohammed Attia, Mohamed Eldesouki, Hamdy Mubarak, Ahmed Abdelali, Laura Kallmeyer, Kareem Darwish
The Third Arabic Natural Language Processing Workshop (WANLP), Valencia, Spain (2017), pp. 46-54
Beyond Globally Optimal: Focused Learning for Improved Recommendations
Alex Beutel, Ed H. Chi, Zhiyuan Cheng, Hubert Pham, John Anderson
Proceedings of the 26th International Conference on World Wide Web, WWW 2017, Perth, Australia, April 3-7, 2017
Crafting a lexicon of referential expressions for NLG applications
Ariel Gutman, Alexandros Chaaraoui, Pascal Fleury
The 2017 Israeli Seminar of Computational Linguistics, Rachel and Selim Benin School of Computer Science and Engineering, Edmond J. Safra Campus, Jerusalem (2017)
Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters
Alessandro Epasto, Silvio Lattanzi, Renato Paes Leme
KDD '17 (2017)
Email Category Prediction
Aston Zhang, Luis Garcia Pueyo, James B. Wendt, Marc Najork, Andrei Broder
Companion Proc. of the 26th International World Wide Web Conference (2017), pp. 495-503
HyperLogLog Hyper Extended: Sketches for Concave Sublinear Frequency Statistics
KDD (2017) (to appear)
Instance-Level Label Propagation with Multi-Instance Learning
Qifan Wang, Gal Chechik, Chen Sun, Bin Shen
IJCAI (2017) (to appear)
Latent LSTM Allocation: Joint clustering and non-linear dynamic modeling of sequence data
Manzil Zaheer, Amr Ahmed, Alexander Smola
WSDM, ACM (2017) (to appear)
Local Topic Discovery via Boosted Ensemble of Nonnegative Matrix Factorization
Sangho Suh, Jaegul Choo, Joonseok Lee, Chandan K. Reddy
Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Sister conferences track (2017)
Related Event Discovery
Cheng Li, Mike Bendersky, Sujith Ravi, Vijay Garg
Proceedings of WSDM (2017)
SHRec: Scalable Holistic Recommendation
Ahmed Aly, Amr Ahmed, Moustafa Hammad
International Conference on Scientific and Statistical Database Management (2017)
Submodular Optimization Over Sliding Windows
Alessandro Epasto, Morteza Zadimoghaddam, Sergei Vassilvitskii, Silvio Lattanzi
Proceedings of the 26th International World Wide Web Conference, WWW (2017)
Template Induction over Unstructured Email Corpora
Julia Proskurnia, Marc-Allen Cartright, Lluís Garcia-Pueyo, Ivo Krka, James B. Wendt, Tobias Kaufmann, Balint Miklos
Proc. of the 26th International World Wide Web Conference (2017), pp. 1521-1530
The Spread of Physical Activity Through Social Networks
Proceedings of the 26th International World Wide Web Conference 2017, WWW
A New Approach to Optimal Code Formatting
Google, Inc. (2016)
A Simple and Efficient Method to Handle Sparse Preference Data Using Domination Graphs: An Application to YouTube
ICCS 2016, 2302–2311
Deep Neural Networks for YouTube Recommendations
Paul Covington, Jay Adams, Emre Sargin
Proceedings of the 10th ACM Conference on Recommender Systems, ACM, New York, NY, USA (2016) (to appear)
Discovering Structure in the Universe of Attribute Names
Alon Halevy, Natalya Fridman Noy, Sunita Sarawagi, Steven Euijong Whang, Xiao Yu
Proc. 25th International World Wide Web Conference (2016)
Ego-net Community Mining Applied to Friend Suggestion
Alessandro Epasto, Silvio Lattanzi, Vahab S. Mirrokni, Ismail Sebe, Ahmed Taei, Sunita Verma
Proceedings of VLDB (2016)
From Freebase to Wikidata: The Great Migration
Thomas Pellissier Tanon, Denny Vrandečić, Sebastian Schaffert, Thomas Steiner, Lydia Pintscher
World Wide Web Conference, ACM (2016)
Hierarchical Label Propagation and Discovery for Machine Generated Email
James B. Wendt, Michael Bendersky, Lluis Garcia-Pueyo, Vanja Josifovski, Balint Miklos, Ivo Krka, Amitabh Saikia, Jie Yang, Marc-Allen Cartright, Sujith Ravi
Proceedings of the International Conference on Web Search and Data Mining (WSDM), ACM (2016), pp. 317-326
L-EnsNMF: Boosted Local Topic Discovery via Ensemble of Nonnegative Matrix Factorization
Sangho Suh, Jaegul Choo, Joonseok Lee, Chandan K. Reddy
Proceedings of the IEEE International Conference on Data Mining (ICDM) (2016)
LLORMA: Local Low-Rank Matrix Approximation
Joonseok Lee, Seungyeon Kim, Guy Lebanon, Yoram Singer, Samy Bengio
Journal of Machine Learning Research (JMLR), vol. 17 (2016), pp. 1-24
Learning mobile phone battery consumptions
Andres Munoz Medina, Ashish Sharma, Felix Yu, Paul Eastham, Sergei Vassilvitskii, Umar Syed
Workshop on On Device Intelligence (2016)
Linking Users Across Domains with Location Data: Theory and Validation
Christopher Riederer, Yunsung Kim, Nitish Korula, Silvio Lattanzi, Augustin Chaintreau
WWW (2016) (to appear)
M3A: Model, MetaModel, and Anomaly Detection in Web Searches
Da-Cheng Juan, Neil Shah, Mingyu Tang, Zhiliang Qian, Diana Marculescu, Christos Faloutsos
arXiv preprint arXiv:1606.05978 (2016)
On Sampling Nodes in a Network
Flavio Chierichetti, Anirban Dasgupta, Ravi Kumar, Silvio Lattanzi, Tamas Sarlos
WWW (2016) (to appear)
Open and Closed Schema for Aligning Knowledge and Text Collections
Workshop on Exploiting Semantic Annotations for Information Retrieval (ESAIR) (2016)
Reverse Ranking by Graph Structure: Model and Scalable Algorithms
Edith Cohen, Eliav Buchnik
ACM SIGMETRICS 2016 (to appear)
TRIÈST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fixed Memory Size
Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, Eli Upfal
ACM SIGKDD (2016) (to appear)
The Limits of Popularity-Based Recommendations, and the Role of Social Ties
Marco Bressan, Stefano Leucci, Alessandro Panconesi, Prabhakar Raghavan, Erisa Terolli
Proceedings of ACM KDD 2016, ACM
When Recommendation Goes Wrong - Anomalous Link Discovery in Recommendation Networks
Bryan Perozzi, Michael Schueppert, Jack Saalweachter, Mayur Thakur
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016)
Active Learning in Keyword Search-based Data Integration
Zhepeng Yan, Nan Zheng, Zachary G. Ives, Partha Pratim Talukdar, Cong Yu
The VLDB Journal, vol. 24 (2015), pp. 611-631
Applying WebTables in Practice
Sreeram Balakrishnan, Alon Halevy, Boulos Harb, Hongrae Lee, Jayant Madhavan, Afshin Rostamizadeh, Warren Shen, Kenneth Wilder, Fei Wu, Cong Yu
Conference on Innovative Data Systems Research (2015)
Associating Locations with Healthcare Events
Daniel V. Klein, Dean Jackson
Defensive Publications Series, Technical Disclosure Commons (2015)
Automatic Pronunciation Verification for Speech Recognition
Kanishka Rao, Fuchun Peng, Françoise Beaufays
Crowdsourcing and the Semantic Web: A Research Manifesto
Cristina Sarasua, Elena Simperl, Natasha Noy, Abraham Bernstein, Jan Marco Leimeister
Human Computation, vol. 2 (2015)
Discovering Subsumption Relationships for Web-Based Ontologies
Dana Movshovitz-Attias, Steven Euijong Whang, Natalya Noy, Alon Halevy
Proc. 18th International Workshop on the Web and Databases (WebDB) (2015)
Distributed Graph Algorithmics: Theory and Practice
Silvio Lattanzi, Vahab S. Mirrokni
WSDM (2015), pp. 419-420
Efficient Algorithms for Public-Private Social Networks
Flavio Chierichetti, Alessandro Epasto, Ravi Kumar, Silvio Lattanzi, Vahab Mirrokni
Efficient Densest Subgraph Computation in Evolving Graphs
Alessandro Epasto, Silvio Lattanzi, Mauro Sozio
Event Relevant Reminders
Daniel V. Klein, Dean Jackson
Defensive Publications Series, Technical Disclosure Commons (2015)
Fix It Where It Fails: Pronunciation Learning by Mining Error Corrections from Speech Logs
Zhenzhen Kou, Daisy Stanton, Fuchun Peng, Françoise Beaufays, Trevor Strohman
Focus on the Long-Term: It's better for Users and Business
Henning Hohnhold, Deirdre O'Brien, Diane Tang
Proceedings 21st Conference on Knowledge Discovery and Data Mining, ACM, Sydney, Australia (2015)
Improving User Topic Interest Profiles by Behavior Factorization
Zhe Zhao, Zhiyuan Cheng, Lichan Hong, Ed H. Chi
Proceedings of the 24th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland (2015), pp. 1406-1416
Linked Enterprise Data Model and Its Use in Real Time Analytics and Context-Driven Data Discovery
Kunal Taneja, Qian Zhu, Desmond Duggan, Teresa Tung
IEEE International Conference on Mobile Services, 1800 (2015), pp. 277-283 (to appear)
Mining Subjective Properties on the Web
Immanuel Trummer, Alon Halevy, Hongrae Lee, Sunita Sarawagi, Rahul Gupta
SIGMOD (2015) (to appear)
Multi-Objective Weighted Sampling
HotWeb 2015 (to appear)
Scalable Community Discovery from Multi-Faceted Graphs
Ahmed Metwally, Jia-Yu Pan, Minh Doan, Christos Faloutsos
2015 IEEE International Conference on Big Data, IEEE, 445 Hoes Lane Piscataway, NJ 08854-4141 USA (to appear)
Secrets, Lies, and Account Recovery: Lessons from the Use of Personal Knowledge Questions at Google
Joseph Bonneau, Elie Bursztein, Ilan Caron, Rob Jackson, Mike Williamson
WWW'15 - Proceedings of the 24th international conference on World Wide Web, ACM (2015)
Temporal/Spatial Calendar Events and Triggers
Daniel V. Klein, Dean Jackson
Defensive Publications Series, Technical Disclosure Commons (2015)
Unified and contrasting cuts in multiple graphs: application to medical imaging segmentation
Chia-Tung Kuo, Xiang Wang, Peter Walker, Owen Carmichael, Jieping Ye, Ian Davidson
KDD (2015), pp. 617-626
What can be Found on the Web and How: A Characterization of Web Browsing Patterns
Alexey Tikhonov, Arseniy Chelnokov, Gleb Gusev, Ivan Bogatyy, Liudmila Ostroumova Prokhorenkova
WebSci 2015, Oxford (to appear)
Biperpedia: An Ontology for Search Applications
Rahul Gupta, Alon Halevy, Xuezhi Wang, Steven Whang, Fei Wu
Proc. 40th Int'l Conf. on Very Large Data Bases (PVLDB) (2014)
Distributed Balanced Clustering via Mapping Coresets
Mohammadhossein Bateni, Aditya Bhaskara, Silvio Lattanzi, Vahab Mirrokni
NIPS, Neural Information Processing Systems Foundation (2014)
Frame by Frame Language Identification in Short Utterances using Deep Neural Networks
Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Pedro J. Moreno, Joaquin Gonzalez-Rodriguez
Neural Networks Special Issue: Neural Network Learning in Big Data (2014)
Great Question! Question Quality in Community Q&A
Sujith Ravi, Bo Pang, Vibhor Rastogi, Ravi Kumar
International AAAI Conference on Weblogs and Social Media (ICWSM) (2014)
Handcrafted Fraud and Extortion: Manual Account Hijacking in the Wild
Elie Bursztein, Borbala Benko, Daniel Margolis, Tadek Pietraszek, Andy Archer, Allan Aquino, Andreas Pitsillidis, Stefan Savage
IMC '14 Proceedings of the 2014 Conference on Internet Measurement Conference, ACM, 1600 Amphitheatre Parkway, pp. 347-358
Knowledge Base Completion via Search-Based Question Answering
Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, Dekang Lin
Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion
Xin Luna Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, Wei Zhang
The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, New York, NY, USA - August 24 - 27, 2014, pp. 601-610
Near Neighbor Join
Herald Kllapi, Boulos Harb, Cong Yu
On Estimating the Average Degree
Anirban Dasgupta, Ravi Kumar, Tamas Sarlos
23rd International World Wide Web Conference, WWW '14, ACM (2014) (to appear)
Quizz: Targeted Crowdsourcing with a Billion (Potential) Users
Panos Ipeirotis, Evgeniy Gabrilovich
WWW (2014) (to appear)
RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response
Úlfar Erlingsson, Vasyl Pihur, Aleksandra Korolova
Proceedings of the 21st ACM Conference on Computer and Communications Security, ACM, Scottsdale, Arizona (2014)
Reducing the Sampling Complexity of Topic Models
Aaron Li, Amr Ahmed, Sujith Ravi, Alexander J Smola
ACM Conference on Knowledge Discovery and Data Mining (KDD) (2014)
Scalable Hierarchical Multitask Learning Algorithms for Conversion Optimization in Display Advertising
Amr Ahmed, Abhimanyu Das, Alexander J. Smola
ACM International Conference on Web Search And Data Mining (WSDM) (2014)
Taxonomy Discovery for Personalized Recommendation
Yuchen Zhang, Amr Ahmed, Vanja Josifovski, Alexander J Smola
ACM International Conference on Web Search And Data Mining (WSDM) (2014)
Trust, but Verify: Predicting Contribution Quality for Knowledge Base Construction and Curation
Chun How Tan, Eugene Agichtein, Panos Ipeirotis, Evgeniy Gabrilovich
WSDM (2014) (to appear)
Unsupervised Spatial Event Detection in Targeted Domains with Applications to Civil Unrest Modeling
Liang Zhao, Feng Cheng, Jing Dai, Ting Hua, Chang-Tien Lu, Naren Ramakrishnan
PLOS ONE, vol. 9 (2014), pp. 1-12
A Framework for Benchmarking Entity-Annotation Systems
Marco Cornolti, Paolo Ferragina, Massimiliano Ciaramita
Proceedings of the International World Wide Web Conference (WWW) (Practice & Experience Track), ACM (2013)
Classifying YouTube Channels: a Practical System
Proceedings of the 2nd International Workshop on Web of Linked Entities (WOLE 2013), in Proceedings of the 22nd International conference on World Wide Web companion, ACM, pp. 1295-1304
Compacting Large and Loose Communities
Chandrashekhar V., Shailesh Kumar, C. V. Jawahar
Asian Conference on Pattern Recognition (2013) (to appear)
Crawling deep web entity pages
Yeye He, Dong Xin, Venkatesh Ganti, Sriram Rajaraman, Nirav Shah
WSDM (2013), pp. 355-364
Crowd-Sourced Call Identification and Suppression
Daniel V. Klein, Dean K. Jackson
Federal Trade Commission Robocall Challenge (2013)
Data Fusion: Resolving Conflicts from Multiple Sources
Xin Luna Dong, Laure Berti-Equille, Divesh Srivastava
WAIM (2013), pp. 64-76 (to appear)
Dense Subgraph Maintenance under Streaming Edge Weight Updates for Real-time Story Identification
Albert Angel, Nick Koudas, Nikos Sarkas, Divesh Srivastava, Michael Svendsen, Srikanta Tirthapura
The VLDB Journal (2013), pp. 1-25
Distributed Large-scale Natural Graph Factorization
Amr Ahmed, Nino Shervashidze, Shravan Narayanamurthy, Vanja Josifovski, Alexander J Smola
Proceedings of the 22nd International World Wide Web Conference (WWW 2013) (to appear)
Diversity maximization under matroid constraints
Zeinab Abbassi, Vahab Mirrokni, Mayur Thakur
KDD, ACM SIGKDD (2013), pp. 32-40
Efficient and Accurate Label Propagation on Large Graphs and Label Sets
Michele Covell, Shumeet Baluja
Proceedings International Conference on Advances in Multimedia, IARIA (2013)
Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
Wei Xu, Raphael Hoffmann, Le Zhao, Ralph Grishman
Focused Matrix Factorization for Audience Selection in Display Advertising
Bhargav Kanagal, Amr Ahmed, Sandeep Pandey, Vanja Josifovski, Lluis Garcia-Pueyo, Jeff Yuan
Proceedings of the 29th International Conference on Data Engineering (ICDE) (2013)
From Assets to Stories via the Google Cultural Institute Platform
W. Brent Seales, Steve Crossan, Sertan Girgin, Mark Yoshitake
IEEE BigData'13 Big Data and the Humanities (2013), pp. 6 (to appear)
Google Disease Trends: An Update
Patrick Copeland, Raquel Romano, Tom Zhang, Greg Hecht, Dan Zigmond, Christian Stefansen
International Society of Neglected Tropical Diseases 2013, International Society of Neglected Tropical Diseases, pp. 3
Identifying Surrogate Geographic Research Regions with Advanced Exact Test Statistics
American Marketing Association Advanced Research Techniques Forum (2013), Poster
I am SOUMEN CHAKRABARTI, anagram for ANARCHISM OUTBREAK, a faculty member in the Department of Computer Science.
If you are from industry looking for consultation, please read the section titled Consultative practice rules and norms (1996) herein, and my informal notes.
If you are looking to join CSE@IITB as a PhD scholar, please read about the PhD Qualifier model being adopted by the department, and contact the department office directly. PhD admissions are centrally coordinated at the department level.
I do not offer short term projects or summer internships to students not enrolled at IIT Bombay. Such emails will be discarded.
If you are an IIT student looking for a project or seminar within the scope of your program (Btech, DD, Mtech) please read these guidelines first. You can check my calendar for free slots and, if you have permission, propose a meeting here or by email.
The best way to contact me is to send mail to (please note that I am on a low-spam diet). Please use only email to initiate a conversation with me if we haven't communicated before. Only in case of an emergency, you can call me at +91-22-2576-7716 or fax me at +91-22-2572-0022. If you are visiting, here are directions to my office.
Education and career
- Don Bosco School, Park Circus, Calcutta, 1975--1987
- Indian Institute of Technology, Kharagpur, 1987--1991
- University of California, Berkeley, 1991--1996
- IBM Almaden Research Center, 1996--1999
- IIT Bombay, 1999--present
- Carnegie Mellon University, Spring 2004
- Searching the annotated Web with entities, types and relations
- We are building CSAW, a new search system that integrates type and role annotations with keyword matches, thereby exploiting lexical ontologies and entity taggers. Supported by Yahoo!, HP Labs, Google, Microsoft, SAP and NetApp.
- Graph conductance search
- Rich connections between random walks, graph eigensystems, and electrical networks make it attractive to apply them for ranking nodes. PageRank is a prominent example of the paradigm. In PageRank, the edge weights are fixed and we have to compute steady state probabilities of nodes. What if we have something like the opposite problem? And how to make this fast at query time? Supported by IBM and Microsoft (2007, 2008).
- Integrating IR with databases
- In the BANKS project, we proposed new paradigms of keyword search in graphs that can represent text embedded in relational or XML-like data.
- The effect of search engines on the Web graph and page popularity
- Search engines are influenced by the (in)degree of Web pages, but their ranked lists modulate page popularity and eventually their (in)degree, setting up a feedback to some degree. Might the evolution of the Web graph be influenced substantially by the existence of search engines? Is there a need to regulate monopolies? What are healthy economic objectives, and how to optimize them?
- Focused crawlers to build topic-specific portals
- A focused crawler collects a topic-specific subgraph of the Web by coupling classifiers and reinforcement learners with crawlers. An open-source focused crawler project was started at the Lab. for Intelligent Internet Research and is available.
- Mining hypertext to estimate topics and popularity
- I built a hypertext classifier that uses the text in and links around a given Web page to label it with a topic. This was an early application of Markov networks to Web analysis. As a member of the IBM Clever Project, I worked on algorithms to analyze the links around a web page and the text in pages that cite the given page to assign it a measure of popularity.
- Compiling and running parallel scientific programs
- In a previous life, my PhD thesis was on the design and implementation of compilers and runtime systems for distributed memory multiprocessors. Seems like distributed parallel computing is hot again, thanks to "Big Data"!
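As a rough illustration of the PageRank setting mentioned under graph conductance search above (fixed edge weights, steady-state node probabilities), here is a minimal Python sketch of power iteration. The function name `pagerank`, the damping value 0.85, the uniform teleport, and the toy graph are assumptions made for illustration; they are not taken from any of the projects listed here.

```python
def pagerank(edges, num_nodes, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate the steady-state probabilities of a random walk with teleport.

    edges: list of (source, target, weight) triples with positive weights.
    """
    # Total outgoing weight per node, used to normalise transition probabilities.
    out_weight = [0.0] * num_nodes
    for u, _, w in edges:
        out_weight[u] += w

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(max_iter):
        # Probability mass sitting on nodes without out-edges is spread uniformly.
        dangling = sum(rank[u] for u in range(num_nodes) if out_weight[u] == 0.0)
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        nxt = [base] * num_nodes
        # Push each node's mass along its out-edges in proportion to edge weight.
        for u, v, w in edges:
            nxt[v] += damping * rank[u] * w / out_weight[u]
        delta = sum(abs(a - b) for a, b in zip(nxt, rank))
        rank = nxt
        if delta < tol:
            break
    return rank


# Toy graph (hypothetical): 0 -> 1, 1 -> 2, 2 -> 0, 2 -> 1, all with weight 1.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0), (2, 1, 1.0)]
scores = pagerank(edges, num_nodes=3)
```

Here the edge weights are inputs and the scores are the output; the "opposite problem" raised above would instead treat the weights as the unknowns.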
- Journal editorship
- Conference organization
- WWW 2017, poster track co-chair with Mounia Lalmas and Wei Chen.
- CIKM 2014, area chair for text and Web data mining.
- EMNLP 2013, area chair for information retrieval and question answering.
- WWW 2013, track chair for search, systems and applications.
- SIGIR 2011, area chair for Web IR and social media search.
- WWW 2010, program co-chair with Juliana Freire.
- SIGIR 2010, senior PC member.
- Web Search APIs: The Next Generation --- A panel discussion at WWW 2009. Panel slides.
- SIGIR 2009, Area Chair, Machine Learning for IR.
- WSDM 2008 ("wisdom"), Program Co-chair with Andrei Broder.
- VLDB 2007, Tutorial Co-Chair.
- ECML-PKDD 2006, Area Chair, Track for mining links, graphs, trees and high-dimensional data.
- WWW 2006, Deputy Chair, Data Mining track.
- COMAD 2005b, Associate Program Chair.
- WWW 2003, Vice Chair, Searching and Mining track.
- ICDE 2003, Vice Chair, Data, Text and Web Mining track.
- WWW 2002, Deputy Chair, Searching, Querying and Indexing track (CFP).
- Conference committee/reviewing
- ICML 2018, NAACL 2018, WSDM 2018 (test of time awards), SIGIR 2017 (awards), SIGKDD 2017 (awards), WSDM 2017 (awards), NIPS 2017, ACL 2017; NIPS 2016, SIGIR 2016; CIKM 2014, ISWC 2014, SIGIR 2014, ACL 2014, WSDM 2014 (senior PC); SIGKDD 2013 (senior PC), WSDM 2013 (senior PC and awards committee); EMNLP 2012, SIGKDD 2012 (senior PC), WWW 2012; NIPS 2011, ICML 2011 (PC and invited applications talks committee), WWW 2011; SIGKDD 2010; NIPS 2009, WWW 2009, WSDM 2009 (senior PC); SIGKDD 2008 (senior PC), SIGIR 2008 (senior PC), WWW 2008; WWW 2007, SIGMOD 2007; SIGKDD 2006 (senior PC); EMNLP/HLT 2005, SIGKDD 2005, WWW 2005 (panel), SIGMOD 2005; SIGKDD 2004, SIGIR 2004, VLDB 2004, WWW 2004, ICDE 2004; SIGIR 2003, SIGKDD 2003, VLDB 2003 (IIS), SODA 2003; SIGIR 2002, ICDE 2002; SIGIR 2001, WWW 2001; WWW 2000; SIGKDD 1999; SIGKDD 1998.
- Web Search and Data Mining (WSDM) steering committee member, 2008--2013.
- ACM SIGKDD Curriculum Committee Member.
But the power of instruction is seldom of much efficacy, except in those happy dispositions where it is almost superfluous.
The Decline And Fall Of The Roman Empire
Volume 1, Chapter 4.
- Web Search and Mining has been expanded to a two-semester sequence, shorthanded WMa (Autumn) and WMb (Spring). WMa retains the old course code, but has been planned from scratch. WMb will be largely about information extraction and integration, and querying over semistructured and graphical data representations. WMa Autumn 2009, WMb Spring 2010, WMa Autumn 2010, WMb Spring 2011, WMa Autumn 2011, WMa Spring 2013, WMa Autumn 2013, WMb Spring 2014, WMa Autumn 2016, WMb Spring 2017, WMa Autumn 2017.
- Statistical Foundations of Machine Learning: Autumn 2005, Autumn 2006, Autumn 2007, Autumn 2008.
- Web Search and Mining (earlier called Information Retrieval and Mining for Hypertext and the Web): Spring 2001, Spring 2002, Spring 2003, Spring 2005, Spring 2006 (new improved), Spring 2007, Spring 2008, Spring 2009.
- Undergraduate Programming Languages, Spring 2000, Autumn 2000, Autumn 2001, Autumn 2002, Autumn 2003, Autumn 2004.
- Computer programming and utilization aka CS101, Spring 2012.
- Graduate Software Lab: Autumn 1999, Autumn 2000.
... your work is to keep cranking the flywheel that turns the gears
that spin the belt in the engine of belief that keeps you and your desk in midair
The Writing Life.
Representative publication DBLP, Google Scholar?
- Generalizing Across Domains via Cross-Gradient Training. With Shiv Shankar, Vihari Piratla, Siddhartha Chaudhuri, Preethi Jyothi, and Sunita Sarawagi. ICLR 2018.
- Task-Specific Representation Learning for Web-scale Entity Disambiguation. With Rijula Kar, Susmija Reddy, Sourangshu Bhattacharya and Anirban Dasgupta. AAAI 2018.
- A Two-Stage Framework for Computing Entity Relatedness in Wikipedia. With Marco Ponza and Paolo Ferragina. CIKM 2017.
- Relay-Linking Models for Prominence and Obsolescence in Evolving Networks [paper, video]. With Mayank Singh, Rajdeep Sarkar, Pawan Goyal, and Animesh Mukherjee. SIGKDD 2017.
- Earth Mover Distance Pooling over Siamese LSTMs for Automatic Short Answer Grading. With Sachin Kumar and Shourya Roy. IJCAI 2017.
- Collective Entity Resolution with Multi-Focal Attention. With Amir Globerson, Nevena Lazic, Amarnag Subramanya, Michael Ringgaard and Fernando Pereira. ACL 2016.
- Discriminative Link Prediction using Local, Community, and Global Signals. With Abir De, Sourangshu Bhattacharya, Sourav Sarkar and Niloy Ganguly. IEEE TKDE Journal, 2016.
- Learning a Linear Influence Model Between Actors from Transient Opinion Dynamics. With Abir De, Sourangshu Bhattacharya, Parantapa Bhattacharya, and Niloy Ganguly. CIKM 2014.
- Knowledge Graph and Corpus Driven Segmentation and Answer Inference for Telegraphic Entity-seeking Queries. With Mandar Joshi and Uma Sawant. EMNLP 2014.
- Quantity Queries on Web Tables: Annotation, Response and Consensus Models. With Sunita Sarawagi. SIGKDD 2014.
- Discriminative Link Prediction using Local Links, Node Features and Community Structure. With Abir De and Niloy Ganguly. ICDM 2013.
- Joint Bootstrapping of Corpus Annotations and Entity Types. With Siddhanth Jain and Hrushikesh Mohapatra. EMNLP 2013.
- Web-scale Entity Annotation Using MapReduce. With Shashank Gupta and Varun Chandramouli. HiPC 2013.
- Learning Joint Query Interpretation and Response Ranking. With Uma Sawant. WWW 2013.
- Compressed Data Structures for Annotated Web Search. With Sasidhar Kasturi, Bharath Balakrishnan, Ganesh Ramakrishnan, and Rohit Saraf. WWW 2012.
- Diversity in ranking via resistive graph centers. With Avinava Dubey and Chiru Bhattacharyya. SIGKDD 2011. (Source code is available, contact Avinava Dubey for usage details.)
- SCAD: Collective Discovery of Attribute Values. With Anton Bakalov, Ariel Fuxman, and Partha Talukdar. WWW 2011.
- Index Design and Query Processing for Graph Conductance Search. With Amit Pathak and Manish Gupta. VLDB Journal, 2010.
- Annotating and Searching Web Tables Using Entities, Types and Relationships. With Girija Limaye and Sunita Sarawagi. VLDB 2010.
- Conditional Models for Non-smooth Ranking Loss Functions. With Avinava Dubey, Jinesh Machchhar, and Chiru Bhattacharyya. ICDM 2009, Miami.
- Learning to rank for quantity consensus queries. With Somnath Banerjee and Ganesh Ramakrishnan. SIGIR 2009, Boston.
- Collective annotation of Wikipedia entities in Web text. With Sayali Kulkarni, Amit Singh and Ganesh Ramakrishnan. SIGKDD 2009, Paris.
- Text search enhanced with types and entities. Chapter in Text Mining: Theory, Application, and Visualization, Srivastava and Sahami, eds., 2008.
- New closed form bounds on the partition function. With Dvijotham Krishnamurthy and Subhasis Chaudhuri. ECML/PKDD 2008, Antwerp. Winner of the best student paper award.
- Structured Learning for Non-Smooth Ranking Losses. With Rajiv Khanna, Uma Sawant and Chiru Bhattacharyya. SIGKDD 2008, Las Vegas.
- Learning to rank in vector spaces and social networks. Internet Mathematics, 2008.
- Focused Web Crawling. Entry in the Encyclopedia of Database Systems, 2008.
- The influence of search engines on preferential attachment. With Alan Frieze and Juan Vera. Internet Mathematics, volume 3, number 3 (2006--2007), pages 361--381. A preliminary version appeared in SODA 2005.
- Learning Random Walks to Rank Nodes in Graphs. With Alekh Agarwal. ICML 2007, Oregon.
- Dynamic Personalized Pagerank in Entity-Relation Graphs. WWW 2007, Banff.
- Accelerating Newton optimization for log-linear models through feature redundancy. With Arpit Mathur. IEEE ICDM 2006, Hong Kong.
- Learning parameters in entity-relationship graphs from ranking preferences. With Alekh Agarwal. ECML-PKDD 2006, Berlin.
- Learning to rank networked entities. With Alekh Agarwal and Sunny Aggarwal. SIGKDD Conference 2006, Philadelphia.
- Optimizing Scoring Functions and Indexes for Proximity Search in Type-annotated Corpora. With Kriti Puniyani and Sujatha Das. WWW 2006, Edinburgh.
- Enhanced Answer Type Inference from Questions using Sequential Models. With Vijay Krishnan and Sujatha Das. EMNLP/HLT 2005, Vancouver.
- Bidirectional Expansion For Keyword Search on Graph Databases. With Varun Kacholia, Shashank Pandit, S. Sudarshan, Rushi Desai and Hrishikesh Karambelkar. VLDB 2005.
- Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results. With Sandeep Pandey, Sourashis Roy, Chris Olston, and Junghoo Cho. VLDB 2005.
- Is question answering an acquired skill? With Ganesh Ramakrishnan, Deepa Paranjpe, and Pushpak Bhattacharyya. WWW 2004, New York City.
- Fast and accurate text classification via multiple linear discriminant projections. With Shourya Roy and Mahesh Soundalgekar. VLDB Journal, 12(2), pages 170--185 [conference version, talk slides].
- Cross-Training: Learning Probabilistic Mappings Between Topics. With Sunita Sarawagi and Shantanu Godbole. SIGKDD Conference 2003, Washington D.C.
- Monitoring the Dynamic Web to respond to Continuous Queries. With Sandeep Pandey and Krithi Ramamritham. WWW 2003, Budapest, Hungary, May 2003. (talk slides.)
- Accelerated focused crawling through online relevance feedback. With Kunal Punera and Mallela Subramanyam. WWW 2002, Hawaii. (Local copy.)
- The structure of broad topics on the Web. With Mukul Joshi, Kunal Punera, and David M. Pennock. WWW 2002, Hawaii. (Local copy.)
- Keyword Searching and Browsing in Databases using BANKS. With Gaurav Bhalotia, Charuta Nakhe, Arvind Hulgeri, and S. Sudarshan. In ICDE 2002. Also see the BANKS home page. Winner of the ICDE 2012 influential paper award.
- Enhanced topic distillation using text, markup tags, and hyperlinks. With Mukul M. Joshi and Vivek B. Tawde. In SIGIR 2001 (talk slides).
- Integrating the Document Object Model with hyperlinks for enhanced topic distillation and information extraction. In the 10th International World Wide Web Conference, Hong Kong, May 2001.
- Memex: A browsing assistant for collaborative archiving and mining of surf trails. With Sandeep Srivastava, Mallela Subramanyam and Mitul Tiwari. Demo at VLDB 2000.
- Data mining for hypertext: A tutorial survey. SIGKDD Explorations, 1(2), pages 1--11, 2000.
- Using Memex to archive and mine community Web browsing experience. With Sandeep Srivastava, Mallela Subramanyam and Mitul Tiwari. In the 9th International World Wide Web Conference, Amsterdam, May 2000. Talk slides. Social bookmarking companies founded long after this paper: HistorySE, Delicious, Digg, StumbleUpon, Reddit, Furl, Simpy, Citeulike, etc.
- Mining the Web's Link Structure. With Byron E. Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, David Gibson, and Jon Kleinberg. In IEEE Computer, vol. 32, no. 8, August 1999 (IEEE copy).
- Distributed Hypertext Resource Discovery Through Examples. With Martin van den Berg and Byron Dom. VLDB 1999, Edinburgh, Scotland. Talk slides.
- Hypersearching the Web. With Byron Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, Jon M. Kleinberg, and David Gibson. Invited paper in Scientific American, June 1999.
- Surfing the Web Backwards. With D. A. Gibson and K. S. McCurley. In WWW 1999.
- Focused crawling: A new approach to topic-specific Web resource discovery. With M. van den Berg and B. Dom. WWW 1999, Toronto, May 1999. Winner of the best paper award. Also see the project page.
Upcoming and recent talks and travel
- Tutorial with Partha Talukdar at CIKM 2017 on Knowledge Extraction and Inference from Text.
- Keynote talk at CoDS 2017, Chennai, March 2017.
- Keynote talk at CIKM 2014 Industry Track, Nov 2014.
- Keynote talk at COMSNETS 2014, Bangalore, Jan 2014.
- Tutorial on Query Interpretation and Representation for Searching the Web of Objects at WWW 2013, Rio de Janeiro.
- WWW 2010 Conference, NC, April 2010.
- Keynote talk at WSDM 2010, NYC, February 2010. [Talk slides.]
- WWW 2010 PC meeting, Salt Lake City, Utah, January 2010.
- WWW 2009 tutorial and panel, April 2009.
- SIGIR 2008 PC meeting, University of Maryland, March 2008.
- WSDM 2008, Stanford University, February 2008.
- Tutorial on Learning to rank in vector spaces and social networks at WWW 2007, Banff.
- Keynote talk at WAW and a short course at Banff, Nov 2006.
- Invited talk at the International Workshop on Intelligent Information Access, Helsinki, July 2006.
- Invited talk at the ICML 2005 workshop on Learning in Web Search.
- Invited talk at the ICML 2005 workshop on Learning and Extending Lexical Ontologies by using Machine Learning Methods.
- Panel discussion on exploiting dynamic networking effects in Web advertising at WWW 2005.
- Invited talk and position paper at ECML/PKDD in Pisa, Sept. 2004.
- Short course on machine learning for hypertext applications at ADFOCS in Saarbrücken, Sept. 2004.
- Graph structures in data mining. A tutorial presented at SIGKDD 2004 with Christos Faloutsos.
- Text search for fine-grained semi-structured data. A tutorial presented at VLDB 2002.
- Beyond hubs and authorities: spreading out and zooming in. Invited talk at ICDT International Workshop on Web Dynamics, London, Jan. 2001.
- Data Mining and Learning on the Web. NIPS Workshop, Denver, Dec. 2000. By invitation.
- Nurturing content-based collaborative communities on the Web. Invited talk at the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC), Hong Kong, Oct. 7--8, 2000.
- Hypertext data mining: A tutorial presented at the SIGKDD Conference, Boston, August 2000.
- Hypertext databases and hypertext data mining. SIGMOD 1999 Tutorial.
- Method and system for searching unstructured textual data for quantitative answers to queries.
- System and method for focussed web crawling.
- Enhanced hypertext categorization using hyperlinks.
- System and method for scheduling web servers with a quality-of-service guarantee for each user.
- Method for interactively creating an information database including preferred information elements, such as, preferred-authority, world wide web pages.
- Method for cataloging, filtering, and relevance ranking frame-based hierarchical information structures.
- Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values.
- System and method for mining surprising temporal patterns.
- Feature diffusion across hyperlinks.
Links in areas of interest
Content with URLs that have the current URL as a prefix has been hosted in accordance with fair use principles, for academic and non-profit purposes. By downloading the contents of this page, you agree to bring possible violation of fair use to my notice before taking legal recourse.