Understanding No-Regret Learning

Michael Kearns

Computer and Information Science, University of Pennsylvania

Beginning at least as early as the 1950s, the long and still growing literature on no-regret learning establishes the following type of result: On any sequence of T trials in which the predictions of K "experts" are observed, it is possible to maintain a dynamically weighted prediction whose per-step regret to the best single expert in *hindsight* (that is, after the full sequence has been revealed) diminishes rapidly with T. It is a historically rich topic, with origins in statistics, game theory, and information theory, and it enjoys active research in the modern machine learning community.
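
As a concrete illustration of the kind of algorithm involved, below is a minimal sketch of the exponentially weighted average (Hedge) forecaster on synthetic data; the uniform random losses and the learning rate eta = sqrt(8 ln K / T), which for losses in [0, 1] guarantees total regret at most sqrt((T ln K) / 2), are illustrative choices only and not specific to the tutorial.

    import numpy as np

    def hedge(expert_losses, eta):
        # expert_losses: (T, K) array of per-trial losses in [0, 1], one column per expert.
        T, K = expert_losses.shape
        weights = np.ones(K)
        algo_losses = np.empty(T)
        for t in range(T):
            probs = weights / weights.sum()            # current weighting over the experts
            algo_losses[t] = probs @ expert_losses[t]  # loss of the weighted prediction
            weights *= np.exp(-eta * expert_losses[t]) # exponentially downweight poor experts
        return algo_losses, weights

    T, K = 1000, 10
    losses = np.random.default_rng(0).uniform(size=(T, K))
    eta = np.sqrt(8 * np.log(K) / T)
    algo_losses, _ = hedge(losses, eta)
    regret = algo_losses.sum() - losses.sum(axis=0).min()
    print(f"total regret {regret:.1f}, bound {np.sqrt(T * np.log(K) / 2):.1f}")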

This tutorial will attempt to describe simply some of the core ideas behind no-regret learning, and to provide some new perspectives and analytic techniques for understanding the strengths and weaknesses of no-regret algorithms. These will include some surprising empirical results on no-regret learning applied to the S&P 500, as well as recent theoretical results examining the trade-off between having small regret to the best expert and no regret to the average expert.

The tutorial will be self-contained, and will assume no prior knowledge of any aspect of finance.

-------


Michael Kearns is a professor in the Computer and Information Science Department at the University of Pennsylvania, where he holds the National Center Chair in Resource Management and Technology. He has a secondary appointment in the Operations and Information Management (OPIM) department of the Wharton School, and until July 2006 was the co-director of Penn's interdisciplinary Institute for Research in Cognitive Science. He is also the head of quantitative strategy development in the Equity Strategies department of Lehman Brothers in New York City. Before joining the Penn faculty in 2002, he spent a decade in basic AI and machine learning research at AT&T Labs and Bell Labs, where he served as head of the Machine Learning department and the Secure Systems Research department.
 
Challenges in Statistical Machine Learning

John Lafferty

School of Computer Science, Carnegie Mellon University


A surge of research in machine learning during the past decade has led to powerful learning methods that are successfully being applied to a wide range of application domains, from search engines to computational biology and robotics. These advances have in part been achieved by refining the art and engineering practice of machine learning, paralleled by a confluence of machine learning and statistics. But an understanding of the scientific foundations and fundamental limits to learning from data can also be effectively leveraged in practice. In this overview of recent work we present some of the current technical challenges in the field of machine learning, focusing on high dimensional data and minimax rates of convergence, a measure of learnability that parallels channel capacity in information theory. These challenges include understanding the role of sparsity in statistical learning, semi-supervised learning, the tradeoff between computation and risk, and structured prediction problems.
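
For readers new to the terminology, the minimax risk of a function class \mathcal{F} over n samples is

    R_n(\mathcal{F}) \;=\; \inf_{\hat{f}_n} \sup_{f \in \mathcal{F}} \mathbb{E}_f\big[ d^2(\hat{f}_n, f) \big],

the best worst-case error any estimator can guarantee, where d is a suitable distance such as the L_2 norm; the minimax rate is the speed at which R_n(\mathcal{F}) tends to zero, and it bounds what is learnable from n samples much as channel capacity bounds reliable communication.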

-------


John Lafferty is a professor in the Computer Science Department and the Machine Learning Department within the School of Computer Science at Carnegie Mellon University. Professor Lafferty's research interests are in machine learning, statistical learning theory, computational statistics, natural language processing, information theory, and information retrieval. Prof. Lafferty received the Ph.D. in Mathematics from Princeton University, where he was also a member of the Program in Applied and Computational Mathematics. He was an assistant professor in the Mathematics Department at Harvard University before joining the Computer Sciences Department of the IBM Thomas J. Watson Research Center as a Research Staff Member, working in Frederick Jelinek's group on statistical natural language processing. He has been a member of the faculty at Carnegie Mellon University since 1994. Prof. Lafferty currently serves as co-director (with Steve Fienberg) of CMU's Ph.D. Program in Computational and Statistical Learning, and as an associate editor of the Journal of Machine Learning Research.
 
Joint Inference in Natural Language Processing, Information Extraction, and Social Network Analysis

Andrew McCallum

University of Massachusetts, Amherst


In this tutorial I will describe recent research at the intersection of information extraction, data mining and social network analysis. In particular I will focus on how such a combination can be made both robust and scalable---showing that the typical brittle cascading of errors from text extraction to data mining can be avoided with unified probabilistic inference in graphical models, and showing that these models can be made efficient with recent methods of approximate inference and learning. After briefly introducing conditional random fields, I will demonstrate their use in joint models of extraction, entity resolution, and sequence alignment.
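
For reference, a linear-chain conditional random field defines the conditional distribution

    p(y \mid x) \;=\; \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x, t) \Big),

where the f_k are feature functions over neighboring labels and the observed input, the \lambda_k are learned weights, and Z(x) is the normalizing partition function; the joint models discussed here couple several such factors within a single graphical model, so that extraction and mining share inference rather than cascading one stage's errors into the next.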

I will then describe two methods of integrating textual data into a particular type of data mining---social network analysis. In one model, we discover role-similarity between entities by examining not only network connectivity, but also the words communicated on those edges; I'll demonstrate this method on a large corpus of email data subpoenaed as part of the Enron investigation. In another model, we discover groups of entities and the "topical" conditions under which different groupings arise; I'll demonstrate this on coalition discovery from many years' worth of voting records in the U.S. Senate and the U.N. I'll conclude with further examples of graphical models successfully applied to relational data, as well as a discussion of their applicability to trend analysis, expert-finding and bibliometrics.

Joint work with colleagues at UMass: Charles Sutton, Aron Culotta, Chris Pal, Ben Wellner, Michael Hay, Xuerui Wang, Natasha Mohanty, David Mimno, Gideon Mann, Wei Li, and Andres Corrada.

-------


Andrew McCallum is an Associate Professor in the Computer Science Department at the University of Massachusetts Amherst. He was previously Vice President of Research and Development at WhizBang Labs, a company that used machine learning for information extraction from the Web. In the late 1990s he was a Research Scientist and Coordinator at Justsystem Pittsburgh Research Center, where he spearheaded the creation of CORA, an early research paper search engine that used machine learning for spidering, extraction, classification and citation analysis. McCallum received his PhD from the University of Rochester in 1995, followed by a post-doctoral fellowship at Carnegie Mellon University. He is currently an action editor for the Journal of Machine Learning Research, and on the board of the International Machine Learning Society. For the past ten years, McCallum has been active in research on statistical machine learning applied to text, especially information extraction, document classification, clustering, finite state models, semi-supervised learning, and social network analysis. New work on search and bibliometric analysis of open-access research literature can be found at http://rexa.info. McCallum's web page: http://www.cs.umass.edu/~mccallum.
 
Prediction with and without exchangeability

Glenn Shafer

Rutgers Business School


A recent book, Algorithmic Learning in a Random World, by Vovk, Gammerman, and myself, explains how to obtain confidence intervals for predictions under what is called randomness in machine learning and exchangeability in mathematical statistics. This technique can be applied to practically any method for prediction studied in machine learning. In this tutorial I explain the method and contrast it with two methods that do not require randomness: classical regression in statistics and defensive forecasting.
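
To convey the flavor of the technique, here is a minimal sketch of the split-conformal variant for regression, written against scikit-learn; the linear base predictor, the particular train/calibration split, and the 90% coverage level are illustrative choices rather than prescriptions from the book.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    def split_conformal_intervals(X_train, y_train, X_cal, y_cal, X_test, alpha=0.1):
        # Fit any point predictor on the proper training split.
        model = LinearRegression().fit(X_train, y_train)
        # Nonconformity scores: absolute residuals on the held-out calibration split.
        scores = np.abs(y_cal - model.predict(X_cal))
        n = len(scores)
        # Rank of the conformal quantile; when k > n the honest interval is infinite.
        k = int(np.ceil((n + 1) * (1 - alpha)))
        q = np.inf if k > n else np.sort(scores)[k - 1]
        preds = model.predict(X_test)
        # Under exchangeability, each interval covers its label with probability >= 1 - alpha.
        return preds - q, preds + q

    rng = np.random.default_rng(0)
    X = rng.normal(size=(600, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=600)
    lo, hi = split_conformal_intervals(X[:300], y[:300], X[300:500], y[300:500], X[500:])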