SUMMER SCHOOL

----------------------------------------

July 9
9:00am – 10:20am, Tutorial by Prof. Friedman
10:20am – 10:40am, Tea Break
10:40am – 12:00pm, Tutorial by Prof. Friedman
12:00pm – 2:00pm, Lunch
2:00pm – 3:20pm, Tutorial by Prof. McCallum
3:20pm – 3:40pm, Tea Break
3:40pm – 5:00pm, Tutorial by Prof. McCallum

July 10
9:00am – 10:20am, Tutorial by Prof. Friedman
10:20am – 10:40am, Tea Break
10:40am – 12:00pm, Tutorial by Prof. Friedman
12:00pm – 2:00pm, Lunch
2:00pm – 3:20pm, Tutorial by Prof. McCallum
3:20pm – 3:40pm, Tea Break
3:40pm – 5:00pm, Tutorial by Prof. McCallum

Lectures
Title: Tree Based Approaches to Statistical Machine Learning
Lecturer: Jerome Friedman (Stanford Univ.)
Abstract:
This workshop will focus on machine learning techniques based on decision trees. Decision trees and related procedures are among the most popular in data mining. After a general introduction to the statistical machine learning problem (regression and classification), the basics of single decision tree learning will be discussed. Following that, improved learning methods based on ensembles of trees will be described. These include bagging, random forests, boosting, and rule fitting.

Title: Information Extraction, Data Mining and Topic Modeling with Probabilistic Models
Lecturer: Andrew McCallum (Univ. of Massachusetts)
Abstract:
In this talk I will describe recent research at the intersection of information extraction, data mining and social network analysis. In particular I will focus on how such a combination can be made both robust and scalable---showing that the typical brittle cascading of errors from text extraction to data mining can be avoided with unified probabilistic inference in graphical models, and showing that these models can be made efficient with recent methods of approximate inference and learning. After briefly introducing conditional random fields, I will demonstrate their use in joint models of extraction, entity resolution, and sequence alignment.
I will then describe several methods of integrating textual and other data in a "looser" type of data mining---topic modeling. These are Bayesian latent-variable models that can discover rich and interpretable co-occurrence patterns in high-dimensional data, including data from multiple modalities. I'll introduce a wide array of such models, including applications to nested correlations, expert-finding, trend analysis, career path modeling, and research literature impact measurement.
Joint work with colleagues at UMass: Charles Sutton, Aron Culotta, Wei Li, Chris Pal, Ben Wellner, Michael Hay, Xuerui Wang, Natasha Mohanty, David Mimno, Pallika Kanani, Kedare Bellare, Michael Wick, Rob Hall, Gideon Mann, and Andres Corrada.

Date
July 9-10, 2007

Place
Microsoft Research Asia
Multifunction Room, B1 Sigma Building
No.49 Zhichun Road, Haidian District, Beijing.

Registration
Researchers and students interested in statistical learning are encouraged to attend the summer school. Those who want to participate must register by sending an email to sssl2007@hotmail.com containing their name, phone number, email address, and affiliation. There is no registration fee. Since seating is limited, only the first 150 registrations can be accepted.

Copyright © 2007
School of Mathematical Sciences, Peking University