
LINQS
STATISTICAL RELATIONAL LEARNING GROUP @ UMD
Collective Relational Clustering in Constrained Clustering: Advances in Algorithms, Theory, and Applications
Chapman and Hall, pages 223-243, 2008
In many clustering problems, in addition to attribute data, we have relational information linking different data points. In this chapter, we focus on the problem of collective relational clustering, which makes use of both attribute and relational information. The approach is collective in that clustering decisions are not made independently for each pair of data points. Instead, the different pairwise decisions depend on each other. The first set of dependencies is among multiple decisions involving the same data point. The other set of dependencies comes from the relationships: decisions for any two references that are related in the data are also dependent on each other. Hence, the approach is collective as well as relational. We focus on the entity resolution problem as an application of the clustering problem, and we survey different proposed approaches that are collective or make use of relationships. One of the approaches is an agglomerative greedy clustering algorithm where the cluster similarity measure combines both attributes and relationships in a collective way. We discuss the algorithmic details of this approach and identify data characteristics that influence its correctness. We also present experimental results on multiple real-world and synthetic datasets.
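The abstract does not spell out the similarity measure, so the following is only a minimal sketch of what a greedy agglomerative clusterer with a combined attribute and relational similarity might look like. Everything specific here is an assumption made for illustration: the token-level Jaccard measure for attributes, the Jaccard overlap of neighboring clusters for relationships, the linear weight `alpha`, the merge `threshold`, and the function names. It is not the chapter's actual algorithm.

```python
from itertools import combinations


def attribute_similarity(names_a, names_b):
    """Best token-level Jaccard score between any two names of the two clusters (assumed measure)."""
    def jaccard(x, y):
        tx, ty = set(x.lower().split()), set(y.lower().split())
        return len(tx & ty) / len(tx | ty) if tx | ty else 0.0
    return max(jaccard(a, b) for a in names_a for b in names_b)


def relational_similarity(nbrs_a, nbrs_b):
    """Jaccard overlap of the two clusters' neighboring cluster ids (assumed measure)."""
    union = nbrs_a | nbrs_b
    return len(nbrs_a & nbrs_b) / len(union) if union else 0.0


def collective_cluster(names, edges, alpha=0.5, threshold=0.4):
    """Greedily merge the most similar pair of clusters until none exceeds the threshold.

    names: dict mapping reference id -> name string
    edges: pairs of reference ids that are related in the data
    """
    edges = list(edges)
    cluster_of = {r: r for r in names}   # every reference starts as its own cluster
    members = {r: {r} for r in names}

    def neighbors(c):
        # Clusters linked to c under the CURRENT assignment; this is what makes
        # later decisions depend on earlier merges.
        out = set()
        for u, v in edges:
            if cluster_of[u] == c:
                out.add(cluster_of[v])
            if cluster_of[v] == c:
                out.add(cluster_of[u])
        out.discard(c)
        return out

    while True:
        best, best_pair = threshold, None
        for a, b in combinations(sorted(members), 2):
            attr = attribute_similarity([names[r] for r in members[a]],
                                        [names[r] for r in members[b]])
            rel = relational_similarity(neighbors(a), neighbors(b))
            sim = (1 - alpha) * attr + alpha * rel
            if sim > best:
                best, best_pair = sim, (a, b)
        if best_pair is None:
            return sorted(sorted(m) for m in members.values())
        a, b = best_pair
        for r in members[b]:
            cluster_of[r] = a
        members[a] |= members.pop(b)


# Hypothetical toy data: four author references and two co-authorship links.
names = {1: "john smith", 2: "john smith", 3: "a jones", 4: "alice jones"}
edges = [(1, 3), (2, 4)]
print(collective_cluster(names, edges))  # -> [[1, 2], [3, 4]]
print(collective_cluster(names, []))     # -> [[1, 2], [3], [4]]
```

In this toy run, references 3 and 4 are too dissimilar on attributes alone to merge. Once references 1 and 2 are merged, 3 and 4 share a neighboring cluster, which lifts their combined similarity above the threshold. Without the links, they stay separate.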
BibTeX reference
@InBook{bhattacharya:ch08,
author = "Bhattacharya, Indrajit and Getoor, Lise",
title = "Collective Relational Clustering in Constrained Clustering: Advances in Algorithms, Theory, and Applications",
chapter = "10",
series = "CRC Data Mining Series",
pages = "223-243",
year = "2008",
publisher = "Chapman and Hall",
}

