LINQS

STATISTICAL RELATIONAL LEARNING GROUP @ UMD

Deduplication and Group Detection using Links

ACM SIGKDD Workshop on Link Analysis and Group Detection (LinkKDD) - 2004
Download the publication: bhattacharyakdd04-whskp.pdf [237 KB]
Clustering is a fundamental problem in data mining. Traditionally, entities are clustered based on the similarity of their attribute values. More recently, there has been greater interest in clustering relational and structured data. Often this data is best described as a graph containing both entities, each described by a collection of attributes, and links between entities, which represent the relations between them. Clustering in these scenarios becomes more complex, because the similarity of the entities' links should also be taken into account. We propose novel distance measures for clustering linked data and show how they can be used to solve two important data mining tasks: entity deduplication and group discovery.
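
As a rough illustration of the idea in the abstract, the sketch below blends an attribute-based distance with a link-based distance. It is not the measure proposed in the paper: the Jaccard distance, the weight alpha, and the names jaccard_distance and combined_distance are all assumptions chosen for this example.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        # Two empty sets are considered identical.
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def combined_distance(attrs_a, attrs_b, links_a, links_b, alpha=0.5):
    """Weighted blend of attribute distance and link distance.

    alpha is a hypothetical weight: 0 uses attributes only,
    1 uses links only.
    """
    attr_d = jaccard_distance(attrs_a, attrs_b)
    link_d = jaccard_distance(links_a, links_b)
    return (1.0 - alpha) * attr_d + alpha * link_d


# Two author references that may denote the same person (made-up data).
ref1_name = {"j", "smith"}
ref2_name = {"john", "smith"}
ref1_coauthors = {"a. jones", "b. lee"}
ref2_coauthors = {"a. jones", "b. lee", "c. wu"}

d = combined_distance(ref1_name, ref2_name, ref1_coauthors, ref2_coauthors)
print(d)
```

With these made-up references the name tokens overlap only partly, but the shared co-authors pull the combined distance down to roughly 0.5, which is lower than the attribute distance alone. This is the kind of effect that makes links useful for deduplication.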

BibTeX reference

@InProceedings{bhattacharya:kdd04-wkshp,
  author       = "Bhattacharya, Indrajit and Getoor, Lise",
  title        = "Deduplication and Group Detection using Links",
  booktitle    = "ACM SIGKDD Workshop on Link Analysis and Group Detection (LinkKDD)",
  year         = "2004",
}
