Link Prediction

Many real world domains are relational in nature consisting of different objects linked in complex ways. Previous studies have shown that studying these relationships not only give information on how objects currently interact but also provide insight on who the objects being linked are and what other interactions are also likely to be present. The problem with using links, however, is that link information is often incomplete and inaccurate. Moreover, links are often unlabeled such that multiple diverse links may be treated as a single type.

Our research looks at these problems in the task commonly referred to as link prediction. In the simplest case, link prediction can be defined as a multi-class problem where the classes are the different link types with a special class representing when a link does not exist. Often, however, link prediction is reduced to a binary classification problem based only on whether or not a link exists. The biggest issue with link prediction is the large class skew between links that exists and don't exists. The number of non existant links are often quadratic to the number of object while the number of existent links is usually linear.

We have applied Probabilistic Relational Models (PRM) to the problem of link prediction by defining structural uncertainty in the model. In addition, we have looked at the link prediction problem as a ranking problem for identifying the different relationship types in a communication network. We show that not only can we accurately identify different relationship types, we can use the same model to identify which communications most support the classification. We are also currently exploring how to improve link prediction by combining it with the related tasks like collective object classification. We have created and plan to release a synthetic data generator to test our link prediction algorithms over a wide range of parameters including link noise, homophily, and link density. Aside from the problem of predicting links, we are also exploring the task of data anonymization. In data anonymization, instead of predicting links, we are applying various techniques to anonymize sensitive links such that cannot be accurately predicted using link prediction techniques.



  • Synthetic Data Generator: We are currently developing a synthetic data generator where parameters relevant to both link prediction (i.e. homophily, link noise, link density) and collective object classification (i.e. number of labels, attribute noise) can be evaluated over.
    Coming Soon