Abstract—With the proliferation of very large network datasets in real-world applications, there has been increasing interest in link prediction, especially in social networks. Most of our previous efforts have neglected the semantic information associated with such networks. The abstracts of research documents in a co-authorship network are one such example. We build a link predictor for such networks, where nodes represent researchers and links represent co-authorships. We use the structure of the constructed graph and propose adding a semantic approach that uses abstracts, research titles, and event information to improve the accuracy of the predictor. We also make use of the fact that researchers tend to work in close-knit communities: knowing that a pair of researchers belongs to the same research community can improve the accuracy of our predictor. To test our hypothesis, we use an ensemble clustering of the DBLP network to supplement its structural features, and we test it in reasonable time by undersampling and balancing the dataset using decision trees and the SMOTE technique.
Index Terms—Graph mining, link prediction, graph clustering, co-authorship networks
Mrinmaya Sachan, Computer Science and Engineering, Indian Institute of Technology, Kanpur, India (email: email@example.com)
Ryutaro Ichise, Principles of Informatics Research Division, National Institute of Informatics, Tokyo, Japan (email: firstname.lastname@example.org)
Cite: Mrinmaya Sachan and Ryutaro Ichise, "Using Semantic Information to Improve Link Prediction Results in Network Datasets," International Journal of Engineering and Technology, vol. 2, no. 4, pp. 334-339, 2010.
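As an illustration of the idea summarized in the abstract, the following is a minimal sketch of how structural, semantic, and community features might be combined for one candidate pair of researchers. It is not the authors' implementation: the function names, the feature set (common neighbors, Jaccard coefficient, bag-of-words cosine similarity, a same-community flag), and the toy data are all assumptions chosen for illustration, and the undersampling, SMOTE balancing, and decision-tree steps are not shown.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two strings (hypothetical semantic feature)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def pair_features(graph, texts, community, u, v):
    """Feature dictionary for a candidate co-authorship link (u, v).

    graph:     dict mapping researcher -> set of co-authors
    texts:     dict mapping researcher -> concatenated titles/abstracts
    community: dict mapping researcher -> cluster label
    All three structures are assumptions made for this sketch.
    """
    nu, nv = graph.get(u, set()), graph.get(v, set())
    common, union = nu & nv, nu | nv
    same = community.get(u) is not None and community.get(u) == community.get(v)
    return {
        "common_neighbors": len(common),                        # structural
        "jaccard": len(common) / len(union) if union else 0.0,  # structural
        "text_similarity": cosine_similarity(texts.get(u, ""), texts.get(v, "")),  # semantic
        "same_community": int(same),                            # clustering-based
    }


# Toy data, invented for illustration only.
graph = {
    "alice": {"bob", "carol"},
    "dave": {"bob", "erin"},
    "bob": {"alice", "dave"},
    "carol": {"alice"},
    "erin": {"dave"},
}
texts = {
    "alice": "link prediction in graphs",
    "dave": "link prediction in networks",
}
community = {"alice": 0, "dave": 0, "bob": 0, "carol": 1, "erin": 1}

features = pair_features(graph, texts, community, "alice", "dave")
```

A feature dictionary of this kind, computed for each candidate pair and labeled by whether the pair later co-authored a paper, could then be fed to any standard classifier.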