As I listen to Laszlo Barabasi present some new work on predicting social networks from co-location data, I hear that the “Adamic/Adar coefficient” can predict new edges well based on social network data alone. It’s remarkable that it has held up over the years given how it was originally thought up. Eytan Adar and I wanted to be able to predict which homepages would link to each other based on other pages (and items) that they shared. I vaguely knew about the existence of TFIDF-based similarity measures, and that they involved a logarithm, but I didn’t actually look up the definition. Instead, I summed over all the common items z, weighting each by 1/log(popularity(z)), which roughly captured the idea. The more items two individuals share, and the less widespread those items are, the higher the likelihood that they know each other. I don’t know that anyone has actually tried a true TFIDF-weighted cosine similarity measure; it might actually be better. But thanks to a generous treatment by David Liben-Nowell and Jon Kleinberg, who named our metric and compared it against others, the “Adamic/Adar” coefficient has since taken on a life of its own.
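For the curious, the weighting described above fits in a few lines. This is only a minimal sketch: the function name `adamic_adar`, the `popularity` dictionary (item to number of people who have it), and the toy data are all made up for illustration, and I am assuming the natural logarithm.

```python
import math

def adamic_adar(items_a, items_b, popularity):
    """Sum 1/log(popularity(z)) over the items z that both people share.

    `popularity` is a hypothetical dict mapping each item to the number
    of people who have it (an assumption made for this sketch).
    """
    score = 0.0
    for z in set(items_a) & set(items_b):
        count = popularity[z]
        # A shared item belongs to at least two people, so count >= 2 in
        # practice; the guard only avoids dividing by log(1) = 0.
        if count > 1:
            score += 1.0 / math.log(count)
    return score

# Toy data: a rare shared item counts for far more than a widespread one.
popularity = {"knitting club": 2, "university homepage": 1000}
alice = {"knitting club", "university homepage"}
bob = {"knitting club", "university homepage"}
print(adamic_adar(alice, bob, popularity))
```

In this toy example the two-member knitting club contributes roughly ten times as much to the score as the university page that a thousand people link to, which is the whole point of the logarithmic down-weighting.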
A much more prominent and consequential rediscovery/reformulation, that of PageRank, may also have benefitted from an apparent lack of awareness of prior work. Whenever PageRank is applied to key problems, whether it is identifying species in food webs whose removal would unravel them, or just plain webpage ranking, if Stanley Wasserman is in the audience, he’ll point out that eigenvector centrality had already been known for decades in the community of social network researchers. However, PageRank is so very simple to implement. The damping factor and a few heuristics resolve any issues with disconnected components and dead ends, and you’re set. And I wonder how quickly and broadly eigenvector centrality would have been adopted had it not been for this modification, had Page and Brin read Bonacich and implemented his measure instead.
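To show just how little code is involved, here is a rough power-iteration sketch. The function name `pagerank`, the dict-of-lists graph format, the 0.85 damping value, and the choice to spread the rank of dead ends evenly over all nodes are assumptions made for illustration; that last choice is one common convention, not necessarily what Page and Brin did.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict mapping node -> list of out-links."""
    # Collect every node, including ones that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Dead ends (nodes with no out-links) hand their rank to everyone
        # equally; this is the assumed heuristic for this sketch.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        # The damping term keeps disconnected components from trapping rank.
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Toy graph: "c" is a dead end that both "b" and "d" point to.
toy = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
ranks = pagerank(toy)
print(max(ranks, key=ranks.get))
```

The ranks always sum to one, and in the toy graph the dead end "c" ends up on top because it collects links from both "b" and "d".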
This may just be the old story of innovation as opposed to exploitation. When we rely too heavily on (exploit) others’ prior work, we may be under-innovating in areas where we believe everything has already been done. If we under-exploit others’ work, we may be wasting time exploring what is already known. Especially given the explosion of research in my area, I can take some comfort in knowing that a bit of ignorance may be beneficial.