As I listen to Laszlo Barabasi present some new work on predicting social networks from co-location data, I hear that the “Adamic/Adar coefficient” can predict new edges well based on social network data alone. It’s remarkable that it has held up over the years, given how it was originally thought up. Eytan Adar and I wanted to predict which homepages would link to each other based on other pages (and items) that they shared. I vaguely knew that TFIDF-based similarity measures existed, and that they involved a logarithm, but I didn’t actually look up the definition. Instead, I summed over all the common items z, weighting each by 1/log(popularity(z)), which roughly captured the idea: the more items two people share, and the less widespread those items are, the higher the likelihood that the two know each other. I don’t know that anyone has actually tried a true TFIDF-weighted cosine similarity measure; it might well be better. But thanks to a generous treatment by David Liben-Nowell and Jon Kleinberg, who named our metric and compared it against others, the “Adamic/Adar” coefficient has since taken on a life of its own.
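For concreteness, here is a minimal Python sketch of that weighting (the function and variable names are my own, purely for illustration; the score is just the sum of 1/log(popularity(z)) over shared items z):

```python
import math

def adamic_adar(items_x, items_y, popularity):
    """Adamic/Adar score: sum of 1/log(popularity(z)) over items z shared by x and y."""
    shared = set(items_x) & set(items_y)
    # Skip items listed by only one person: log(1) = 0 would divide by zero.
    return sum(1.0 / math.log(popularity[z]) for z in shared if popularity[z] > 1)

# e.g. two homepages sharing only "jazz", which 3 people list overall:
# adamic_adar({"chess", "jazz"}, {"jazz", "go"},
#             {"chess": 40, "jazz": 3, "go": 5})  ->  1/ln(3), about 0.91
```

A shared item listed by only three people contributes far more to the score than one listed by forty, which is the whole point of the logarithmic weighting.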
A much more prominent and consequential rediscovery/reformulation, that of PageRank, may also have benefited from an apparent lack of awareness of prior work. Whenever PageRank is applied to key problems, whether identifying species in food webs whose removal would unravel them or just plain webpage ranking, if Stanley Wasserman is in the audience, he’ll point out that eigenvector centrality had been known for decades in the community of social network researchers. However, PageRank is very simple to implement: the damping factor and a few heuristics resolve any issues with disconnected components and dead ends, and you’re set. And I wonder how quickly and broadly eigenvector centrality would have been adopted without this modification, if Page and Brin had read Bonacich and implemented his measure instead.
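To make that simplicity concrete, here is a toy power-iteration sketch in Python (nothing like the scale of Brin and Page’s actual system; the adjacency-dict representation and the parameter defaults are my own assumptions):

```python
def pagerank(adj, d=0.85, iters=100):
    """Damped power iteration over an adjacency dict {node: [out-neighbors]}.

    With probability 1-d the random surfer teleports to a uniformly random
    node, so rank mixes across disconnected components; dead ends hand
    their rank back to everyone. Set d = 1 and both fixes disappear,
    leaving something close to plain eigenvector centrality.
    """
    nodes = list(adj)  # assumes every node appears as a key in adj
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - d) / n for u in nodes}
        for u in nodes:
            if adj[u]:
                share = d * rank[u] / len(adj[u])
                for v in adj[u]:
                    new[v] += share
            else:
                for v in nodes:  # dead end: spread its rank uniformly
                    new[v] += d * rank[u] / n
        rank = new
    return rank
```

The teleportation term is the entire trick: without it, rank leaks out of dead ends and never flows between disconnected components, which is exactly where a naive eigenvector computation runs into trouble.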
This may just be the old story of exploration as opposed to exploitation. When we rely too heavily on (exploit) others’ prior work, we may under-innovate in areas where we believe everything has already been done. When we under-exploit others’ work, we may waste time exploring what is already known. Especially given the explosion of research in my area, I can take some comfort in knowing that a bit of ignorance may be beneficial.
Well, in my opinion, if you rediscover something without knowing about prior work, it’s still your discovery, even if others don’t credit you for it. The pleasure of finding something out, even if it’s just a rediscovery, also gives me a lot of fun 🙂
Thanks for linking to “Friends and neighbors on the Web” – I hadn’t known about it until now.
Today I got to give a talk in which I was able to highlight that a problem many people thought was closed still had some interesting mysteries in it. (We don’t have the answer, but our paper does help illuminate the mystery. The original goal of the paper was to move beyond “known” results, but we ended up having to back up when we realized that something deeper was actually still lurking in the well-understood stuff.)
Great post! I’m reminded of the following excerpt from Feynman that I came across recently: http://www.physics.ohio-state.edu/~kilcup/262/feynman.html
Over-reliance on others’ prior work carries an often unrecognized side effect: over-reliance on others’ prior assumptions, which may bias investigators who come to a new area in how they think about its problems and solutions.
I believe a healthy dose of ignorance or naivete is required for success in nearly any entrepreneurial endeavor.