ladamic's blog research on information networks and non-researchy random musings


Thanksgiving ingredient network leftovers

Filed under: data, statistics, visualization, papers — ladamic @ 03:29

Michaeleen Doucleff just wrote a very fun article on our recipe network paper for NPR’s the Salt.

It made me realize that Edwin Teng, Yu-Ru Lin and I have some leftover plots that may be Thanksgiving appropriate. If you don’t have quite the right ingredients handy while cooking Thanksgiving dinner, here is a network of common substitutions as found in reviewers’ comments on a large recipe site:

The favorite Thanksgiving ingredients are often recommended as substitutes. For example, cranberries end up substituting for other kinds of fruit and even, somehow, for chocolate. In the fat category, olive oil and butter seem to be recommended as substitutes for things such as margarine. Yams are often recommended as a substitute for sweet potatoes (more so than the other way around), and so on.
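To make the "more so than the other way around" idea concrete, here is a minimal sketch of a directed, weighted substitution network. It assumes the review text has already been reduced to (original, substitute) pairs; the pairs below and the helper names `asymmetry` and `in_strength` are made up for illustration and are not taken from the paper.

```python
from collections import Counter

# Hypothetical (original, substitute) pairs, as if extracted from review text.
# These are illustrative values, not data from the paper.
suggestions = [
    ("sweet potato", "yam"),
    ("sweet potato", "yam"),
    ("yam", "sweet potato"),
    ("margarine", "butter"),
    ("margarine", "olive oil"),
    ("chocolate", "cranberry"),
]

# Directed edge weight = number of reviews suggesting "replace a with b".
edge_weight = Counter(suggestions)


def asymmetry(a, b):
    """Net number of suggestions favoring a -> b over b -> a."""
    return edge_weight[(a, b)] - edge_weight[(b, a)]


def in_strength(node):
    """Total weight of edges pointing at node, i.e. how often it is the substitute."""
    return sum(w for (_, dst), w in edge_weight.items() if dst == node)


print(asymmetry("sweet potato", "yam"))
print(in_strength("yam"))
```

A positive `asymmetry(a, b)` means reviewers more often suggest replacing `a` with `b` than the reverse, and a high `in_strength` marks an ingredient that is frequently offered as the replacement.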

Recently, for my Coursera class, I created region-specific networks using data shared by YY Ahn & co. in their flavor network paper. This isn’t a complete set of all regions, but see if you can guess which region is visualized in each of these (your choices are Northern European, Southern European, North American, Latin American, Middle Eastern, South Asian, African, Southeast Asian, East Asian):


Lastly, and most deliciously, here is the network of complementary ingredients for Thanksgiving, created by my collaborator Edwin Teng:

Bon Appétit!


Recipe recommendation using ingredient networks

In cooking I alternate between following recipes exactly, for fear that any sort of deviation might ruin the outcome, and trying to throw things together arbitrarily, with occasionally edible results. Could this problem be solved the way I like to approach other problems, i.e. by analyzing a nice data set, preferably of user-contributed knowledge?

So a little over a year ago, I proposed the idea of using ingredient networks to evaluate recipes at a “Wacky Wednesday” faculty meeting, where School of Information faculty gather and pitch ideas to each other. The mix of interest and skepticism with which the idea was greeted was enough to motivate me to work on the problem with my PhD student Edwin Teng. Soon thereafter, Yu-Ru Lin, from Northeastern and Harvard, joined us on the project, and lent it her insight and machine learning expertise.

A lot of fun findings ensued (you can download the paper on arXiv):

1) If one examines complementary ingredients, two main communities fall out, one sweet, the other savory (see image above).

And there is a smaller, third community of ingredients for mixed drinks.

mixed drink ingredients

2) Recipe reviews are a goldmine of data. There are ample suggestions for modifications (additions, deletions, increases, decreases, substitutions). These could be used to create “flexible” recipes, suggesting a range for the quantity of an ingredient, and possible substitutes. In fact, a substitute network reveals global communities of interchangeable ingredients.

3) Ingredient networks can be used to predict recipe ratings. “These networks encode which ingredients go well together, and which can be substituted to obtain superior results, and permit one to predict, given a pair of related recipes, which one will be more highly rated by users.” It appears that the substitute network in particular encodes nutrition information, e.g. users’ preferences for “healthier” variants for a recipe.

4) The hypothesis presented in Catching Fire, that humans have evolved to prefer cooking methods that extract more energy value from food, is consistent with recipe ratings. Recipes that call for heating (baking, boiling, grilling) are rated on average more highly than those that only call for mechanical preparation methods (chopping, mixing). Chemical methods (marinating and brining) give a slight additional boost.

5) US regional preferences are easily discernible, e.g. frying is popular in the South, and grilling is popular on the West Coast and in the mountain regions. It would be interesting to study how these are affected by the availability of ingredients and cultural influences.
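For readers who want to play with the complement network from finding 1, here is a minimal sketch of one way to score which ingredients "go together". It assumes pointwise mutual information (PMI) over recipe co-occurrence as the edge weight; this post does not say how the edges were weighted, so treat that choice, the toy recipes and the `pmi` helper as illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations

# Toy recipes (sets of ingredients). Illustrative only, not data from the paper.
recipes = [
    {"flour", "sugar", "butter", "egg"},
    {"flour", "sugar", "vanilla"},
    {"onion", "garlic", "olive oil"},
    {"onion", "garlic", "tomato"},
    {"sugar", "butter", "vanilla"},
]

n = len(recipes)

# How many recipes contain each ingredient, and each unordered pair of ingredients.
ingredient_count = Counter(i for r in recipes for i in r)
pair_count = Counter(
    frozenset(pair) for r in recipes for pair in combinations(sorted(r), 2)
)


def pmi(a, b):
    """log( p(a, b) / (p(a) * p(b)) ); -inf if a and b never co-occur."""
    joint = pair_count[frozenset((a, b))] / n
    if joint == 0:
        return float("-inf")
    return math.log(joint / ((ingredient_count[a] / n) * (ingredient_count[b] / n)))


# Keep only pairs that co-occur more often than chance as complement edges.
complement_edges = {
    tuple(sorted(pair)): pmi(*pair) for pair in pair_count if pmi(*pair) > 0
}
print(sorted(complement_edges))
```

Even on this toy data the positive-PMI edges split into a sweet group and a savory group with nothing linking them, which is the kind of structure a community detection tool would then pick up.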

Also, stay tuned for some fantastic related work by YY Ahn, Sebastian Ahnert, James Bagrow and Laszlo Barabasi, getting to the bottom of recipe preferences by analyzing networks of flavor compounds in food pairings.

Finally, a short thanks for some of the tools we used:

Gephi for visualizing the networks
Map generator for detecting communities, here are two examples:


preventing hospital to hospital infection spread

Filed under: data, statistics, visualization — ladamic @ 20:29

For the past few months I’ve been collaborating with Jack Iwashyna, Assistant Professor at UofM’s medical school, and SI MSI student Umanka Hebbar Karkada. Jack had a fun idea and a fun data set: hospital-to-hospital patient transfers, mined from Medicare claims. These transfers are a way for highly resistant infections to jump from one critical care unit to another. For the most part, hospitals devote resources to preventing infection spread independently of one another.
We posed the question of how resources could be allocated in a coordinated way to maximally stem the spread of infection. Umanka and I tried several strategies: targeting hospitals with the highest degree (the number of hospitals they trade patients with), targeting those with the highest betweenness (they are on the “path” between other hospitals), and a greedy allocation based on the number of beds infected at each hospital and downstream from that hospital.
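As a rough illustration of the first two targeting strategies, here is a minimal sketch that ranks hospitals by degree and by betweenness on a tiny, made-up transfer network. The network, the hospital names and the `top_k` helper are hypothetical, the graph is treated as unweighted and undirected for simplicity, and the greedy bed-based allocation is not sketched because the post does not spell out its details.

```python
from collections import deque

# Hypothetical transfer network: hospital -> hospitals it trades patients with.
transfers = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}


def degree(graph):
    """Number of distinct transfer partners per hospital."""
    return {h: len(nbrs) for h, nbrs in graph.items()}


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in graph}
        sigma[s] = 1
        preds = {v: [] for v in graph}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, farthest hospitals first.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


def top_k(scores, k):
    """The k highest-scoring hospitals, ties broken alphabetically."""
    return sorted(scores, key=lambda h: (-scores[h], h))[:k]


print(top_k(degree(transfers), 2))
print(top_k(betweenness(transfers), 2))
```

On this toy network the two rankings agree on hospital C but differ on the second pick: degree ties A, B and D, while betweenness singles out D because it is the only route to E.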
The results are here. Both figures show hospitals as nodes sized by the number of ICU beds they have.
This shows the number of resources allocated by hospital (gray = none, blue = few, red = many).

This shows the relative benefit of a random allocation vs. targeting particular hospitals. (blue = hospital unlikely to become infected, red = likely to be infected)


My first motion chart

Filed under: data, statistics, visualization — ladamic @ 01:13

I’ve created my first motion chart using my dad’s data. My dad, Kresimir Adamic, is into tennis and statistics. Matt Simmons, a student in my Fall ’08 SI 601 class, demoed Google’s motion chart gadget and I’ve been wanting to try it out ever since.

And if you haven’t seen Hans Rosling’s TED talk on using motion charts to show the relationship between health care, prosperity and birth rates, you’re missing out.
