ladamic's blog research on information networks and non-researchy random musings


Thanksgiving ingredient network leftovers

Filed under: data, statistics, visualization, papers | ladamic @ 03:29

Michaeleen Doucleff just wrote a very fun article on our recipe network paper for NPR’s The Salt.

It made me realize that Edwin Teng, Yu-Ru Lin and I have some leftover plots that may be Thanksgiving appropriate. If you don’t have quite the right ingredients handy while cooking Thanksgiving dinner, here is a network of common substitutions as found in reviewers’ comments on a large recipe site:

The favorite Thanksgiving ingredients are often recommended as substitutes. For example, cranberries end up substituting for other kinds of fruits and even somehow for chocolate. In the fat category, olive oil and butter seem to be recommended as substitutes for things such as margarine. Yams are recommended as a substitute for sweet potatoes more often than the other way around, and so on.
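The asymmetry between yams and sweet potatoes suggests thinking of the substitution network as a directed, weighted graph, where an edge from a to b counts how often reviewers suggest using b in place of a. Here is a minimal sketch of that idea. The counts and the `substitutions` list are invented for illustration and are not data from the paper:

```python
from collections import defaultdict

# Hypothetical (original, substitute, count) triples; all counts are invented.
substitutions = [
    ("sweet potato", "yam", 12),
    ("yam", "sweet potato", 4),
    ("margarine", "butter", 30),
    ("margarine", "olive oil", 9),
    ("raisin", "cranberry", 15),
    ("chocolate chip", "cranberry", 3),
]

# graph[a][b] = number of times b was suggested in place of a
graph = defaultdict(dict)
for original, substitute, count in substitutions:
    graph[original][substitute] = graph[original].get(substitute, 0) + count


def in_strength(node):
    """Total weight of suggestions that recommend `node` as the substitute."""
    return sum(targets.get(node, 0) for targets in graph.values())


def asymmetry(a, b):
    """Positive when b replaces a more often than a replaces b."""
    return graph.get(a, {}).get(b, 0) - graph.get(b, {}).get(a, 0)


print(in_strength("cranberry"))
print(asymmetry("sweet potato", "yam"))
```

An ingredient with high in-strength, like cranberry in this toy data, is one that many other ingredients get swapped out for.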

Recently, for my Coursera class, I created region-specific networks using data shared by YY Ahn & co. in their flavor network paper. This isn’t a complete set of all regions, but see if you can guess which region is visualized in each of these (your choices are Northern European, Southern European, North American, Latin American, Middle Eastern, South Asian, African, Southeast Asian, East Asian):


Lastly, and most deliciously, here is the network of complementary ingredients for Thanksgiving, created by my collaborator Edwin Teng:

Bon Appétit!


Recipe recommendation using ingredient networks

In cooking I alternate between following recipes exactly, for fear that any sort of deviation might ruin the outcome, and trying to throw things together arbitrarily, with occasionally edible results. Could this problem be solved the way I like to approach other problems, i.e. by analyzing a nice data set, preferably of user contributed knowledge?

So a little over a year ago, I proposed the idea of using ingredient networks to evaluate recipes at a “Wacky Wednesday” faculty meeting, where School of Information faculty gather and pitch ideas to each other. The mix of interest and skepticism with which the idea was greeted was enough to motivate me to work on the problem with my PhD student Edwin Teng. Soon thereafter, Yu-Ru Lin, from Northeastern and Harvard, joined us on the project, and lent it her insight and machine learning expertise.

A lot of fun findings ensued (you can download the paper on arXiv):

1) If one examines complementary ingredients, two main communities fall out, one sweet, the other savory (see image above).

And there is a smaller, third community of ingredients for mixed-drinks.

[Figure: mixed drink ingredients]
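For readers who want to try this at home, here is a rough sketch of how a complement network could be built from a pile of recipes. It assumes pointwise mutual information (PMI) as the edge weight, which is a common choice for co-occurrence networks, and it uses plain connected components as a crude stand-in for real community detection. The toy recipes are made up:

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy recipes, each a set of ingredients.
recipes = [
    {"flour", "sugar", "butter", "egg"},
    {"flour", "sugar", "butter", "vanilla"},
    {"sugar", "egg", "vanilla"},
    {"onion", "garlic", "olive oil", "tomato"},
    {"onion", "garlic", "olive oil"},
    {"garlic", "tomato", "olive oil"},
]

n = len(recipes)
single = Counter(ing for r in recipes for ing in r)
pair = Counter(frozenset(p) for r in recipes for p in combinations(sorted(r), 2))


def pmi(a, b):
    """log of p(a, b) / (p(a) * p(b)); -inf if the pair never co-occurs."""
    joint = pair[frozenset((a, b))]
    if joint == 0:
        return float("-inf")
    return math.log((joint / n) / ((single[a] / n) * (single[b] / n)))


# Keep only pairs that co-occur more often than chance would predict.
edges = {tuple(sorted(p)) for p in pair if pmi(*p) > 0}


def components(edge_set):
    """Connected components via depth-first search (not true community detection)."""
    adj = {}
    for a, b in edge_set:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adj[node] - seen)
        groups.append(group)
    return groups


for group in components(edges):
    print(sorted(group))
```

On real data the sweet and savory groups are connected by many weak edges, so connected components would not separate them; that is where a proper community detection method comes in.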

2) Recipe reviews are a goldmine of data. There are ample suggestions for modifications (additions, deletions, increases, decreases, substitutions). These could be used to create “flexible” recipes, suggesting a range for the quantity of an ingredient, and possible substitutes. In fact, a substitute network reveals global communities of interchangeable ingredients.
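To give a flavor of how substitution suggestions might be pulled out of review text, here is a minimal sketch using a few regular expressions. The review snippets and the three phrase patterns are hypothetical, chosen only for illustration; they are not the extraction rules from the paper, and real reviews would need far more robust handling:

```python
import re
from collections import Counter

# Hypothetical review snippets, invented for illustration.
reviews = [
    "I replaced the butter with applesauce and it was great.",
    "Used honey instead of sugar, and added more cinnamon.",
    "I substituted yogurt for sour cream.",
    "Delicious as written!",
]

# An ingredient phrase ends at " and", at punctuation, or at end of text.
END = r"(?= and\b|[.,!]|$)"

# Each pattern captures the original and the substitute as named groups.
patterns = [
    re.compile(r"replaced? (?:the )?(?P<orig>[a-z ]+?) with (?P<sub>[a-z ]+?)" + END),
    re.compile(r"used (?P<sub>[a-z ]+?) instead of (?P<orig>[a-z ]+?)" + END),
    re.compile(r"substituted? (?P<sub>[a-z ]+?) for (?P<orig>[a-z ]+?)" + END),
]


def extract_substitutions(text):
    """Return a list of (original, substitute) pairs found in one review."""
    found = []
    lowered = text.lower()
    for pattern in patterns:
        for m in pattern.finditer(lowered):
            found.append((m.group("orig").strip(), m.group("sub").strip()))
    return found


# Aggregate over all reviews; these counts would become edge weights.
counts = Counter(p for review in reviews for p in extract_substitutions(review))
for (original, substitute), c in counts.items():
    print(f"{original} -> {substitute}: {c}")
```

Summing these pairs over many reviews gives the weighted edges of a substitute network.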

3) Ingredient networks can be used to predict recipe ratings. “These networks encode which ingredients go well together, and which can be substituted to obtain superior results, and permit one to predict, given a pair of related recipes, which one will be more highly rated by users.” It appears that the substitute network in particular encodes nutrition information, e.g. users’ preferences for “healthier” variants for a recipe.

4) The hypothesis presented in Catching Fire, that humans have evolved to prefer cooking methods that extract more energy value from food, is consistent with recipe ratings. Recipes that call for heating (baking, boiling, grilling) are rated on average more highly than those that only call for mechanical preparation methods (chopping, mixing). Chemical methods (marinating & brining) give a slight additional boost.
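The comparison itself is simple to set up: tag each recipe with the kinds of preparation its instructions mention, then compare average ratings across groups. A minimal sketch follows. The keyword lists, the recipes, and the ratings are all invented for illustration, so the numbers it prints say nothing about the real data:

```python
from statistics import mean

# Hypothetical keyword lists for each kind of preparation method.
METHODS = {
    "heating": {"bake", "boil", "grill"},
    "mechanical": {"chop", "mix"},
    "chemical": {"marinate", "brine"},
}

# Hypothetical (instruction keywords, average rating) pairs; ratings are invented.
recipes = [
    ({"chop", "mix"}, 3.9),
    ({"chop", "bake"}, 4.4),
    ({"mix", "boil"}, 4.2),
    ({"marinate", "grill"}, 4.6),
    ({"mix"}, 3.7),
]


def uses(steps, category):
    """True if any instruction keyword belongs to the given category."""
    return bool(steps & METHODS[category])


def mean_rating(predicate):
    """Average rating over recipes whose steps satisfy the predicate."""
    ratings = [rating for steps, rating in recipes if predicate(steps)]
    return mean(ratings) if ratings else None


heated = mean_rating(lambda s: uses(s, "heating"))
mechanical_only = mean_rating(
    lambda s: uses(s, "mechanical")
    and not uses(s, "heating")
    and not uses(s, "chemical")
)

print("heating:", heated)
print("mechanical only:", mechanical_only)
```

A real analysis would also want a significance test and some control for recipe category, since desserts and salads are rated by rather different crowds.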

5) US regional preferences are easily discernible, e.g. frying being popular in the South, and grilling being popular on the West Coast and in the mountain regions. It would be interesting to study how these are affected by the availability of ingredients and cultural influences.

Also, stay tuned for some fantastic related work by YY Ahn, Sebastian Ahnert, James Bagrow and Laszlo Barabasi, getting to the bottom of recipe preferences by analyzing networks of flavor compounds in food pairings.

Finally, a short thanks for some of the tools we used:

Gephi for visualizing the networks
Map generator for detecting communities; here are two examples:
