Homophily and Serendipity

By way of Language Log, I came across the UnSuggester at LibraryThing. I now have a new way to waste away hours of my life on the internet.

The idea is that the user enters the title of a book, and the UnSuggester offers up anti-books, the polar opposites of that title. Interesting idea: not just the Amazon suggestions of “hey, other people spent their money on these titles, too,” but “hey, you should really avoid these.” But why bother with that? Well, homophily, that’s why.

Let’s face it: most people who like a given something will stick with other things that are similar. This is called homophily: liking the same or similar things. Like Boy Band music? Then you will probably go for the other Boy Bands too. Like low-fat vegan cookbooks? You probably have more than one, and you keep an eye on every new low-fat cookbook that comes out. Hollywood makes sequels partly because consumers will scoop up similar themes, not merely because of a dearth of creative ideas. Financially, it’s a reasonable bet to film something in a proven creative vein instead of risking money on an untested idea. Thus, Police Academy 17: Morons on Patrol, The Baghdad Edition.

The flip side is that following the same or similar ideas creates a barrier to new ones. It is difficult for anything that isn’t a “me too” idea to enter our lives. We naturally pursue things similar to what we enjoy and avoid whatever resembles what we dislike. The result is that our tastes, reading material, television shows, and movies all become a homogenized block, varying only slightly in degree.

This is not to say that we are all cultural clones, consumers little different from the Bubble Gum Blondies of Pop Music. We are in fact different, each with our own preferred niches, but it becomes difficult to find new niches and new ideas. The individual niches can be served by the market, increasingly through niche marketing as companies pursue the Long Tail of consumer taste, and individually we each have a different combination of likes and dislikes. But when we go looking for something new, be it a book, movie, or album, we tend to stick to similar things, since that is the information available to us for making a decision. I like this genre or author, so there is a good chance, or at least a better than random chance, that I will like the new one. I dislike that sort of music, so there is a better than even chance that I will not like similar albums.

Mixing dissimilar items in one person’s tastes is not the usual state of affairs for most people. John Emerson at Idiocentrism humorously alludes to this in his comment:

As I understand it, people who enjoy both a book and its anti-book are monsters of perversity and moral equivalence. Now that the determination of anti-books has been placed on a firm scientific, empirical basis, it’s become much easier to ferret out such loathsome creatures.

The New York Times Magazine Supplement ran an article about homophily. The article raises the idea of exposing people to tastes and ideas outside their normal fare, thereby creating “serendipity,” which serves as a proxy for anti-homophily. The LibraryThing blog notes that the UnSuggester made it into the New York Times article, since looking up the “anti-book” for our favorites would, hypothetically, be a good way to open us up to ideas we would not normally encounter.

Serendipity, however, is not “anti-homophily” or even a close proxy for it, but rather an ingredient that prevents stagnation or homogeneity. Homophily is not necessarily a bad thing, since it is a method people use to make decisions with incomplete information: we make approximations based on what we already know and extrapolate them to new things. The problem, or limiting trend, that homophily can produce is homogeneity, the natural outcome of excessive homophily, since our horizons would never be expanded by new concepts, ideas, sounds, or whatever. Serendipity provides the needed antidote to homogeneity. It breaks up the routine and either lets in some fresh air, setting us off in a new direction alongside what we pursued before, or tells us that we really don’t like that new direction after all. Either way, this becomes additional information that we combine with homophily when making decisions.

John Emerson does his own bit of investigation on his blog, using the UnSuggester to look at homophily and serendipity. He notes some interesting things about the particular results the UnSuggester offers up:

It seems that some people buy only one kind of book, and these seem to be of three kinds: pulp fiction (e.g. Pratchett), contemporary lifestyle fiction (e.g. Brashares), and Christian books (e.g. John Piper). The Christian books are opposite to the other two categories, but there’s a considerable tension between the lifestyle books and the pulp fiction too. All three categories are the opposites of modernist, decadent, and cynical literature, and the lifestyle and pulp books tend to be opposed to all serious literature of any kind.

An interesting observation: readers with narrower tastes can upset a small sample and so “skew” the results, assuming this sort of modeling can be taken seriously at all.

At bookpress, Emerson makes the same observation in the comments, although this time he refers to wildly broad tastes upsetting the sample. For example, “Murakami seems to have a cult whose reading tastes are very peculiar”. If the sample available to the UnSuggester is too small, scattershot outliers can upset the results just as much as excessively narrow tastes, since there is not enough data to make correlations with any confidence.

How are such models generated? One mathematical method for correlating many different things is assortative mixing in graph theory.
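Since the post only names the technique, here is a minimal Python sketch of one standard way to quantify assortative mixing: Newman’s assortativity coefficient r for a categorical node attribute, computed from the mixing matrix e (the fraction of edge ends joining category a to category b) as r = (Σ e_aa − Σ a_a²) / (1 − Σ a_a²). The function name, the toy graph, and the genre labels are hypothetical illustrations; nothing here is LibraryThing’s actual method. An r near +1 means links mostly join books of the same category (homophily), and an r near −1 means they mostly join different ones.

```python
from collections import Counter


def attribute_assortativity(edges, category):
    """Newman's assortativity coefficient r for a categorical node attribute.

    edges: iterable of (u, v) pairs forming an undirected graph.
    category: dict mapping each node to its category label (e.g. a genre).
    """
    # Count edge ends by (category of one end, category of the other end).
    # Each undirected edge is counted in both directions, so the matrix is symmetric.
    mixing = Counter()
    for u, v in edges:
        cu, cv = category[u], category[v]
        mixing[(cu, cv)] += 1
        mixing[(cv, cu)] += 1
    total = sum(mixing.values())
    if total == 0:
        raise ValueError("graph has no edges")
    labels = {label for pair in mixing for label in pair}
    # Fraction of edge ends that stay inside their own category (trace of e).
    within = sum(mixing[(c, c)] for c in labels) / total
    # a_c: fraction of edge ends attached to a node of category c.
    a = {c: sum(mixing[(c, d)] for d in labels) / total for c in labels}
    # Within-category fraction expected if links ignored category entirely.
    expected = sum(share * share for share in a.values())
    if expected == 1:
        # Only one category is present, so r is undefined (0 / 0).
        return float("nan")
    return (within - expected) / (1 - expected)


# Hypothetical toy shelf graph: an edge means two books share many shelves.
genre = {"Design Patterns": "tech", "Refactoring": "tech",
         "White Oleander": "fiction", "Little Women": "fiction"}
edges = [("Design Patterns", "Refactoring"),
         ("White Oleander", "Little Women"),
         ("Design Patterns", "White Oleander")]
print(round(attribute_assortativity(edges, genre), 3))  # 0.333
```

Here a single cross-genre link out of three pulls r down from a perfect 1.0 to about 0.33. For real graphs, the networkx library offers `attribute_assortativity_coefficient`, which computes the same quantity.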

Specifically, the technique would (I am assuming here, since LibraryThing doesn’t spell it out) measure how close two books are based on “similar characteristics,” using each user’s library shelf as the proxy. It’s an interesting take, but I don’t know enough about graph theory to comment thoroughly on it, or on the converse: the nodes with the most disassortative (or least assortative) mixing are the “anti-books,” the ones a consumer is unlikely to come across in their normal fare.
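LibraryThing doesn’t publish how the UnSuggester works, so the following is only a minimal Python sketch of that guess, under stated assumptions: each user’s library is a set of titles, “closeness” is measured as lift (how often two books share a shelf compared with what chance alone would predict), and the anti-books are simply the lowest-lift titles. The `unsuggest` function, its `min_owners` cutoff, and the sample shelves are all hypothetical. The cutoff is one crude response to the small-sample problem Emerson raises: thinly held books produce noisy scores, so they are skipped.

```python
from collections import Counter


def unsuggest(libraries, target, min_owners=2, top_n=5):
    """Rank books by how rarely they share a shelf with `target` (hypothetical).

    libraries: list of sets, one set of book titles per user.
    Returns (title, lift) pairs, lowest lift first. Lift is observed
    co-ownership divided by what independence would predict; below 1.0
    means the two books share shelves less often than chance.
    """
    n_users = len(libraries)
    owners = Counter(book for lib in libraries for book in lib)
    if owners[target] == 0:
        return []
    # How many shelves hold each other book alongside the target.
    together = Counter(
        book for lib in libraries if target in lib for book in lib if book != target
    )
    scores = []
    for book, count in owners.items():
        # Skip the target itself and thinly held books, whose scores would be noise.
        if book == target or count < min_owners:
            continue
        expected = owners[target] * count / n_users
        scores.append((book, together[book] / expected))
    # Lowest lift first; ties broken alphabetically for a stable order.
    scores.sort(key=lambda pair: (pair[1], pair[0]))
    return scores[:top_n]


# Hypothetical shelves for five users.
libraries = [
    {"Get Shorty", "Design Patterns", "Refactoring"},
    {"Get Shorty", "Refactoring"},
    {"Programming Perl", "Design Patterns", "Refactoring"},
    {"White Oleander", "Little Women"},
    {"White Oleander", "Little Women", "Get Shorty"},
]
for title, lift in unsuggest(libraries, "Refactoring"):
    print(f"{title}: {lift:.2f}")
```

A lift of 0.00 only says that two books never share a shelf in this data; on its own it says nothing about whether the reader would dislike them.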

The unanswered question here, however, is whether the anti-book is merely an item the consumer is unlikely to encounter, or a prediction of taste, meaning that the consumer would not like it. With assortative mixing, the anti-book would be the one least likely to cluster with homophilic choices, so the answer would be that it is simply not normally encountered. There may well be some prediction of distaste, in that encounters with non-homophilic choices might not be enjoyed, which would reinforce the homophilic selection. By and large, however, the disassortative node is not automatically distasteful. A better measure of “not liking” would track the assortative mixing of the things a consumer dislikes and steer away from nodes displaying similar characteristics.
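As a rough illustration of that better measure, here is a hypothetical Python sketch, not anything the UnSuggester is known to do: represent each book by the set of users who shelve it, take a list of books the reader disliked, and score each candidate by its highest Jaccard similarity (shared holders divided by combined holders) to any disliked book. The function names, user IDs, and shelf data are invented for the example.

```python
def jaccard(a, b):
    """Shared members divided by combined members; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predicted_distaste(holders_by_book, disliked, candidates):
    """Rank candidates by their closest resemblance to any book the reader disliked.

    holders_by_book: dict mapping a title to the set of user IDs who shelve it.
    disliked: titles the reader has already said they did not like.
    candidates: titles to score. Higher scores mean "more like what you disliked".
    """
    scores = {}
    for book in candidates:
        holders = holders_by_book.get(book, set())
        scores[book] = max(
            (jaccard(holders, holders_by_book.get(d, set())) for d in disliked),
            default=0.0,
        )
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical shelf data keyed by title.
holders_by_book = {
    "White Oleander": {"u1", "u2", "u3"},
    "Little Women": {"u1", "u2"},
    "Prozac Nation": {"u3", "u6"},
    "Design Patterns": {"u4", "u5"},
}
ranked = predicted_distaste(holders_by_book, ["White Oleander"],
                            ["Little Women", "Prozac Nation", "Design Patterns"])
for title, score in ranked:
    print(f"{title}: {score:.2f}")
```

Because this score rests on explicit dislikes rather than on how rarely books share a shelf, it separates “you probably won’t like this” from “you probably won’t run into this.”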

So let’s give the UnSuggester a run on some titles that I have right here, right now in front of me and have just finished reading. We can get an idea of whether this is recommending something not likely to be encountered, something not likely to be enjoyed, or some combination of both. More realistically, we can get an idea of how the UnSuggester matches up with my reading tastes. First up is Get Shorty, by Elmore Leonard.

The list leads off with books on Christian Spirituality, but by number nine on the list, we have the Harry Potter paperback set. While I don’t have that specific reprinting, I have purchased them, and read them all like a crack addict. Anyone interested in the particulars can ask The Wife what happened with the last novel: we had a competition to see who would finish first, giving each other cagey stares and quick glances at bookmarks before any conversation that could remotely lead to something that would involve the storyline. “Hi, honey. How was your day?” “Don’t tell me what happened! I haven’t gotten to that part!”

Damn Harry Potter, which incidentally is the most widely read/held book in LibraryThing.

Moving right along, we come to number 16, Design Patterns by Erich Gamma et al., which is in fact on my wish list. So are numbers 19 (Programming Ruby by Dave Thomas), 32 through 34 (Ambient Findability by Peter Morville, which I added just two days ago; Code Complete by Steve McConnell; and A New Kind of Science by Stephen Wolfram), and 39 (Refactoring by Martin Fowler). Yeah. Not looking too good. We’ll also not mention that I read Built to Last by Jim Collins when I was working for The Fabric Company. Or that I used it in B-School to shake up the occasional professor.

Okay, I’m a programmer by trade, so most of these selections are things I would likely pick up to keep my job skills current. Then again, I also like this sort of analytical problem solving for its own sake: I work at a Sun Shop (Java, Solaris, Eclipse, the whole bit), so I have no specific need for Ruby, but I am really curious about the hype. And where does Wolfram’s book on the idea of Emergence fit into the picture of my reading, if not as pure intellectual curiosity?

Well, then the light reading from Leonard is the red herring, and I should search for something programming-themed. Something like Programming Perl by Larry Wall, which is also sitting in front of me.

And the list bogs down right off the bat. I have to own up to reading the number one listing, White Oleander by Janet Fitch. In fairness, chick lit is not really my fare, the book was [insert noise here] for my personal taste in fiction, and I know that I will never get those hours back in my life. But it made the time on the train back and forth to work pass. If by UnSuggestion we mean that I won’t like it, it’s right on that score. But as for the likelihood that I will or will not read it? Well, let’s chalk it up to being an outlier.

Overall, however, this list seems more on the mark. The Sisterhood of the Traveling Pants by Ann Brashares isn’t one that is high on my list of books to read, being right after the list of ingredients for artificial sweetener. I have read The Outsiders by S.E. Hinton at number nine on the list, but that was decades ago in (grumble, grumble) middle school, and all the kids were doing it. Just like methamphetamines, which I wouldn’t recommend either. Same with Little Women at number 19, although that was high school, and, yeah, we’ll just leave it at that where it belongs.

The only other notable exceptions are The Things They Carried by Tim O’Brien and Prozac Nation by Elizabeth Wurtzel. O’Brien’s work is highly recommended by a couple of people around these parts (but that could be ascribed to the wartime setting of both the book and my current residence) and I own Wurtzel’s book. It’s somewhere in the house, I think. As I recall, I enjoyed it too, although I liked Postcards From the Edge better.

Interestingly, there seem to be a lot of Janet Evanovich novels on the list, all of which are sitting in the TOC right now courtesy of some care package. They didn’t really appeal to me before, and now, based on the eerie coincidence of seeing them on the UnSuggester, I will continue to ignore them.

In the interest of full disclosure, however, my reading tastes tend to be eclectic and so will naturally disrupt any homogeneity. The same goes for my musical tastes. So I may not be the best test case for something like this, and in any event I am a sample size of one. But it does give some insight into what I might otherwise be missing, and it is a starting point for looking in new directions.

On a lark, I followed one of the side links labeled “You will not like!”: Ella Enchanted by Gail Carson Levine. Now, I can’t claim to have read the book, but having children running amok in the domicile, I have seen the movie, and it wasn’t insufferable. But the list of UnSuggestions for Ella Enchanted is incredible. I estimate that I have read, or have on a wish list, nearly one fifth (14/73) of the titles. Now, that is serendipity.

You know, The Devil Wears Prada by Lauren Weisberger keeps coming up in the lists of UnSuggestions. Maybe I should take a look…
