Amazon, Oprah, and The Long Tail

I noticed something new, or at least new to me since I live in a bubble right now, on Amazon.com today. There is some statistical analysis of the books that they sell, provided the books are in the Search Inside™ program.

Take for example this book. Clicking on the Text Stats link at the top next to the Explore tag, or scrolling down to the Inside This Book section, brings the buyer to some fun facts about the book, such as its complexity, measured in a number of ways from the Fog Index to the number of words with three or more syllables. Now, this is potentially useful information to a buyer, say a proactive parent looking to see if the book is suitable for a given reading level. Granted, it is only one piece of information, and it tells us nothing of the information content, only the words used.

[Ed. Note: The choice of book for this example should not be taken as anything more than my whacked out sense of humor for what makes a good title. The Wife and I are doing fine, and I couldn’t be happier. Well, I could if I was actually home with her, but you get the idea.]

A couple of things struck me about this, particularly since the Inside This Book section was immediately followed by “What do customers ultimately buy after viewing items like this?” and a list of books with percentage rankings of what books were bought after this page view. Savvy marketing, giving information like this. The problem, though, is that the information is misleading.

While the word counts might be correct, and the glossy word frequency map (called Concordance by Amazon) is so cute I could hug it, the data given is not in context, and the information is therefore absent. Okay, the Fog Index is 40 bazillion. What exactly does that mean? The simple definition given by Amazon is that “[i]t indicates the number of years of formal education required to read and understand the passage.” Great, what does that mean? And how do we arrive at that? A simple search on Wikipedia leads to more information.
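For the curious, here is a rough sketch of how a Fog Index can be computed. It uses the standard Gunning Fog formula, 0.4 times the sum of the average sentence length and the percentage of words with three or more syllables. The function names are mine, the syllable counter is a crude vowel-group heuristic that I am assuming is good enough for illustration, and none of this is a claim about how Amazon actually does it.

```python
import re


def count_syllables(word):
    """Rough estimate: count runs of consecutive vowels (a heuristic, not a dictionary)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fog_index(text):
    """Gunning Fog: 0.4 * (words per sentence + percent of 'complex' words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    # "Complex" here means three or more estimated syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    percent_complex = 100.0 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + percent_complex)


if __name__ == "__main__":
    sample = "The cat sat. The dog ran."
    print(round(fog_index(sample), 2))
```

Note what goes in: sentence length and syllable counts. Nothing about what the words actually say, which is the point made above.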

Never mind that not all formal schooling is created equal. Go ask any public school teacher or involved parent. But I suppose it is a start, although it seems to me that not enough information about the readability indicators is provided to anyone who would find value in this data.

The other item that I noted, the percentage of people buying a given book after viewing this one, is also misleading. The thing about statistics, and probability, is that there is no memory. Suppose I take a list of books that were bought, and we go through the whole bit about making sure that I am taking a random sample and that my sample size is large enough. We can then come up with a number that says there is an X chance of a given behavior. Nice. And I should expect the same number in the future, since what happens now does not affect future actions: there is no memory.

Except that this is more of a systems analysis than a statistical probability. People are thinking agents, so we get a system that shows Complexity rather than probability. Essentially, what happens is that the buyer is a thinking agent, or better yet, is an agent that responds to input from the surrounding environment. And now, we have memory in the system, so probability is not the best indicator since it will change over time.

A buyer will see that “a lot” of people buy this given book. Therefore, it is either good or otherwise worth buying. Everybody else is doing it, why don’t you? Now, the buyer has a chance X of buying it, does, and skews the probability. Which influences the next buyer, further skewing the numbers. This is a positive reinforcement cycle. And to boot, there are additional constraints on the buyer complicating the neat arrangement, such as funds available, preferences about authors, what Oprah recommended that month, etc.
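The cycle described above can be sketched as a toy simulation. This is my own illustration, not a model of Amazon's data: the function names, the number of books, and the rule that a buyer picks a book with weight one plus its past sales are all assumptions made up for the example.

```python
import random


def simulate_sales(n_books, n_buyers, show_counts, seed=0):
    """Each buyer picks one book. If show_counts is True, past sales sway the choice."""
    rng = random.Random(seed)
    sales = [0] * n_books
    for _ in range(n_buyers):
        if show_counts:
            # Memory in the system: popular books become more likely picks.
            weights = [1 + s for s in sales]
        else:
            # No memory: every book is equally likely, every time.
            weights = [1] * n_books
        choice = rng.choices(range(n_books), weights=weights, k=1)[0]
        sales[choice] += 1
    return sales


def top_share(sales):
    """Fraction of all sales captured by the single best-selling book."""
    total = sum(sales)
    return max(sales) / total if total else 0.0


if __name__ == "__main__":
    for show in (False, True):
        result = simulate_sales(n_books=10, n_buyers=10000, show_counts=show)
        print("counts shown:", show, "top share:", round(top_share(result), 3))
```

Without the feedback, each book should settle near an equal share. With it, whichever book gets lucky early tends to keep its lead, and a different seed can crown a different winner.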

All this makes it nice marketing. A nice number is put up, the system is influenced, and voila! A bestseller is born. The influence of Oprah with her book club generates such power that publishers fall over themselves to land a given book on Oprah’s list. The numbers put up by Amazon also influence sales systemically, although probably not to the degree that Oprah does. But the buyer is also not likely to be educated about what the numbers mean, and likely to lapse into the natural human response of dealing with information that is not understood right away: smile and nod. To do differently is to admit ignorance, and that is something people seem loath to admit.

I have to wonder what Philip K. Dick would make of all of this. Marketing that subtly influences buying behavior, using the power of computers to crunch a lot of numbers fast over the past purchases of a lot of people, has that potential tint of forcing a product on the market that might not otherwise be supported. This theme was also the subject of James Tiptree, Jr.’s short story “The Girl Who Was Plugged In.” Naturally there is the potential for abuse, in the sense of graft and corruption to get a product in the key spots for mass marketing. We assume that Amazon’s numbers are impartial. Or that Oprah’s staff is on the level. And if these assumptions were not valid?

The counterbalance for this is what has come to be known as The Long Tail. Simply put, in a power law distribution, there will be a few that have a lot of whatever it is that is being measured. Most will have little. This is exactly what happens in wealth distribution in an economy, so we get tidbits like “80% of the wealth is held by 20% of the population,” to make something up. The real kicker is, if we look closer, we can get something like “40% of whatever is held by 3%,” to continue the use of our example. This means that 60% is held by the remainder, which is a solid majority at 97% of whatever population we are sampling.

Let’s go back to books. Suppose that 40% of sales are held by the top 100 books, and the top five publishers. Now, there are many more books and publishers, but they appear to be frozen out of the market, garnering small sales and market shares. The glass ceiling has been reached, and cracking that top portion of the market seems just out of reach. Or is it? Combined, all the other publishers and books are the majority of sales, 60% in our example.
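To put some arithmetic behind the head-versus-tail split, here is a small sketch. It assumes, purely for illustration, that sales fall off as one over the rank of the book raised to some exponent (a Zipf-style power law). The catalog size, the exponent, and the function name are all invented for the example and are not real publishing data.

```python
def head_share(n_titles, head_size, exponent=1.0):
    """Share of total sales held by the top head_size titles,
    assuming sales proportional to 1 / rank**exponent (an assumption)."""
    if not 0 < head_size <= n_titles:
        raise ValueError("head_size must be between 1 and n_titles")
    weights = [1.0 / rank ** exponent for rank in range(1, n_titles + 1)]
    return sum(weights[:head_size]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical catalog: 100,000 titles, top 100 counted as the "head".
    head = head_share(n_titles=100_000, head_size=100)
    print("head share:", round(head, 3), "tail share:", round(1 - head, 3))
```

Under those made-up assumptions the top 100 titles take a bit over 40% of sales, and everything else combined takes the rest, which is the shape of the argument here.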

So the key is to combine all the niche markets that a given publisher reaches. This is the way to survive the seeming market lock and even grow in size. If you are a small publisher, and you do a few small runs, you have fixed and variable costs. The fixed costs are setting up for the runs, which is a one-time cost per run and pretty heavy. The variable costs are the ink and paper, which scale with the size of the run. So, classically, if you make a large run, or a run of a certain minimum size, you dilute the cost of the setup. Otherwise, the cost of setup makes the run an economic loser.

But if you reduce the setup cost, or eliminate it entirely, then you are playing on variable costs alone, and the small runs are now worthwhile. Living in the Long Tail then, your book publishing company can do many small runs of obscure books. Granted, none is going to be earth-shattering in the volume of sales, but combined, they will rival the top players.
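The fixed-versus-variable cost argument comes down to one line of arithmetic: cost per copy is the setup cost divided by the run size, plus the variable cost per copy. A quick sketch, with dollar figures that are entirely hypothetical:

```python
def cost_per_copy(setup_cost, variable_cost, run_size):
    """Setup cost spread over the run, plus per-copy ink and paper."""
    if run_size <= 0:
        raise ValueError("run_size must be positive")
    return setup_cost / run_size + variable_cost


if __name__ == "__main__":
    # Hypothetical numbers: $2,000 setup, $3 per copy in ink and paper.
    for run in (200, 2000, 20000):
        print(run, "copies:", cost_per_copy(2000, 3, run), "per copy")
    # Same small run with the setup cost eliminated.
    print(200, "copies, no setup:", cost_per_copy(0, 3, 200), "per copy")
```

With the setup cost in place, the 200-copy run costs several times more per copy than the big run. Take the setup cost away and the small run costs the same per copy as any other.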

So, Philip K. Dick need not roll over in his grave. There is an offset to the glass ceiling, and a way to break out of the bottom of the system. Or rather, what appears to be the top isn’t, and it is just a matter of perception as well as learning to cope with what really is the case.
