Social Computing

Once upon a time, I had a plug-in for Winamp that did something I thought was insanely cool. While I listened to the music, it would make a nice, neat webpage of the last bunch of songs that I had finished listening to. Periodically, I could update my website with this page, and I had a link that said something to the effect of “click here to check out what I am listening to.” Once the user clicked, they were taken to the oh-so-cute page of music listings. It was one little way that I could say to the world, or whoever was at my site, “hey, check out some of what I have! Maybe some of it will be interesting to you!”


Homophily and Serendipity

By way of Language Log, I came across the UnSuggester at LibraryThing. I now have a new way to waste away hours of my life on the internet.

The idea is that the user enters in the title of a book, and the UnSuggester will offer up anti-books, which are the polar opposite books. Interesting idea: not just the Amazon suggestions of “hey, other people spent their money on these titles, too,” but “hey, you should really avoid these.” But why bother with that? Well, homophily, that’s why.


Amazon, Oprah, and The Long Tail

I noticed something new, or at least new to me since I live in a bubble right now, on Amazon.com today. There is some statistical analysis of the books that they sell, provided the books are in the Search Inside™ program.

Take for example this book. Clicking on the Text Stats link at the top next to the Explore tag, or scrolling down to the Inside This Book section, brings the buyer to some fun facts about the book. Such as the complexity of the book, defined in a number of ways from Fog Index to the number of words with three or more syllables. Now, this is potentially useful information to a potential buyer, say a proactive parent looking to see if the book is suitable for a given reading level. Granted, it is only a piece of information, and tells us nothing of the information content, only the words used.

[Ed. Note: The choice of book for this example should not be taken as anything more than my whacked out sense of humor for what makes a good title. The Wife and I are doing fine, and I couldn’t be happier. Well, I could if I was actually home with her, but you get the idea.]

A couple of things struck me about this, particularly since the Inside This Book section was immediately followed by “What do customers ultimately buy after viewing items like this?” and a list of books with percentage rankings of what books were bought after this page view. Savvy marketing, giving information like this. The problem, though, is that the information is misleading.

While the word counts might be correct, and the glossy word frequency map (called Concordance by Amazon) is so cute I could hug it, the data given is not in context, and the information is therefore absent. Okay, the Fog Index is 40 bazillion. What exactly does that mean? The simple definition given by Amazon is that “[i]t indicates the number of years of formal education required to read and understand the passage.” Great, what does that mean? And how do we arrive at that? A simple search on Wikipedia leads to more information.
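For readers who want to see where such a number comes from, here is a rough sketch of the Fog Index as it is commonly stated: 0.4 times the sum of the average sentence length and the percentage of words with three or more syllables. The function names and the vowel-group syllable counter are assumptions made for illustration; the counter is crude, the usual exclusions (proper nouns, common suffixes) are ignored, and nothing here is claimed to match how Amazon computes its figure.

```python
import re


def count_syllables(word):
    """Crude syllable estimate: count runs of consecutive vowels (an assumption)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fog_index(text):
    """Approximate Fog Index: 0.4 * (words per sentence + percent complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    # "Complex" here simply means three or more estimated syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    percent_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + percent_complex)
```

Calling `fog_index` on a passage returns a number meant to be read as years of schooling, which makes it plain that the score depends only on sentence length and word length, not on what the passage actually says.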

Never mind that not all formal schooling is created equal. Go ask any public school teacher or involved parent. But I suppose it is a start, although it seems to me that not enough information about the readability indicators is provided to anyone who would find value in this data.

The other item that I noted, the statistical percentage of buying a given book after this, is also misleading. The thing about statistics, and probability, is that there is no memory. Supposing I take a list of books that were bought, and we go through the whole bit about making sure that it is a random sample I am taking and that my sample size is high enough, we can come up with a number that says there is an X chance of a given behavior. Nice. And I should expect the same in the future, since the actions that happen now do not affect future actions when there is no memory.

Except that this is more of a systems analysis than a statistical probability. People are thinking agents, so we get a system that shows Complexity rather than probability. Essentially, what happens is that the buyer is a thinking agent, or better yet, is an agent that responds to input from the surrounding environment. And now, we have memory in the system, so probability is not the best indicator since it will change over time.

A buyer will see that “a lot” of people buy this given book. Therefore, it is either good or otherwise worth buying. Everybody else is doing it, why don’t you? Now, the buyer has a chance X of buying it, does, and skews the probability. Which influences the next buyer, further skewing the numbers. This is a positive reinforcement cycle. And to boot, there are additional constraints on the buyer complicating the neat arrangement, such as funds available, preferences about authors, what Oprah recommended that month, etc.
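A toy simulation can make the feedback loop concrete. The sketch below assumes, purely for illustration, that each buyer picks a book with probability proportional to the sales counts displayed so far; the function name, the starting counts, and the numbers of books and buyers are all hypothetical, and the other constraints mentioned above (funds, author preferences, Oprah) are left out.

```python
import random


def simulate_purchases(num_books=5, num_buyers=1000, seed=0):
    """Each buyer chooses a book with probability proportional to past sales."""
    rng = random.Random(seed)
    counts = [1] * num_books  # assumption: every book starts with one sale
    for _ in range(num_buyers):
        # The displayed counts are the memory in the system: they set the odds.
        chosen = rng.choices(range(num_books), weights=counts)[0]
        counts[chosen] += 1  # and each purchase changes the odds for the next buyer
    return counts


if __name__ == "__main__":
    for seed in range(3):
        print(seed, simulate_purchases(seed=seed))
```

Running it with different seeds shows identical books ending up with different shares from run to run, because early purchases shift the odds for everyone who follows.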

All this makes it nice marketing. A nice number is put up, the system is influenced, and voila! A bestseller is born. The influence of Oprah with her book club generates such power that publishers fall over themselves to land a given book on Oprah’s list. The numbers put up by Amazon also influence sales systemically, although probably not to the degree that Oprah does. But the buyer is also not likely to be educated about what the numbers mean, and likely to lapse into the natural human response of dealing with information that is not understood right away: smile and nod. To do differently is to admit ignorance, and that is something people seem loath to admit.

I have to wonder what Philip K. Dick would make of all of this. Marketing influencing buying behaviors subtly, using the power of computers to crunch a lot of numbers fast over the past purchases of a lot of people, has that potential tint of forcing a product on the market that might not otherwise be supported. This theme was also the subject of James Tiptree Jr.’s short story “The Girl Who Was Plugged In.” Naturally there is the potential for abuse, in the sense of graft and corruption to get a product in the key spots for mass marketing. We assume that Amazon’s numbers are impartial. Or that Oprah’s staff is on the level. And if these assumptions were not valid?

The counterbalance for this is what has come to be known as The Long Tail. Simply put, in a power law distribution, there will be a few that have a lot of whatever it is that is being measured. Most will have little. This is exactly what happens in wealth distribution in an economy, so we get tidbits like “80% of the wealth is held by 20% of the population,” to make something up. The real kicker is, if we look closer, we can get something like “40% of whatever is held by 3%,” to continue the use of our example. This means that 60% is held by the remainder, which is a solid majority at 97% of whatever population we are sampling.

Let’s go back to books. Suppose that 40% of sales are held by the top 100 books, and the top five publishers. Now, there are many more books and publishers, but they appear to be frozen out of the market, garnering small sales and market shares. The glass ceiling has been reached, and cracking that top portion of the market seems just out of reach. Or is it? Combined, all the other publishers and books are the majority of sales, 60% in our example.
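To put rough numbers on the head and the tail, the sketch below assumes sales fall off as a simple power law of rank, with the title at rank k selling in proportion to 1/k^exponent. The function name, the exponent, and the catalog size are illustrative assumptions, not measurements of any real book market.

```python
def top_share(num_titles, top_n, exponent=1.0):
    """Fraction of total sales captured by the top_n titles under a power law."""
    # Assumed model: the title at rank k sells in proportion to 1 / k**exponent.
    sales = [1.0 / rank ** exponent for rank in range(1, num_titles + 1)]
    return sum(sales[:top_n]) / sum(sales)


if __name__ == "__main__":
    head = top_share(num_titles=100_000, top_n=100)
    print(f"top 100 titles: {head:.0%} of sales")
    print(f"everything else: {1 - head:.0%} of sales")
```

Changing `num_titles` or `exponent` shows how the split between the bestsellers and everything else moves as the catalog grows or the drop-off steepens.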

So the key is to combine all the niche markets that a given publisher reaches. This is the way to survive the seeming market lock and even grow in size. If you are a small publisher, and you do a few small runs, you have fixed and variable costs. The fixed costs are setting up for the runs, which is a one-time cost and a pretty heavy one. The variable costs are the ink and paper, which scale with the size of the run. So, classically, if you make a large run, or a run of a certain minimum size, you dilute the cost of the setup. Otherwise, the cost of setup makes the run an economic loser.

But if you reduce the setup cost, or eliminate it entirely, then you are playing on variable costs alone, and the small runs are now worthwhile. Living in the Long Tail then, your book publishing company can do many small runs of obscure books. Granted, none is going to be Earth shattering in the volume of sales, but combined together, they will rival the top players.
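The arithmetic behind this is short enough to write down. In the sketch below, the cost per copy is the setup cost spread over the run plus the per-copy cost of ink and paper; the function name and the dollar figures are hypothetical, chosen only to show the shape of the effect.

```python
def unit_cost(setup_cost, variable_cost_per_copy, run_size):
    """Cost per copy: fixed setup spread over the run, plus ink and paper."""
    if run_size <= 0:
        raise ValueError("run_size must be positive")
    return setup_cost / run_size + variable_cost_per_copy


if __name__ == "__main__":
    # Hypothetical figures: $1,000 setup and $2 of ink and paper per copy.
    for run in (100, 1_000, 10_000):
        print(run, unit_cost(1000.0, 2.0, run))
    # With the setup cost removed, every run size costs the same per copy.
    print(unit_cost(0.0, 2.0, 100))
```

With a setup cost in play, the small run carries most of the burden per copy; set the setup cost to zero and run size stops mattering, which is the whole point of living in the tail.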

So, Philip K. Dick need not roll over in his grave. There is an offset to the glass ceiling, and a way to break out of the bottom of the system. Or rather, what appears to be the top isn’t, and it is just a matter of perception as well as learning to cope with what really is the case.

Information Operations

One of the buzz phrases or buzz concepts that the United States Military is currently obsessing about is one with the moniker of Information Operations. This is something that we here on the ground in Iraq, or more specifically my unit (a combat unit), got in full force once we touched down in Kuwait prior to our arrival at the current base of operations. Looking back, there was an inkling of it at one of the final briefings that we had stateside.

At this particular briefing, we got an introduction to what Iraq was going to be like from some returning veterans. They talked of the operations that they conducted, what the locals were like, particulars of equipment, strategy, tactics, and the like. Near the end, some staffer from one of the higher headquarters asked how much Information Operations played into what they did on the ground.

Dead silence. The baffled looks from the veterans said it all. They had no idea what was being talked about, let alone practicing this idea. None of us in the audience, except for the staffer, had any idea either.

Once in Kuwait, however, we got an introduction to what that staffer was talking about. Right off the bat, though, the lecturer discredited the idea and himself to my mind. Asking about the definition of what Information Operations could be, he came to the brilliant conclusion that there is no right definition. What he failed to see was that this means there is no definition, and what cannot be defined cannot be studied in rigor or in a scientific manner. Presenting a theory and resulting doctrine of an indefinable is just downright silly.

He presents a picture with no caption, and asks us what we think is going on. This is driving at the idea he is fumbling with, that is that the media plays a role in what we do, that the images and sound bites control the public opinion and the political masters that pull our leashes.

Finally, we settle on calling Information Operations the act of delivering a message to a designated target to achieve a desired effect. Now we are in the business of advertising? Or spin control? To be perfectly honest, a mortar barrage will do the same thing of delivering a targeted message, only the message is something along the lines of “Tag! You’re it,” “Better duck,” or “You’re dead.”

I love the obsession that American management types have with padding more words into a description in an attempt to hide the fact that they do not have well formed thoughts about a particular concept. Why qualify the target as designated? As opposed to undesignated targets?

Realistically, this could actually be a good conception, of entering a competition of ideas. The vast majority of wars up until the 20th century have been wars on a smaller scale, what are now called Low Intensity Conflicts. Wars of annihilation, like World War II, are actually the exception. John Nagl in his book Learning to Eat Soup with a Knife explores this distinction, particularly with respect to counter-insurgency by the British and Americans in South East Asia. Realizing that our conception of what defines a war is skewed is something that I argue with my comrades about. To them, this is not a real war. To me, it is, and one more in line with what war is the vast majority of the time. But, as there are no trenches or masses of tanks with the skies darkened by aircraft, this must not be “real.”

This is a subtle and key distinction in the perception of war, and how we prosecute it. War is supposed to be quick, violent, and done with before the next celebrity court case comes on the television in the minds of many people. So, as the war progresses into the fourth, fifth, tenth year, the expectations of what it is supposed to be, or more importantly what it is supposed to not be, are not met and the public’s opinion begins to sink into negativity.

In the perception of the proponents of Information Operations, the insurgency is already doing this vaguely defined thing of Information Operations right now. The play on the media in getting out their messages, the influencing of public opinions, and the resulting political pressure that hampers the ability to prosecute the war effectively. This too is flawed. The insurgency is doing no such thing. Media outlets are naturally friendly to the underdogs, or actively hostile to the interests of the Western Governments. In addition, the lack of met expectations with what a war is supposed to look like makes any continued operations a strong negative, or even indistinguishable from a defeat. The only thing that the insurgency has to do is continue on to make the counter-insurgency appear to fail.

There is no concerted effort by “the insurgency” to engage in a media campaign because the insurgency is fragmented and composed of competing interests. This involves a great deal of sectarian violence as this competition is worked out at the business end of a barrel. There is no Middle Eastern version of General Giap that is a unifying and driving force. Al-Zarqawi was possibly an influential force, but not nearly a unifying one, nor even the most relevant one prior to his death. In fact, the big success of al-Qaeda is that as an organization, it is able to help one group network with other groups that it would otherwise not be able to contact. This cross-pollination of competing groups may make each interest do more than it would on its own, but we are still left with separate competitive elements. The illusion of the media campaign is because of the media’s inclination to favor that particular element, as in the doctoring of the photos by a Reuters photographer, not that there is a concerted effort to present a unified manifesto.

In isolation, each group might be seen to deliver a unified manifesto over the area that it operates, but this is a small cell working in relative isolation. That is more along the lines of a political movement à la the likes of Mao. The small scale local terrorism, the idea that “power flows from a gun” in enforcing the power structure and mobilizing the population base to support the local politics, these would all be familiar to the leader of the Long March. Local politics and sectarian violence are more Maoist in action and implementation than anything else, the pursuit of security through the elimination of any potential rivals. Last one standing is pretty secure, since everyone else is dead.

From this perspective, the conception that the officer was pushing about Information Operations is more political than anything else. I would posit that Americans frankly suck as politicians, since we have failed time and time again to understand what it is that motivated the rest of the world and drove successful revolutions. Marx, Mao, and Deng: they had a conception of what politics was on a grand scale. They were also masters of violence to enforce the political will of the government. Americans are content with the media frenzy resulting from mudslinging and vague allegations of impropriety, not with the larger ideas that drive movements and color world views. But, to their credit, at least Americans are not fans of settling intellectual disputes with gunfire in the streets and mines in the roadways.

Our officer in the briefing regaled us with ideas of how to deliver the targeted message to get the desired effects. The idea was to build relationships, spheres of influence, and then manage, or at least be aware of, second and third order effects. Great talk, but little to back it up with. On the idea of building relationships, there is the fact that we are there for a finite time, whereas whomever we are engaging has to live there. The second and third order effects will be the results of the local history and existing social relationships. Our waltzing in will have little effect on that, other than to throw a new element into the existing system, possibly causing mistrust, or the perception of favoritism. So, the desire not to pick sides in sectarian issues is derailed by the perception that the Americans are helping one group over another, drawing the forces into local dealings whether or not they want to be. The local leaders engaged will know this, since they have had time to perfect their dealings with the previous commanders of forces in the area. Right off, the sphere of potential influence has been minimized, since the American will leave after his tour is up, to be replaced by another starting from zero.

One of the comments that was posed in the middle of this was that we should learn Arabic. Why? Not just to make an attempt to learn some of the local language and reach out in an effort to build relationships, but to make sure that the interpreters are not double dealing. While it is well known that there are some interpreters that play both sides, this is completely unrealistic. Learning a language, particularly one from a completely different language family than English, is a difficult proposition. It is likely to take more than a year to master, especially the subtleties needed to pick up on mistranslations or plays on words. It would be easier to rely on others squealing on the dirty players, which is another enigmatic can of worms of varying motives and relationships. To simply state that we should pick up another language like learning to read a map or calling for artillery fire is a gross underestimate of what is involved in building relationships or managing expectations.

There are some good nuggets in this conception, the idea that this is a system of relationships and competing interests. But the current proponents of this dogma are sidetracked with the focus on what CNN is putting on the airwaves. Taking a moment to think carefully about what the situation really is and how to approach it intelligently would pay a lot more dividends, instead of approaching this like a greedy investor to an Enron balance sheet.