SKEDSOFT

Data Mining & Data Warehousing

Introduction: Viral marketing is an application of social network mining that explores how individuals can influence the buying behavior of others. Traditionally, companies have employed direct marketing (where the decision to market to a particular individual is based solely on her characteristics) or mass marketing (where individuals are targeted based on the population segment to which they belong).

These approaches, however, neglect the influence that customers can have on the purchasing decisions of others. For example, consider a person who decides to see a particular movie and persuades a group of friends to see the same film. Viral marketing aims to optimize the positive word-of-mouth effect among customers. It can choose to spend more money marketing to an individual if that person has many social connections. Thus, by considering the interactions between customers, viral marketing may obtain higher profits than traditional marketing, which ignores such interactions.

The growth of the Internet over the past two decades has led to the availability of many social networks that can be mined for the purposes of viral marketing. Examples include e-mail mailing lists, Usenet groups, online forums, Internet Relay Chat (IRC), instant messaging, collaborative filtering systems, and knowledge-sharing sites. Knowledge-sharing sites allow users to offer advice or rate products to help others, typically for free. Users can rate the usefulness or “trustworthiness” of a review, and may rate other reviewers as well. In this way, a network of trust relationships between users (known as a “web of trust”) evolves, representing a social network for mining.

The network value of a customer is the expected increase in sales to others that results from marketing to that customer. In the movie example above, if our customer convinces others to see a certain movie, then the movie studio is justified in spending more money on promoting the film to her. If, instead, our customer typically listens to others when deciding what movie to see, then marketing spent on her may be a waste of resources. Viral marketing considers a customer’s network value. Ideally, we would like to mine a customer’s network (e.g., of friends and relatives) to predict how likely she is to buy a certain product based not only on her own characteristics, but also on the influence of her neighbors in the network. If we market to a particular set of customers, then we can estimate the expected profit from the entire network after the influence of those customers has propagated throughout it. This would allow us to search for the optimal set of customers to which to market. Because it accounts for the network value of customers (which traditional direct marketing overlooks), this approach may result in an improved marketing plan.
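The idea of network value can be illustrated with a small sketch. It assumes a simple linear influence model that is not spelled out in the text: each customer's purchase probability is a weighted mix of her own inclination and the purchase probabilities of the neighbors who influence her. The names `purchase_probs`, `network_value`, `internal`, `weights`, and `boost` are hypothetical and chosen for illustration only.

```python
def purchase_probs(internal, weights, iters=100):
    """Estimate each customer's purchase probability by fixed-point iteration.

    ASSUMED model (illustrative, not taken from the text):
        p[i] = (1 - s_i) * internal[i] + sum_j weights[i][j] * p[j]
    where weights[i] maps each influencing neighbor j to a weight and
    s_i = sum_j weights[i][j] <= 1 is how much customer i relies on others.
    """
    p = list(internal)
    for _ in range(iters):
        p = [
            (1.0 - sum(w.values())) * internal[i]
            + sum(wij * p[j] for j, wij in w.items())
            for i, w in enumerate(weights)
        ]
    return p


def network_value(i, internal, boost, weights):
    """Expected increase in purchases by OTHER customers when we market to i.

    Marketing is modeled (an assumption) as adding `boost` to customer i's
    own inclination to buy, capped at 1.0.
    """
    before = purchase_probs(internal, weights)
    marketed = list(internal)
    marketed[i] = min(1.0, internal[i] + boost)
    after = purchase_probs(marketed, weights)
    return sum(after[j] - before[j] for j in range(len(internal)) if j != i)


# Customer 0 influences customers 1 and 2 and listens to no one.
weights = [{}, {0: 0.5}, {0: 0.5}]
internal = [0.2, 0.2, 0.2]

leader_value = network_value(0, internal, boost=0.4, weights=weights)
follower_value = network_value(1, internal, boost=0.4, weights=weights)
print(round(leader_value, 3), round(follower_value, 3))
```

Under these assumptions, marketing to the influential customer 0 raises the expected purchases of the other two customers, whereas marketing to customer 1, who only listens to others, changes nobody else's behavior and so has a network value of zero.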

Given a set of n potential customers, let Xi be a Boolean variable that is set to 1 if customer i purchases the product being marketed and 0 otherwise. The neighbors of Xi are the customers who directly influence Xi. Mi is defined as the marketing action that is taken for customer i. Mi could be Boolean (for example, set to 1 if the customer is sent a coupon, and 0 otherwise) or categorical (indicating which of several possible actions is taken). Alternatively, Mi may be continuous-valued (indicating the size of the discount offered, for example). We would like to find the marketing plan that maximizes profit. A probabilistic model was proposed that optimizes Mi as a continuous value. That is, it optimizes the amount of marketing money spent on each customer, rather than just making a binary decision on whether to target the customer.
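One way to write down the objective described here is sketched below. This is a simplified form given for orientation only, not the exact formulation of the proposed model: the symbols r (revenue per sale) and c(M_i) (cost of the action taken for customer i) are assumptions introduced for illustration, and M denotes the full marketing plan (M_1, ..., M_n).

```latex
\[
  \mathrm{Profit}(M) \;=\; \sum_{i=1}^{n} \Bigl( r \, P(X_i = 1 \mid M) \;-\; c(M_i) \Bigr),
  \qquad
  M^{*} \;=\; \arg\max_{M} \; \mathrm{Profit}(M)
\]
```

Note that P(X_i = 1 | M) is conditioned on the whole plan M rather than on M_i alone, because the action taken for one customer can affect the purchases of her neighbors.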

The model considers the following factors that influence a customer’s network value. First, the customer should have high connectivity in the network and also give the product a good rating. If a highly connected customer gives a negative review, her network value can be negative, in which case marketing to her is not recommended. Second, the customer should have more influence on others (preferably, much more) than they have on her. Third, the recursive nature of this word-of-mouth type of influence should be considered. A customer may influence acquaintances, who in turn may like the product and influence other people, and so on, until the whole network is reached. The model also incorporates another important consideration: it may pay to lose money on some customers if they are influential enough in a positive way. For example, giving a product for free to a well-selected customer may pay off many times over in sales to other customers. This is a significant departure from traditional direct marketing, which will only offer a customer a discount if the expected profits from the customer alone exceed the cost of the offer. The model also takes into consideration the fact that we have only partial knowledge of the network and that gathering such knowledge can have an associated cost.

The task of finding the optimal set of customers is formalized as a well-defined optimization problem: find the set of customers that maximizes the net profits. This problem is known to be NP-hard (intractable); however, a simple hill-climbing search procedure yields a solution that reaches roughly 63% of the optimal value. In this procedure, customers are added to the set one at a time for as long as doing so improves overall profit. The method was found to be robust in the presence of incomplete knowledge of the network. Viral marketing techniques may be applied to other areas that require a large social outcome with only limited resources.
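The hill-climbing idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the model's actual procedure: it uses an assumed linear influence model, a fixed `revenue` per sale, and a fixed `cost` per targeted customer, and it makes no claim to reproduce the approximation guarantee mentioned above. All function and parameter names are hypothetical.

```python
def purchase_probs(internal, weights, iters=100):
    """ASSUMED linear influence model, solved by fixed-point iteration:
    p[i] = (1 - s_i) * internal[i] + sum_j weights[i][j] * p[j],
    with s_i = sum_j weights[i][j] <= 1."""
    p = list(internal)
    for _ in range(iters):
        p = [
            (1.0 - sum(w.values())) * internal[i]
            + sum(wij * p[j] for j, wij in w.items())
            for i, w in enumerate(weights)
        ]
    return p


def expected_profit(targets, base, boost, weights, revenue, cost):
    """Expected revenue over the whole network minus total marketing cost."""
    internal = [
        min(1.0, b + (boost if i in targets else 0.0))
        for i, b in enumerate(base)
    ]
    return revenue * sum(purchase_probs(internal, weights)) - cost * len(targets)


def greedy_targets(base, boost, weights, revenue, cost):
    """Hill climbing: repeatedly add the customer whose inclusion raises
    expected profit the most; stop when no addition improves profit."""
    targets = set()
    best = expected_profit(targets, base, boost, weights, revenue, cost)
    while len(targets) < len(base):
        gains = {
            i: expected_profit(targets | {i}, base, boost, weights, revenue, cost)
            for i in range(len(base))
            if i not in targets
        }
        candidate = max(gains, key=gains.get)
        if gains[candidate] <= best:
            break  # no remaining customer improves overall profit
        targets.add(candidate)
        best = gains[candidate]
    return targets, best


# Customer 0 influences customers 1 and 2 and listens to no one.
weights = [{}, {0: 0.5}, {0: 0.5}]
base = [0.2, 0.2, 0.2]

chosen, profit = greedy_targets(base, boost=0.4, weights=weights,
                                revenue=10.0, cost=5.0)
print(chosen, round(profit, 2))
```

In this toy network the search selects only customer 0. Her own expected extra revenue (0.4 × 10 = 4) is less than the marketing cost of 5, so traditional direct marketing would skip her, yet the sales she triggers among her neighbors make targeting her profitable overall.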