Tuesday, May 08, 2012

Big Data Public Relations

Until recently, it was widely assumed that the amount of information available in any major discipline was reasonably finite.

It was also believed that growth came from optimising the output of production plant and people.

These are, we now know, myths. 

Information is not finite. We are generating it at an astonishing rate.

  1. Klint Finley (2010) reported that 1.2 zettabytes of digital information would be created or replicated in 2010. That's 1,228.8 exabytes, or about 6.7 exabytes every two days. Or, in PR speak, a lot.
  2. Growth, Dupont et al (2011) revealed, has a lot to do with creativity.

They noted that economic decline is associated with a slowdown in innovation, and they showed that innovation itself could trigger a sustainable recovery in productivity.
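Finley's conversion is easy to check. A minimal sketch, assuming binary prefixes (1 zettabyte = 1,024 exabytes), which is the convention that produces the 1,228.8 figure, and a 365-day year:

```python
ZETTABYTES = 1.2        # Finley's figure for 2010
EXABYTES_PER_ZB = 1024  # binary prefixes (2**70 / 2**60), which give 1,228.8
DAYS_PER_YEAR = 365

total_eb = ZETTABYTES * EXABYTES_PER_ZB
per_two_days = total_eb / DAYS_PER_YEAR * 2

print(f"{total_eb:,.1f} exabytes in 2010")            # 1,228.8 exabytes in 2010
print(f"{per_two_days:.1f} exabytes every two days")  # 6.7 exabytes every two days
```

With decimal prefixes (1 zettabyte = 1,000 exabytes) the same calculation gives 1,200 exabytes, or roughly 6.6 exabytes every two days. Either way, a lot.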

Dupont et al's findings tell us that, for Public Relations to thrive, it needs to be creative. In today's environment, this means PR has to understand Big Data.

The reason is simple. We need to understand what we can do with information, and today information is wrapped up in Big Data. Our profession therefore has to understand the technologies that allow practitioners to find and synthesise information and disseminate it to relevant constituents to inform their decision making and actions.

With this thing called Big Data, PR has to be innovative in how it develops its practice as well as creative in how it executes its programmes. If the profession is lazy in its approach to these technologies, it cannot innovate, grow or even survive.

Finally, PR needs to be innovative to achieve a good return on investment (ROI), that is, profitability. This means fostering technology diffusion and innovation (based on Rogers, 2003); enhancing the quality of decision making; and increasing demand while reducing the production costs of providing data to our constituencies.

In this post, I am going to explore how practitioners can get to grips with Big Data. I will use an existing technology and focus only on the simple stuff: media coverage and social media commentary. The principle, however, remains the same for all Big Data.

Jeff Jonas, IBM Distinguished Engineer and Chief Scientist, IBM Entity Analytics, can help us establish the basics of what has to be achieved. He has a great way of explaining why we create so much data.

For example, our systems make at least 144 copies of just about everything we create (Jonas 2011). When so many copies exist, it becomes easier for computers to narrow down exactly what or whom an article, or any other piece of content, refers to. As Big Data grows, that accuracy improves at an astonishing rate, because the copies point back to the source.

Jonas suggests that this leads to more leaks, both public ones (WikiLeaks, for example) and ones that never become public, to a loss of information ownership, and to the potential for onerous legislation that would, in the end, reduce transparency.
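Jonas's point that copies point back to the source can be illustrated with a toy example. This is a minimal sketch, not IBM's method: it assumes exact copies can be recognised by hashing their normalised text, and that the earliest-dated copy is the likely source. The `Item` records, outlets and dates are hypothetical:

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Item:
    outlet: str          # hypothetical publisher of this copy
    published: datetime  # when this copy appeared
    text: str


def fingerprint(text):
    """Hash the text after lower-casing and discarding punctuation and spacing."""
    normalised = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def trace_sources(items):
    """Group exact copies; the earliest copy in each group is the likely source."""
    groups = defaultdict(list)
    for item in items:
        groups[fingerprint(item.text)].append(item)
    return {
        key: (min(copies, key=lambda i: i.published), len(copies))
        for key, copies in groups.items()
    }


items = [
    Item("blog-a", datetime(2012, 5, 3), "Acme launches new widget!"),
    Item("acme.com", datetime(2012, 5, 1), "Acme launches new widget."),
    Item("forum-b", datetime(2012, 5, 4), "ACME launches new  widget"),
    Item("paper-c", datetime(2012, 5, 2), "Rival firm cuts prices."),
]

for source, count in trace_sources(items).values():
    print(f"{source.outlet}: earliest of {count} matching item(s)")
```

Matching only exact copies keeps the example short; real entity-resolution systems also compare near-copies, names and other attributes.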

Perhaps it is a good idea to put this into a PR context.

Being able to make sense of our Big Data environment is valuable.

We need to start by aggregating the news.

This includes media that would have been common at the turn of the century, the booming social networks and microblogs, photo-sharing sites, and all the content we receive each day, such as email newsletters, specification sheets, white papers, discussion lists, LinkedIn group discussions and much more. We can add academic journals, PowerPoint presentations, films and music, all of which are common fare. We need to be able to find all of it.

But processing so much information is time-consuming. We need computers to help.

Today, the PR industry is able to gather all of this content. A number of services can and do achieve this, and very cheaply.

It is a lot of content, and it comes in a range of formats, from PDFs to websites with advertisements, links and forms in and around the useful content. There are solutions that make these websites readable by computers; some really good ones are free and open source software, such as 'BeautifulSoup' http://www.crummy.com/software/BeautifulSoup/. Computers can then process the text in many ways: sentence detection, tokenization, part-of-speech tagging, chunking and parsing, named-entity detection, co-reference analysis and much more. Being able to find and identify links, metadata, all those SEO attributes and more is all part of the technology deal.
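The first step described here, turning a cluttered web page into readable text plus its links and metadata, can be sketched without third-party packages. BeautifulSoup, which the post recommends, handles the text part with `BeautifulSoup(html, "html.parser").get_text()`; the sketch below uses Python's standard-library `html.parser` instead so that it runs as-is. Which tags count as clutter (`script`, `style`, `nav`, `form`, `aside`) is an assumption for illustration, and the naive regular-expression splitter is only a stand-in for proper sentence detection and tokenization tools:

```python
import re
from html.parser import HTMLParser

# Assumed clutter: anything inside these tags is dropped from the readable text.
SKIP_TAGS = {"script", "style", "nav", "form", "aside"}
# Block-level tags that should separate words when their text is joined.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "td"}


class PageReader(HTMLParser):
    """Collects readable text, every link and named <meta> tags from a page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self.meta = {}
        self._skip_depth = 0  # > 0 while inside a clutter tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.text_parts.append(" ")
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])  # links are kept even inside navigation
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"]] = attrs["content"]

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def read_page(html):
    """Return clean text, naive sentences, word tokens, links and metadata."""
    reader = PageReader()
    reader.feed(html)
    reader.close()
    text = " ".join("".join(reader.text_parts).split())  # collapse whitespace
    # Naive sentence detection: split after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text) if text else []
    tokens = [re.findall(r"\w+", sentence) for sentence in sentences]
    return text, sentences, tokens, reader.links, reader.meta


page = """<html><head><meta name="description" content="Acme widget launch"></head>
<body>
<nav><a href="/home">Home</a></nav>
<script>trackVisitor();</script>
<p>Acme launched a widget today. Analysts were impressed!</p>
<p>Read the <a href="http://example.com/spec">specification</a>.</p>
<form><input name="email">Subscribe</form>
</body></html>"""

text, sentences, tokens, links, meta = read_page(page)
print(text)
print(links, meta)
```

The navigation, script and sign-up form disappear from the text, while the links and the description metadata are kept for later analysis.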

Computers can be taught to sort articles into subjects. If a group of managers is responsible for different markets, the software can be taught to sort content into each manager's subject area and, at the same time, using Bayesian logic, the managers can teach it to grade citations from very relevant to completely useless and fit to be discarded.
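The Bayesian sorting described above can be sketched as a small multinomial naive Bayes classifier. A minimal sketch, assuming each manager's market is a label and the training snippets are hypothetical examples a curator has already sorted; the same code would grade relevance if the labels were grades such as "very relevant" and "discard" instead of markets:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # training documents per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def log_scores(self, text):
        """Log P(label) plus the sum of log P(word | label) for each label."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[word] + 1) / denominator)
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical curator-labelled examples: one label per manager's market.
clf = NaiveBayes()
clf.train("Oil prices rise as OPEC cuts output", "energy")
clf.train("New wind farm boosts renewable energy supply", "energy")
clf.train("Bank raises interest rates on mortgages", "finance")
clf.train("Shares fall after bank profit warning", "finance")

print(clf.classify("Wind and oil output climb"))
print(clf.classify("Bank shares rally"))
```

Each new citation a curator files adds to the counts, so the sorting improves as the managers keep teaching it.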

Sometimes we don’t want to see the same old same old cropping up a dozen times a day on our computers and tablets. On other occasions, we need to know that bloggers have copied our brochures verbatim.

Managing Big Data means you need to be able to choose how you deal with duplicates.
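Choosing how to deal with duplicates can be sketched with word shingles and Jaccard similarity, a common near-duplicate technique swapped in here for illustration; the post does not say which method any product uses. The shingle size of 3, the 0.8 threshold and the two policies, collapsing repeats a reader does not want to see and finding verbatim reuse such as a brochure copied into a blog, are assumptions:

```python
import re


def shingles(text, size=3):
    """Set of overlapping word triples (shingles), ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(max(len(words) - size + 1, 1))}


def jaccard(a, b):
    """Shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def collapse(texts, threshold=0.8):
    """Keep the first of each group of near-duplicates and drop the rest."""
    kept, kept_shingles = [], []
    for text in texts:
        s = shingles(text)
        if all(jaccard(s, k) < threshold for k in kept_shingles):
            kept.append(text)
            kept_shingles.append(s)
    return kept


def find_copies(source, texts, threshold=0.8):
    """Return the texts that reuse the source almost word for word."""
    src = shingles(source)
    return [t for t in texts if jaccard(src, shingles(t)) >= threshold]


brochure = "The Acme widget cuts energy bills by a third and installs in under an hour."
posts = [
    "The Acme widget cuts energy bills by a third and installs in under an hour.",
    "The Acme widget cuts energy bills by a third and installs in under an hour!",
    "I tried the Acme widget and my bills did not change at all.",
]

print(collapse(posts))               # the repeat is dropped
print(find_copies(brochure, posts))  # the two verbatim copies are found
```

Lowering the threshold catches looser paraphrases, at the cost of more false matches.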

For computers as well as people, understanding what a text means is sometimes a challenge. The new generation of PR software examines citations using semantic analysis, which identifies the concepts that explain what each sentence in every citation means in relation to the subject in question.
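Real semantic analysis is far richer than keyword matching, but the idea of tagging each sentence with the concepts it expresses can be illustrated with a hand-built concept lexicon. A minimal sketch; the `CONCEPTS` dictionary and its terms are hypothetical, and in practice would come from a curated taxonomy or a trained model:

```python
import re

# Hypothetical concept lexicon: each concept is recognised by a few trigger words.
CONCEPTS = {
    "pricing": {"price", "prices", "discount", "cheaper"},
    "product quality": {"reliable", "faulty", "durable", "broken"},
    "customer service": {"support", "helpdesk", "refund", "complaint"},
}


def tag_sentences(text):
    """Return (sentence, concepts found) pairs for each sentence in the text."""
    tagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        found = sorted(c for c, terms in CONCEPTS.items() if words & terms)
        tagged.append((sentence, found))
    return tagged


review = ("The new widget is cheaper than last year's model. "
          "Mine arrived broken and the helpdesk offered a refund. "
          "Delivery was quick.")

for sentence, concepts in tag_sentences(review):
    print(concepts, sentence)
```

A sentence can carry several concepts, or none, which is why the tagging is done sentence by sentence rather than once per article.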

It is something of a party trick to ask a group of people to explain a text and see how differently they interpret it. Some software uses curators to teach it which citations are important to the CEO or the FD, and then again to the Chief Marketing Officer, customers, competitors and suppliers. All monitoring, measurement and evaluation has to reflect the culture of constituents. Without such sophistication, the information and its interpretation are profoundly flawed. Fortunately, these days computer programs can be taught to reflect the culture of the user.

Here is where we sort out the competent from the charlatan. All sorts of software claims to be able to determine automatically whether a citation is favourable or not. The simple test the practitioner can apply is to ask: 'from whose perspective?'

Such perspectives are also developed for other attributes, such as influence, helpfulness and favourability, and for many other values.
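The "from whose perspective?" test can be made concrete: the same citation can score as favourable for one constituent and unfavourable for another. A minimal sketch, assuming each perspective is a hypothetical set of term weights that a curator for that role would maintain; the terms, weights and roles below are invented for illustration:

```python
import re

# Hypothetical term weights a curator for each role might maintain.
# Positive weights push a citation towards favourable, negative towards unfavourable.
PERSPECTIVES = {
    "FD": {"price": -1.0, "cut": -1.0, "margin": 1.0, "growth": 1.0},
    "customers": {"price": 0.5, "cut": 1.0, "delay": -1.0},
    "CMO": {"launch": 1.0, "awareness": 1.0, "recall": -1.0},
}


def favourability(text, perspective):
    """Score a citation with one role's weights and map the sign to a label."""
    weights = PERSPECTIVES[perspective]
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(weights.get(word, 0.0) for word in words)
    if score > 0:
        return "favourable"
    if score < 0:
        return "unfavourable"
    return "neutral"


citation = "Acme announces a 20% price cut on all widgets."
for role in PERSPECTIVES:
    print(role, favourability(citation, role))
```

The same sentence comes out unfavourable for the FD, favourable for customers and neutral for the CMO, which is why a single favourability score means little without asking whose view it represents.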

Of course, the recipient can also be the curator. A CEO can teach the software to reflect his or her specific and personal interest.

Soon the software learns to reflect the views of the reader and reports its findings.

The computer now has a record of Big Data and a description of it. It has created an ontology, a description of the content.
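Formally, an ontology records concepts and the relations between them. A minimal sketch of why that helps, assuming a hypothetical "is-a" hierarchy: an article tagged with a narrow concept, such as a dividend announcement, can be found by a query for a broader one, such as finance. The concepts, hierarchy and article IDs are invented for illustration:

```python
# Hypothetical "is-a" links: each concept points to its broader parent.
BROADER = {
    "share price": "finance",
    "dividend": "finance",
    "finance": "business",
    "product launch": "marketing",
    "marketing": "business",
}

# Hypothetical articles and the concepts the software has tagged them with.
ARTICLES = {
    "a1": {"share price"},
    "a2": {"product launch"},
    "a3": {"dividend", "product launch"},
}


def ancestors(concept):
    """The concept itself plus every broader concept above it."""
    chain = [concept]
    while concept in BROADER:
        concept = BROADER[concept]
        chain.append(concept)
    return chain


def find_articles(query):
    """IDs of articles tagged with the query concept or anything narrower."""
    return sorted(
        article_id
        for article_id, tags in ARTICLES.items()
        if any(query in ancestors(tag) for tag in tags)
    )


print(find_articles("finance"))   # articles about share prices or dividends
print(find_articles("business"))  # everything
```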

The key question now is what information a practitioner might want to find, analyse and report on.

This is data mining: digging into the resource to find, for example, the important articles, tweets, posts, 'Likes' or emails.

From a mass of data, the CEO can see just the five articles that are critical before the day begins, and the FD can see exactly what is going to affect the share price before anyone else does. The practitioner will offer the context with one hand and the necessary detail with the other.
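Picking the CEO's five critical articles is a ranking problem once each item carries scores from the earlier steps. A minimal sketch, assuming each article already has a relevance score for this reader (from a trained classifier or a curator's grading) and an estimate of its reach; the scoring formula, field names and figures are hypothetical:

```python
import heapq
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    relevance: float  # 0 to 1, relevance to this reader (assumed already computed)
    reach: int        # estimated audience size (assumed)


def morning_briefing(articles, n=5):
    """Return the n articles with the highest relevance-weighted reach, best first."""
    return heapq.nlargest(n, articles, key=lambda a: a.relevance * a.reach)


articles = [
    Article("Regulator opens inquiry", 0.9, 50_000),
    Article("Rival launches product", 0.6, 120_000),
    Article("Trade blog mention", 0.2, 3_000),
    Article("Analyst downgrade", 0.95, 80_000),
    Article("Local charity event", 0.1, 10_000),
    Article("Supplier strike", 0.7, 20_000),
    Article("CEO interview", 0.8, 40_000),
]

for article in morning_briefing(articles):
    print(article.title)
```

Multiplying relevance by reach is only one possible weighting; a curator might weight relevance more heavily so that a niche but critical story still surfaces.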

Perhaps it will be helpful to look further at how practitioners can acquire and synthesise information and disseminate it to relevant constituents to inform their decision making and actions.

This then is how we offer our constituencies information in digestible form to empower them when making decisions.

Practitioners and their suppliers can come to grips with Big Data. We can even go beyond text and images into a much wider range of content.

What is important here is that PR can, indeed must, be part of the Big Data revolution.


Dupont, J et al (2011) OECD Productivity Growth in the 2000s: A Descriptive Analysis of the Impact of Sectoral Effects and Innovation. OECD Journal: Economic Studies. http://www.oecd-ilibrary.org/economics/oecd-productivity-growth-in-the-2000s-a-descriptive-analysis-of-the-impact-of-effects-and-innovation_eco_studies-2011-5kgf3281fmtc accessed May 2011

Finley, K (2010) Was Eric Schmidt Wrong About the Historical Scale of the Internet? http://www.readwriteweb.com/cloud/2011/02/are-we-really-creating-as-much.php accessed May 2012

Jonas, J (2010) Big Data Flows. http://www.oecd.org/dataoecd/33/33/46944407.pdf accessed May 2012

Rogers, E M (2003) Diffusion of Innovations (5th ed.). New York, NY: Free Press.