Sunday, March 04, 2012

The empathetic computer comes to PR

This is an article that shows how human empathy can be used in public relations evaluation by a computer.

Yes, I have heard it all before. It is not true that computers are bad at being empathetic. It is not true that they are bad at identifying tone and it is not true that they have no sense of nuance.

What is true is that the gap between computers and their software engineers and the frenetic anti-science public relations profession means that far too little time and money has been devoted to computerising public relations activities.

Today, I am going to attempt to explain, without getting too technical, how a computer can learn to reflect information from the perspective of a human being. This is part of PR being transparent, doing replicable research and offering such capabilities as part of daily relationship management.

If you would like to use this software, please give me a call. In the meantime, let's explore the sentient computer.

An age of information

It is not hard to find information about any subject, industry, issue or society. The internet has loads of it. So much so that it is hard to read and digest everything we need to cover the full spectrum of important commentary out there. Can we find the important research, consumer reactions, competitor activity, supply chain impact and the changing regulatory landscape? In such a vast corpus, it's hard. Furthermore, can we identify whether the work of a public relations professional is creating the right ambience, atmosphere and social or behavioural change needed by the client? Is, to be pedantic, PR having an effect?

You need some fancy software to read everything and then report back with just the important items to read, plus some sense of what all the rest of the information affecting your organisation looks like.

It can be done and I will explain how. This will show that we are able to use technologies for PR that are based on well-established, global research and standards.

Find the corpus

First of all, we need to be able to find everything. We need to be able to add an individual web page, all the content of a web site or a number of sites. Obviously we need the corpus that comes from our clipping agency. Perhaps we also want to add RSS feeds, search engine returns or emails to the company to the media we are going to examine. The Twitter 'fire-hose' is not the only one and we need all of that content too. Everything.

It could be everything about the company and key competitors, the whole industry sector and across the world. It's big.

Next we need to be able to get at the meat in each web page, email and shortened URL. There are services available to help us do that (and help explain the issues involved, if you need to delve into the detail).
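
For those who do want a little of the detail, getting at the 'meat' of a web page can be sketched in a few lines of Python. This is only an illustration using the standard library; the list of tags to skip and the sample page are my own assumptions, and real pages need far more care than this.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores scripts, styles and page furniture."""

    # Assumption: these tags rarely hold the editorial content we want.
    SKIP = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the readable text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page, made up for this example.
sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<nav>Home | About</nav>"
    "<p>PR is changing fast.</p><p>Read on.</p>"
    "<script>track()</script></body></html>"
)
print(extract_text(sample))
```

The menu, the styling and the tracking script are thrown away and only the two paragraphs of copy are kept for analysis.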

Finding the meaning of words

The next thing that we need to be able to do is find the meaning in the text. One of the common ways of doing this is to use a capability called semantics. Latent Semantic Analysis/Indexing (LSA/LSI; Deerwester et al., 1990; Landauer & Dumais, 1997) has proven its mettle in numerous applications, and has more or less spawned an entire research field since its introduction around 1990.

Everyone has heard about semantics, but do we know what is involved and what it means? Semantics is part of the study of signs (semiotics) and explores the relationship of signs to what they stand for. To put it into PR speak, it is how machines can find the meaning of text.

We are able to use this range of technologies to allow a machine to do what a human does to understand words in context. You know, for example, that if I mention space shuttle, ship and train in the same sentence, I am talking about transport. This is how semantics works out the meaning behind a text.
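
The idea that words used in similar company have similar meaning can be shown with a small sketch. This is not Latent Semantic Analysis itself (which applies a matrix decomposition to a large term-by-document table); it is a much simpler comparison of the words that surround each term. The five sentences and the stop-word list are made up for the illustration.

```python
import math
from collections import Counter

# Hypothetical mini-corpus and stop-word list, invented for this sketch.
DOCS = [
    "the space shuttle is a form of transport",
    "a ship is a form of transport across the sea",
    "the train is cheap transport for commuters",
    "a biscuit is baked in an oven",
    "the baker baked bread in the oven",
]
STOPWORDS = {"the", "a", "an", "is", "of", "in", "for", "across"}


def context_vector(word, documents):
    """Count the other meaningful words that share a document with `word`."""
    context = Counter()
    for doc in documents:
        tokens = [t for t in doc.lower().split() if t not in STOPWORDS]
        if word in tokens:
            context.update(t for t in tokens if t != word)
    return context


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


ship = context_vector("ship", DOCS)
train = context_vector("train", DOCS)
biscuit = context_vector("biscuit", DOCS)

print("ship vs train:  ", round(cosine(ship, train), 2))
print("ship vs biscuit:", round(cosine(ship, biscuit), 2))
```

Ship and train never appear in the same sentence here, yet they come out as related because both keep company with 'transport'. Ship and biscuit share no context at all.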





The process is about finding the meaning behind syntax.

For example, here are three ways of saying the same thing:

I Love Lucy
  
 Lucy

I   Lucy

This is the difference between syntax and meaning: semantic software learns to understand the meaning (I love Lucy).

PR using universal research to get clever

What we can see above is a basic form used in semantics in a wide range of applications.

A simple sentence such as this has:
  • the subject (I),
  • the predicate (love), and
  • the object (Lucy).
It is explained in some depth at The Internet Grammar of English at UCL. Beyond simple grammar, we now have computer programmes that use this basic element of grammar. Not surprisingly, it is part of the W3C (World Wide Web Consortium) group of technologies, in a format called N-Triples.
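
To show what such a triple looks like to a machine, here is a minimal sketch that writes 'I love Lucy' as one line of N-Triples. In that format each part is an identifier in angle brackets and the statement ends with a full stop. The example.org address is a placeholder I have assumed for the illustration, not a real vocabulary.

```python
# Hypothetical namespace used only for this example.
BASE = "http://example.org/"


def to_ntriple(subject, predicate, obj):
    """Format a subject, predicate and object as one N-Triples statement."""
    return f"<{BASE}{subject}> <{BASE}{predicate}> <{BASE}{obj}> ."


print(to_ntriple("I", "love", "Lucy"))
# <http://example.org/I> <http://example.org/love> <http://example.org/Lucy> .
```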

If we have a corpus of words, we can describe them as forms of grammar (parts of speech) and as semantic concepts.

It is possible to weight semantic concepts.

If we look at a number of simple triples, how this can be used in public relations becomes clearer.



Line 1 ... I love Lucy
Line 2 ... I love Mary
Line 3 ... I am David

We may want to find out how significant each element is in this corpus and how significant different parts of the corpus are.

We ask a human curator to provide scores for each of the semantic concepts.

In this corpus 'I' is obviously significant. We could give it three +'s for being significant:
I = +++
Love is also significant, so we can give it two +'s:
Love = ++
Lucy, Mary and am are all of the same significance, so we can give them a + each.

But now we know that I and David are the same thing, so David gets a score of three +'s.

Lucy = +
Mary = +
David  = +++
am = +

From this you can see that it is possible to 'score' semantic concepts to help us decide how significant a corpus (or even an editorial item) is. We need to know this in a data-rich age. Insignificant content is too time-consuming to contemplate (but we also need a capability to find those serendipitous articles too).
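
The scores in this little corpus can be reproduced with a short sketch. Note the assumption: the curator's scores above happen to match a simple count of how often each word appears, so the code counts occurrences and then lets David inherit the score of 'I'. A real curator is free to score differently, and the function name and alias table are mine, for illustration only.

```python
from collections import Counter

corpus = ["I love Lucy", "I love Mary", "I am David"]


def base_weights(lines, aliases=None):
    """Score each word by how often it occurs, then apply 'same thing' aliases."""
    weights = dict(Counter(word for line in lines for word in line.split()))
    # An alias such as {"David": "I"} says David and I are the same thing,
    # so David takes the higher of the two scores.
    for name, same_as in (aliases or {}).items():
        weights[name] = max(weights.get(name, 0), weights.get(same_as, 0))
    return weights


weights = base_weights(corpus, aliases={"David": "I"})
for word, score in weights.items():
    print(word, "=", "+" * score)
```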

Viewing content from different perspectives

Now let's look at each line of the corpus from a number of perspectives.

From the perspective of Lucy:

Line 1 would score three +'s for I, plus two +'s for love and one for Lucy. The total score for line 1 = 6.
Line 2 would score three +'s for I, plus two +'s for love, and none for Mary; in fact, Lucy is jealous and so makes it -1. The total for line 2 = 4.
Line 3 would score three for I, one for am and three for David. The total for line 3 = 7.

'Lucy' would score this corpus 17 (6 + 4 + 7), and so would Mary if she were to score the corpus from the Mary perspective.

'David' would score this corpus at a total of 19 from David's perspective.

'Love' would score this corpus at a total of 16 from Love's perspective.

From this you can see that it is possible to 'score' semantic concepts from different perspectives.
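
The perspective scoring can be sketched as a set of base scores plus a handful of adjustments for each viewpoint. The adjustments are my reading of the example: Lucy marks Mary down to -1, Mary does the same to Lucy, and David has no jealousy to apply. The rule behind Love's total of 16 is not spelled out, so it is left out of the sketch.

```python
# Base scores taken from the example corpus.
BASE = {"I": 3, "love": 2, "Lucy": 1, "Mary": 1, "am": 1, "David": 3}

# Assumed per-perspective adjustments (only the jealousy rule is modelled).
PERSPECTIVES = {
    "Lucy": {"Mary": -1},
    "Mary": {"Lucy": -1},
    "David": {},
}

corpus = ["I love Lucy", "I love Mary", "I am David"]


def score_line(line, perspective):
    """Add up the scores of the words in a line, as seen from one perspective."""
    weights = {**BASE, **PERSPECTIVES[perspective]}
    return sum(weights.get(word, 0) for word in line.split())


def score_corpus(lines, perspective):
    """Total score of the whole corpus from one perspective."""
    return sum(score_line(line, perspective) for line in lines)


for who in PERSPECTIVES:
    print(who, [score_line(line, who) for line in corpus], score_corpus(corpus, who))
```

Run as written, this gives 17 for Lucy, 17 for Mary and 19 for David, matching the totals in the text.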

What this means is that we can identify the relative significance and attitudes manifest in the corpus.

If we add just a few more concepts, you can see how the software can begin to make interesting assumptions:

Line 1 ... I love Lucy
Line 2 ... I love Mary
Line 3 ... I am David
Line 4 ... David lives in Swindon
Line 5 ... Lucy lives in Swindon

How cool is this for Lucy and David!

It may seem hard to score every word, and so it is. It is much better to teach a few items (articles) and then get some software to give the top semantic concept a slightly higher score and lesser concepts a lesser score, but let such concepts accumulate scores as they re-occur. In this way, we can use the time of curators much better.

After a short time, the computer programme is going to be quite accurate, but not all the time. In addition, there is still the need to identify those serendipitous articles.

Teaching computers empathy

This means that we will need curators to teach the software from a number of perspectives.
Here are a few of the more obvious ones:

  • To what extent (let's say, on a sliding scale of +5 to -5) are citations (articles, posts, Tweets, comments etc.) relevant to the organisation (CEO, CFO, industry sector, competitor; there are lots of such perspectives)?
  • To what extent (on a sliding scale) and from the very focused perspective of the CEO (or perhaps CMO, CFO, HR, vendors, customers, competitors etc.) is this citation positive, neutral or adverse?
  • To what extent (on a sliding scale) does this citation relate to an audience bias (for example the bias of a Londoner, or a resident of the South West or Wales)?
This means that the curator will need to make such judgements in a very empathetic way for each article. Then the software will add weighting to all the semantic concepts evident in the article, and the next article and so forth.

After a while, using the weighting of the semantic concepts, the programme is able to rank the most relevant articles for importance, tone and (in this instance) location. It is possible that a curator may be assessing an article for more than three perspectives. In some cases a lot more.
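
A rough sketch of this teaching and ranking process follows. Everything in it is an assumption made for illustration: the rule that the top concept takes the curator's full rating and lesser concepts take half, the two perspectives, and the made-up articles. It shows the principle of accumulating weights, not how any particular product does it.

```python
from collections import defaultdict


def teach(weights, concepts, rating):
    """Accumulate a curator rating (+5 to -5) onto an article's concepts.

    Assumption: `concepts` is ordered by prominence; the top concept takes
    the full rating and the lesser concepts take half of it.
    """
    for rank, concept in enumerate(concepts):
        weights[concept] += rating if rank == 0 else rating / 2


def score(weights, concepts):
    """Score an unseen article from the weights learned so far."""
    return sum(weights.get(concept, 0.0) for concept in concepts)


def rank(weights, articles):
    """Return article names, highest scoring first."""
    return sorted(articles, key=lambda name: score(weights, articles[name]), reverse=True)


# Hypothetical curated articles: concepts plus a rating for each perspective.
taught = [
    (["merger", "profit", "jobs"], {"relevance": 5, "tone": 3}),
    (["strike", "jobs"], {"relevance": 4, "tone": -4}),
    (["biscuit", "recipe"], {"relevance": -3, "tone": 0}),
]

learned = {"relevance": defaultdict(float), "tone": defaultdict(float)}
for concepts, ratings in taught:
    for perspective, rating in ratings.items():
        teach(learned[perspective], concepts, rating)

# Hypothetical new articles that no curator has seen.
incoming = {
    "A": ["merger", "jobs"],
    "B": ["recipe", "biscuit"],
    "C": ["strike", "profit"],
}

print("by relevance:", rank(learned["relevance"], incoming))
print("by tone:     ", rank(learned["tone"], incoming))
```

The same three articles come out in a different order for each perspective, which is the point: one corpus, several empathetic readings of it.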

However, progressively, the software will be ranking citations (articles) more accurately until it is able to predict the perspectives of citations with an accuracy of better than 80% compared to a human curator.

It is at this point that serendipity comes into play.

Imagine that a citation is dismissed by the programme as not being very significant, but that it contains some semantic concepts that score exceptionally highly without lifting the citation high enough to be of consequence.

This is a citation that needs a second look and the software can say to the curator, 'Can you have a look at this one - I am not sure about it'. The computer is now asking a curator to refine the judgements it has made and is getting help from the human curator.
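
That 'not sure about this one' test can be sketched very simply: flag any citation whose total score is too low to surface but which contains at least one concept scoring above a high bar. The weights and both thresholds below are invented for the illustration and would need tuning in practice.

```python
def needs_second_look(concepts, weights, article_threshold, concept_threshold):
    """True when an article scores low overall but holds a very strong concept."""
    total = sum(weights.get(c, 0) for c in concepts)
    peak = max((weights.get(c, 0) for c in concepts), default=0)
    return total < article_threshold and peak >= concept_threshold


# Hypothetical learned weights.
weights = {"recall": 9, "merger": 5, "profit": 3, "weather": -3, "football": -4}

articles = {
    "product recall buried in a sports report": ["recall", "weather", "football"],
    "merger coverage": ["merger", "profit"],
    "weekend round-up": ["weather", "football"],
}

for title, concepts in articles.items():
    if needs_second_look(concepts, weights, article_threshold=6, concept_threshold=8):
        print("Can you have a look at this one?", title)
```

Only the first article is referred to the curator: its total of 2 is too low to rank, but 'recall' on its own scores 9.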

Computers can learn and can ask for help when they lack knowledge.

How is this going to help PR?

We now have a computer programme that can look at this massive amount of information that is relevant to your organisation. Not just press clips about the company but internet citations as well, covering a range of corporate, brand, organisational, consumer, regulatory, political and cultural perspectives. The objective is to extract important and helpful knowledge, not to mention identifying threats.

The medical sector has been using these approaches for the last decade. Examples include:
  • The value of the Semantic Web for aiding neuroscience researchers. http://bit.ly/yAcEMT
  • Through the use of CSIRO’s automatic semantic text analysis services, seamless and value added information extraction can be achieved while substantially reducing the effort for manual abstractions. http://bit.ly/AvgIID
  • Automatic feedback generation for virtual patients, using semantic web technologies. http://bit.ly/AhNFYZ

Such programmes used in public relations mean that evaluation of activity from the perspectives of a range of stakeholders is not only possible but reasonably easy to undertake.

An example will help understanding.

The range of events influencing and changing the practice of public relations today is as broad as lobbying, media relations, financial and corporate, social media and evolving practice in internal PR and much more.

With the best will in the world, being able to read all the academic papers, the government consultative documents and submissions to an array of commissions and enquiries is beyond the capability of most practitioners. Add to that all the new and best practice in social media and the arrival of new communications channels like Pinterest, all requiring development of knowledge, skill and best practice. Print and specialist media are changing fast, and the practice of, not to mention the debate over, churnalism are problems of the moment.

The new semantic capability can search for all citations (including academic) by subject. It will offer up the most significant and relevant content from the perspectives of the institutions and practitioners, academics, regulators, recruiters, journalists and many more. It can offer this for those with an interest in news, education, in-house, consultancy, criticism and much more.

Of course, being able to provide similar intelligence about the advertising and marketing, social media and SEO sectors would be a significant advantage.

Users would be able to view the industry in much the same way that a Sky subscriber would view television, on smart phones, tablets, PCs and even the big screen in reception.

There are some fun by-products too. This type of analysis makes it easy to identify the most powerful media, authors and critics.

As for the public relations industry, so too for baking, banking and biscuit making.

To date, the PR industry has sipped no sup and craved no crumb of these new capabilities. They are very new, but they are going to mean big changes.


Useful authors 

Deerwester, S., S. Dumais, G. Furnas, T. Landauer & R. Harshman (1990) Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science, 41(6).

Landauer, T. & S. Dumais (1997) A solution to Plato’s problem: the Latent Semantic Analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2).