Showing posts with label Content Management Monitoring. Show all posts

Friday, January 06, 2012

All the PR data you can eat - without getting indigestion

In the next few months, I and some friends will be creating a new resource for PR practitioners, evaluation companies and academics (and a lot of other people too).

The idea is really quite simple.

If you could search online for all media stories and comments about your organisation (everything from Twitter to the Telegraph) every day, hour, week, month or year, and put them in a database, spreadsheet, list or web page, that would be pretty average, but useful. If you could do the same for competitors, market sectors, business partners or individuals such as journalists, Wikipedians or Facebook friends, that might be useful too. We are going to make this happen.

Each article (citation) will have a lot of information attached to it. Via Alexa, we have some indication of the age, sex and location of readers. From Google Trends and Google Analytics we know there is a lot more information that attaches to stories, blog posts and online comments. We will collect it all (and more) and attach it to your story's URL.

Of course, it will be useful to extract the text (without the HTML mark-up and advertisements) and make it searchable and summarisable, with lists of tags, mark-ups, semantic concepts (ranked in order of significance), parts of speech and hyperlinks in and out. So we will add those too, so you can use the information if you need to.

Now comes your part.

We will make these data available to you. The full set. Via an API, spreadsheet and a range of other formats, so that you can download exactly what you want: only that data, no more and no less. Yes, that's right, we will collect it all, but you don't have to use it. If all you want is a news feed every day, that's OK. If you want international trends by the hour, that's cool too. You can choose to smell the ocean or sip from the fire-hose.

Now you will be able to make your own news apps, evaluation apps and subject specific web news outlets.

You will be able to match media data with business data (enquiries, sales, requests for information, anything you want). Making a list of Twitter users mentioning your brand will be as easy as making a list of blogs mentioning your competitors or Slideshare mentions of your CSR programme. All the information will be ready for you to mine.

We will offer an API that any friendly programmer can use to make you anything you want.
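As an illustration of the kind of thing a friendly programmer might build on top, here is a minimal sketch in Python of the "take only what you want" idea: a filter over citation records. The field names ('url', 'source', 'date') are illustrative assumptions, not the final schema.

```python
def select_citations(records, source=None, since=None):
    """Return only the citation records the caller asked for.

    `records` is a list of dicts with (assumed) 'url', 'source' and
    'date' (ISO yyyy-mm-dd) keys; `source` and `since` narrow the set.
    """
    out = []
    for rec in records:
        if source is not None and rec["source"] != source:
            continue
        if since is not None and rec["date"] < since:
            continue
        out.append(rec)
    return out

feed = [
    {"url": "https://example.com/a", "source": "twitter", "date": "2012-01-05"},
    {"url": "https://example.com/b", "source": "telegraph", "date": "2012-01-06"},
    {"url": "https://example.com/c", "source": "twitter", "date": "2012-01-06"},
]

# just today's tweets: sip from the stream, don't drink the fire-hose
daily_tweets = select_citations(feed, source="twitter", since="2012-01-06")
```

The same filter, pointed at the full set, gives international trends; pointed at one source and one day, it gives a simple news feed.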

We will offer some tools too. A de-duplication tool will be useful so that you can set up your own de-duplication parameters (all those re-tweets can be counted without having to edit every one) and a smart curation capability would, we guess, be helpful too.
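A de-duplication parameter set might, for instance, treat retweets as copies of the original. This is a hypothetical sketch, not the tool itself: it normalises away the "RT @user:" prefix and counts how many copies collapse into each entry, so every retweet is still counted without being listed.

```python
import re

def dedupe(mentions):
    """Collapse near-identical mentions (e.g. retweets) into one entry,
    keeping a count so every copy is still counted."""
    def normalise(text):
        text = re.sub(r"^rt\s+@\w+:?\s*", "", text.strip().lower())  # drop "RT @user:" prefix
        return re.sub(r"\s+", " ", text)                             # collapse whitespace
    counts = {}
    for m in mentions:
        key = normalise(m)
        counts[key] = counts.get(key, 0) + 1
    return counts

stream = [
    "Great report from BigCo today",
    "RT @alice: Great report from BigCo today",
    "RT @bob:  Great report from BigCo today",
]
collapsed = dedupe(stream)  # one entry, counted three times
```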

Perhaps you have some ideas about what you would like us to include in the data set (mentions on Facebook? We're onto it).

If you are an in-house practitioner, an agency, an academic, an evaluation company or a research organisation and you want to be a beta tester or early adopter, we will be delighted to talk you through what we have in mind.

In due course we shall be holding open events and application developer seminars but all that is for the future.

Right now we aim to offer a simple, though significant service.






Thursday, September 22, 2011

Six years later, we are talking about value

I was watching Derek Halpern talk about blog posts on YouTube and he suggested re-visiting old posts. I did.
The oldest talked about valuing relationships.

Here it is:



We have moved on. In six years, we have some better views:


Now, of course we are getting closer to having a form of ROI which I outlined in June.

In that post, I proposed a form of valuation that would be effective for all types of PR and would provide a value for public relations. We do have tools that can help with this approach, and they would be useful.

So now, who is going to work on creating some real (open source) facilities for the PR industry to share, so that it can measure its effects?



Thursday, September 01, 2011

Measuring and evaluating

As we get closer to the new academic term, I thought it may be helpful for students to take a look at how they can examine the work they have been involved in during this gap year.

We have moved past the time when a PR practitioner could imagine he or she has delivered anything of worth if it is not available online. Getting some sense of its effect, even its effectiveness, may mean using any number of services.

Of course, there are a host of tools out there, so it may be very useful to take a quick look at the range of different tools and approaches that can be used.

Now, this is not a game about 'evaluating PR' - whatever that may mean. This is not about outputs and outcomes. It is all about how internet technologies, aided by people, have represented the activities of an organisation in a range of ways. It's more complicated than traditional PR evaluation, which has been stuck in the mud of counting column inches for far too long.

Perhaps the first task is to look at some of the tools available. To broaden the mind.

A close examination may offer an insight into the ones that will shed light and the ones that will shed confusion.

So many claims and so little transparency is not helpful.

The next thing to do is to determine what each service offers. What information, for example, is provided, and what is its value to a communications expert?

Perhaps then, it would be time to see if we can offer insights to the practitioner in order to aid decision making about activities with measurable outcomes.

The list I offer is gleaned from bookmarks created over the years (so some links may not work). They are tools that can offer a wide range of data.

Here, then, is the first column of your spread sheet!



Oh, yes and here are some old pages I produced four years ago:



Tuesday, August 16, 2011

A letter to my MP over the Court of Appeal’s decision in the NLA case


Dear Mr Buckland,

The British economy does not need more restrictions on its most successful sectors. The internet is delivering £100 billion a year to the UK economy and needs reasonable attention to protect the opportunity it brings our citizens.

Specifically, and in this case, I am writing following the Court of Appeal’s decision of 27 July regarding ‘temporary copying’. The decision means that many UK citizens will unwittingly infringe copyright as they use the internet.

This situation has arisen as a result of a judgement in Newspaper Licensing Authority Ltd. (NLA) v Meltwater Group and the Public Relations Consultants Association (PRCA). Further details can be found here (http://bit.ly/oqhEoX) but the principle on temporary copies extends far beyond this case.

The ruling is such that the process of your constituent displaying a web page on screen would be considered in law the same as making a copy, and that anyone browsing a web page is subject to that page's terms and conditions. Their display of such web pages in their home or place of work is potentially, terms sight unseen, contrary to the law.

The legal position of your constituents is thereby compromised (most frequently in all innocence), and the consequence does not serve the UK's world-leading, and economically significant, position vis-à-vis the internet.

Owners of web sites have many ways in which they can protect content from even the most ardent hacker as many companies in your constituency can attest. 

In the lead-up to this decision, a number of newspaper proprietors have put themselves beyond the normally acknowledged protection offered to your website-publishing constituents and local enterprises. The proprietors thus seek special pleading, potentially at the expense of Swindon people and businesses.

For some, there is a need to protect intellectual property and to gain reward for diligent, legal and honest effort invested in content. However, putting the onus on users of the internet to avoid infringing rights sight unseen is counter-intuitive and a threat to the free use of, and access to, the internet.

One anticipates the Hargreaves Review (http://bit.ly/e7jPxQ) will consider this special pleading by media owners and no doubt you will have a constituency interest in his findings and how he will inform the Secretary of State for Business and Intellectual Property.

Professor Bently, Emeritus of Intellectual Property, Cambridge University is of a similar mind and expresses his view here: (http://bit.ly/r9F12U)  

One understands the dichotomy facing Members and legislators attempting to keep up with technological advance. Being sympathetic, and giving long-established, decaying and desperate vested interests a due hearing, is necessary, but it need not undermine the legitimate work and play of your constituents.

In this case, browsing content online must fall within a temporary copy exemption and should not require a right-holder’s prior, sight unseen, consent for reasonable use. 

Your etc

Friday, June 03, 2011

The search for hidden meanings

Throughout written history, people have engaged in finding the hidden meaning in writing.

Fascination with the hieroglyphs on the walls of ancient Egyptian temples and burial sites extends back well before 4 pm on November 26, 1922, when Howard Carter’s search for hidden meanings resulted in the discovery of the 3,300-year-old, untouched tomb of the 19-year-old king Tutankhamun.

Today, we are even more fascinated with exploring our written (and spoken) language.

And it all comes down to what is known as Part-of-speech tagging (POS tagging or POST).

Most of us have done it at school by identifying words as nouns, verbs, adjectives, adverbs, etc.

Back when the Beatles were at their peak, America and its allies were embroiled in the Vietnam war, Dr Christiaan Barnard carried out the world's first human heart transplant, the Six Day War was fought in the Middle East, NASA launched the unmanned Apollo 4 test spacecraft and Britons got their first colour television programmes. But the one development from that year that affects more people today, and will in the future, is the work of Henry Kucera and W. Nelson Francis. They published their classic Computational Analysis of Present-Day American English (1967), which provided basic analysis of the words in a collection of texts known today simply as the Brown Corpus.

Henry Kucera and W. Nelson Francis did more complicated analysis than getting computers to find nouns and verbs but the principle is the same. It is a process largely based on relationships with adjacent and related words in a phrase, sentence, or paragraph. 

Once performed by hand, POS tagging is now done within computational linguistics, the field that grew out of work like the Brown Corpus. It uses algorithms which associate discrete terms, as well as hidden parts of speech, with a set of descriptive tags, either predefined or, more recently, created 'on the fly' as they are found.
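The idea can be shown with a toy tagger. This sketch is nothing like a trained tagger (real ones learn from corpora such as the Brown Corpus and use the tags of neighbouring words); it just shows the tagging step itself, using a tiny lexicon and a couple of suffix rules.

```python
def pos_tag(tokens):
    """Toy part-of-speech tagger: a tiny lexicon plus suffix rules."""
    lexicon = {"the": "DET", "a": "DET", "is": "VERB", "and": "CONJ"}
    tagged = []
    for word in tokens:
        w = word.lower()
        if w in lexicon:
            tagged.append((word, lexicon[w]))   # known word: look it up
        elif w.endswith("ly"):
            tagged.append((word, "ADV"))        # suffix rule
        elif w.endswith("ing") or w.endswith("ed"):
            tagged.append((word, "VERB"))       # suffix rule
        else:
            tagged.append((word, "NOUN"))       # default guess
    return tagged

tagged = pos_tag("the cat quickly jumped".split())
# [('the', 'DET'), ('cat', 'NOUN'), ('quickly', 'ADV'), ('jumped', 'VERB')]
```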

The reason Kucera and Francis's work is so important is that we have built a whole new form of society on this idea.

Clever scientists have used this idea of extracting hidden meaning to develop a new form of internet.

One of these ideas came from five academics: Scott Deerwester, Susan Dumais, George Furnas, Thomas Landauer and Richard Harshman (1990). They outlined how to analyse relationships between a set of documents and the terms they contain by producing a set of concepts related to those documents and terms. Called Latent Semantic Analysis (LSA), the idea assumes that words that are close in meaning will occur close together in text.

This idea is used by all manner of analysis programmes and helps find those hidden meanings.
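The underlying assumption ("words close in meaning occur close together") can be sketched very simply: give each word a vector recording which documents it appears in, and compare the vectors. Full LSA goes further and applies a truncated singular value decomposition to this term-document matrix; this cut-down sketch keeps only the co-occurrence step.

```python
def cooccurrence_vectors(docs):
    """One vector per word: which documents the word appears in."""
    vocab = sorted({w for d in docs for w in d.split()})
    return {w: [1 if w in d.split() else 0 for d in docs] for w in vocab}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0

docs = [
    "doctor treats patient",
    "doctor visits hospital patient",
    "banker counts money",
]
vecs = cooccurrence_vectors(docs)
# "doctor" and "patient" co-occur in the same documents, so they score
# far higher together than "doctor" and "money" do
```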

In their paper they say: “...Thus while LSA’s potential knowledge is surely imperfect, we believe it can offer a close enough approximation to people’s knowledge to underwrite theories and tests of theories of cognition.” Since 1990, academics have come a long way, and accuracy is getting ever closer to social reality.

Today, the use of semantics makes the Google and Bing web search algorithms more accurate, helps newspaper journalists find the most authoritative sources for information and informs the top companies about events and their drivers to optimise financial, marketing and communication decisions.

Remember Kristen Urbahn’s story I blogged about three weeks ago? It has lots of hidden meanings. Using Extractive.com’s special search engine Kristen can find out about the relationships between different parts of the story (using automated Part of Speech tagging).

The results show the nature of some of the significant words:

PERSON (46)
├─ SCREEN ACTOR (4): Kathy Griffin (4)
├─ US CABINET MEMBER (1): Donald Rumsfeld
├─ US PRESIDENT (3): Obama (3)
├─ Brian Williams
├─ Dan Pfeiffer
├─ Jill Jackson (2)
├─ Keith Urbahn (6)
├─ Kristen Urbahn (13)
├─ Maggie Fox
├─ Osama Bin Laden (6)
├─ Osama Bin Ladin (5)
└─ Sohaib Athar

LOCATION (14)
├─ GPE (13)
│  ├─ COUNTRY (5): Afghanistan, Pakistan (2), US (2)
│  ├─ CITY (4): Abbottabad, Denver, Guardian, San Francisco
│  └─ US STATE (4): Kansas, South Carolina, Washington (2)
└─ Wiltshire

ORGANIZATION (21)
├─ COMMERCIAL ORG (16)
│  ├─ MEDIA ORG (7)
│  │  ├─ TV NETWORK (5): BBC, CBS, CNN (2), NBC
│  │  ├─ New York Times
│  │  └─ Washington Times
│  ├─ Defence
│  ├─ Google
│  ├─ Social Media Group
│  └─ Twitter (5)
├─ NON GOVERNMENT ORG (2): Al Qaeda, Republican Leaders Office
└─ UNIVERSITY (3): Preston University, University of Kentucky, Yale

CONTACT INFO (1)
└─ URL (1): http://goo.gl/qHnFH

OTHER (18)
├─ FACILITY / BUILDING (4): White House (4)
├─ LINKED OTHER (11): Capitol Hill, Christian, Creative Commons, Dachshunds, Internet (2), Mobile, POTUS, President Obama, Royal Wedding, The New York Times
└─ SOFTWARE (3): Facebook (3)

DATE-TIME (16)
├─ DATE GENERAL (8)
│  ├─ DATE (2): Aug. 18, 2009; May 1 2011
│  ├─ DAY OF MONTH (1): 1 May
│  ├─ MONTH NAME (1): May
│  ├─ RELATIVE DATE (2): months ago; the evening
│  └─ YEAR (2): 2006; 2011
└─ TIME (8): 10:30 p.m. Eastern Time; 10:40 p.m.; 10:53; 11 p.m.; 11:35; 4pm EST; 9:45 p.m.; from 10:45 p.m.-2:20 a.m.

NUMERIC (20)
├─ DURATION (4): Five years; days; former; the hours
├─ NUMBER (11): 2.0; 3,000; 5,000; 7.24; millions; more than 185; one (2); six; three; two
└─ ORDINAL (5): Third; first (2); second; third


Here, then, are the key elements that can be extracted from the blog post.

Two people from the 18th and 19th centuries now star in this story.

Thomas Bayes (1702–1761) was the son of a London Presbyterian minister and had a clever mathematical brain. He came up with what can be described as a way to look at these hidden parts of text and other content and work out the extent to which a particular inference is or is not true. For example, Twitter is a big part of the Kristen Urbahn story, but it is by no means the focus of the events in Pakistan. It was just an (important) means by which information was shared across the globe. Thomas's clever mathematics is the means by which computers can make decisions about the probability that information can be relied on and, in this case, about the role of Twitter in news distribution.

With enough information and generous computing power, of which modern man has plenty, Bayesian probability offers something like a partial belief, rather than a frequency. This allows the application of probability to all sorts of propositions rather than just ones that come with a known structure. "Bayesian" has been used in this sense since about 1950. Advancements in computing technology have allowed scientists from many disciplines to pair traditional Bayesian statistics with other techniques to greatly increase the use of Bayes theorem in science. Now, computers can both learn from experience and are beginning to be good at prediction.
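The belief update itself is one line of arithmetic: P(H|E) = P(E|H) × P(H) / P(E). The numbers below are invented purely for illustration.

```python
def bayes_posterior(prior, likelihood, evidence):
    """P(H|E) = P(E|H) * P(H) / P(E): update a partial belief."""
    return likelihood * prior / evidence

# Invented numbers for illustration: suppose 20% of stories are
# genuinely newsworthy, 90% of newsworthy stories spike on Twitter,
# and 40% of all stories spike on Twitter.
posterior = bayes_posterior(prior=0.20, likelihood=0.90, evidence=0.40)
# seeing a Twitter spike raises the belief from 0.20 to 0.45
```

With many such observations, the machine revises its beliefs continuously, which is what lets it learn from experience.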

Twitter was important for the Urbahn story and so, the software might tell us, Twitter will be significant for other stories in the future.

It is such techniques that modern managers need at hand, if only to be able to discover emerging trends in communication, news and events.

Fifty years after Thomas's death, George Boole (1815–1864) came into this world to give us all a great way of discovering information. George (whose wife Mary was equally mathematically brilliant, and the niece of the man who gave Mount Everest its name) gave us Boolean algebra (1854). Today most people know it because it is useful when searching for information with search engines: the Boolean operations AND, OR and NOT help narrow down searches to get closer to the facts we seek (Kristen AND Urbahn OR Forcht).

But the use of AND, OR and NOT in mathematics and computing has other applications. Combined with Bayesian probability (and other similar mathematics), it means computers can make accurate, predictive and related inferences and learn, for themselves, from the results.

In practice, we find useful tools to give us insights into events.

For example http://twitris.knoesis.org/  (created at Kno.e.sis at the College of Engineering and Computer Science at Wright State University) provides us with an ontology, related Tweets, links to highly relevant web pages, a chart of Tweet rates and much more.


In practice, a manager can keep a close eye on mentions of a company, brand or product and the reputation drivers behind the Twitter stream.

No one is pretending that business managers need to understand all the technologies. There is a need, however, to know that using such advances is now becoming central to modern management and communication.


Bibliography
Kucera, H. and Francis, W.N. (1967). Computational Analysis of Present-Day American English. Providence, RI: Brown University Press.
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K. and Harshman, R. (1990). "Indexing by Latent Semantic Analysis". Journal of the American Society for Information Science 41 (6): 391–407.
Boole, G. (2003) [1854]. An Investigation of the Laws of Thought. Prometheus Books. ISBN 978-1-59102-089-9.
Gruber, T.R. (June 1993). "A Translation Approach to Portable Ontology Specifications". Knowledge Acquisition 5 (2): 199–220.


Further Reading:
Introduction to LSA http://lsa.colorado.edu/papers/dp1.LSAintro.pdf
Semantic Inference in the Human-Machine Communication http://www.springerlink.com/content/ju71rcn9pq0wcmy3/
Continuous Semantics to Analyze Real-Time Data http://wiki.knoesis.org/index.php/Continuous_Semantics_to_Analyze_Real_Time_Data
Web semantics and ontology By Johanna Wenny Rahayu http://books.google.com/books?id=K7yFJVu8NDYC


Twitter, Facebook, and dozens more sources come through Gnip's API, normalized and enriched with metadata. http://gnip.com/


Tuesday, April 12, 2011

In defence of Aves

Having been involved with PR evaluation for over 20 years, I have been watching a number of recent debates about the use of statistical analytics with a lot of interest. 

Last week PR Week, the public relations industry trade publication, put together a number of perspectives on the use of Advertising Value Equivalents (AVEs), http://goo.gl/oBEk. There is heated debate on the subject.

Tom Eldridge put together an argument entitled “Why Klout and Peerindex fail to measure your online reputation” last January, http://goo.gl/oKrOo.

Newly financed http://www.ubervu.com has, like many others, automated sentiment analysis as part of its service.

The evidence of these debates goes on and on.

What they all have in common is that they use algorithms in an attempt to bring insights into an ocean of data.
In PR, marketing and advertising, the use of algorithms is commonplace and always has been.

In psephology, the study of election results, as well as in sample surveys and focus groups, the face-value figures are rarely helpful on their own and need interpretation. In their development, a system of managing these extrapolations quickly becomes an algorithm used for calculation, data processing and automated reasoning.

There are some key elements to be considered when using algorithms for gaining insights.
The first is the quality and range of data used.

In almost all research there are a lot of variables to be considered.

For example, in many evaluation methodologies used in PR and advertising media selection, a test of readership for a specific article is expressed in a range of ways including newspaper readership, circulation, position of page and position on page and a whole range of other data points.

The extent to which any of these measures can be attributed to the actual readership of any specific news story is often not clear.

A measure of the value of an advertisement can be attributed to the cost the market will bear, and thus an advertisement of a specific size, page and position provides evidence of the value of that real estate in a publication. Such space, were it editorial and as appealing to the reader, could be considered to have a comparable value. An Advertising Value Equivalent is on its way. Because editorial carries the imprimatur of being editorial, it is regarded with more authority by the reader and therefore, some say, has an even greater value: for some twice as much, for others five times as much or more.
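The arithmetic behind an AVE is straightforward, which is part of its appeal; the rate and multiplier below are invented for illustration:

```python
def ave(rate_per_col_cm, columns, length_cm, multiplier=1.0):
    """Advertising Value Equivalent: rate-card cost of the same space
    bought as advertising, times a (much-disputed) editorial multiplier."""
    return rate_per_col_cm * columns * length_cm * multiplier

# a 2-column, 15 cm article at an invented rate of 40 per
# column-centimetre, doubled for editorial credibility
value = ave(40.0, 2, 15, multiplier=2.0)  # 2400.0
```

The disputes are not about the multiplication; they are about whether the rate card and the multiplier measure anything real.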

Here we see evidence of the second key element in using algorithms.

The data used and the methodology adopted need to be common, commonly understood, and transparent for anyone to judge the veracity of the results provided.

In an article, ‘The problem with automated sentiment analysis’, Freshnetworks show how deeply one needs to look into such algorithms http://goo.gl/tjCyI and demonstrate clearly that the devil is in the detail. It notes that humans can be about 80% accurate in sentiment analysis of media corpora and that machines can compete but not in the fine detail. Thus the computers provide an excellent overview already.
That there are criticisms and issues is beyond doubt, but progressively computers are taking the strain and cutting no small proportion of the cost.
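To see why the fine detail defeats machines, here is a crude lexicon-based scorer of the kind such services build on (the word lists are illustrative). Note that "not bad" scores negative: exactly the detail a human gets right and this approach gets wrong, even while the aggregate direction over a large corpus remains usable.

```python
import re

POSITIVE = {"great", "excellent", "love", "good"}
NEGATIVE = {"poor", "terrible", "hate", "bad"}

def sentiment(text):
    """Net sentiment: count of positive words minus negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tricky = sentiment("not bad at all")  # scores -1, though a human reads mild praise
```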

I suggest, before dismissing automation as useless, there is a case for looking for current benefits in the knowledge that very soon developers will have the computing ability to resolve the issues.

AVEs may be dismissed in 2011, but will they, or an alternative, come back to bite the critics in a year or two?

I believe they will.  

Wednesday, March 23, 2011

Online Public Relations research tools

It has occurred to me that I have never shown the PR industry, and notably academic researchers, the technologies that I, my students and commercial partners have used to come to the conclusions we do.

I make them available to you here and now.

Some are quite old and have been superseded by better technologies. I am very happy to help researchers who want to use these tools in research activities that will give the PR industry better insights into the nature of online communication.


Semantic Web Experiments

We have been working on Latent Semantic Indexing for nearly a decade, but now we are looking at a range of other ways the semantic web can offer practitioners insights.

This is an experiment that dynamically identifies an ontology. The objective here will be to allow the practitioner to drill down further and further to find out who is affected and involved with an entity in a web page (e.g. news story).
You can try it out for yourself here: http://entitymap.appspot.com/


Reputation Wall

This is a development we have taken a very long way. It searches for pages about a search topic, opens up the web pages, normalises the texts, parses the texts of all the pages for semantic concepts (latent semantic indexing - we have our own software to do this) and then looks for the most powerful concepts month by month going back a year.

You can create your own 'Reputation Wall' here http://reputationwall.appspot.com






Track This Now

A media story or picture comes to prominence and you want to know where in the world it is popular right now. Well, here is the service that gives you an instant world and regional snapshot.


You can find your news of the moment here



Finding Semantic Concepts

This tool was used to discover relationships between people and organisations in a big research project. You can enter a lot of website URLs into it and it will return the 50 most significant semantic concepts in the corpora.


I find it more manageable if you remove the URLs and then paste the words into a programme like TagCrowd to generate a semantic word map.

Value Systems Analysis

This software leverages the semantic analysis of pages and looks at bigger corpora: in this case, current Google News, blogs and natural search. The analysis shows values in bold in the texts.

The software was developed for academic research; in this case it was part of building the values theory in PR. The outcome was presented at the Bled symposium in 2009:


Web Page Text Analysis

One of the hard things to do is to re-construct web pages to extract the text and then find the semantic concepts and much more.
This tool is really clever because it shows the steps involved. You can extract the text on web pages with this tool too.
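The extraction step can be sketched with Python's standard-library HTML parser: keep the visible text, skip script and style blocks. This is a simplified sketch of the idea, not the tool's actual code.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Strip mark-up, skipping script/style blocks, keeping visible text."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
    def handle_data(self, data):
        if not self.skip and data.strip():
            self.parts.append(data.strip())

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

page = ("<html><head><style>p{color:red}</style></head>"
        "<body><p>Hello <b>world</b></p></body></html>")
clean = extract_text(page)  # "Hello world"
```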


Video News
Finding the latest video is harder than you think. There are so many channels.

We thought that it would be a good idea to have them all in one place and this was the first part of developing a special type of search which you can see in NewsRokit.

You can play with the software here http://crowdmint.appspot.com/


Google Hourly Search to CSV

Everyone wants a spreadsheet of the latest pages indexed by Google. This toy gives you just the last hour's worth of pages indexed by Google.

To try it yourself here is the URL http://search2csv.appspot.com/



Summariser

Did you want to make a quick summary of a web page?

This may help.
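A common way to build such a summariser is extractive: score each sentence by the frequency of its words across the whole page and keep the top scorers. A minimal sketch of that approach (not this tool's actual method):

```python
import re
from collections import Counter

def summarise(text, n_sentences=1):
    """Extractive summary: keep the sentences whose words are most
    frequent across the whole text, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    def score(s):
        words = re.findall(r"[a-z']+", s.lower())
        return sum(freq[w] for w in words) / (len(words) or 1)
    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in top)
```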


Throughout, these experimental tools do not use word counts. The approach is always to use latent semantic indexing as the basis for experimentation.

Have fun with the technologies.