
Friday, June 03, 2011

The search for hidden meanings

Throughout written history, people have engaged in finding the hidden meaning in writing.

Fascination with the hieroglyphs on the walls of ancient Egyptian temples and burial sites extends back well before 4 PM on November 26, 1922, when Howard Carter’s search for hidden meanings resulted in the discovery of the 3,300-year-old, untouched tomb of the 19-year-old king Tutankhamun.

Today, we are even more fascinated with exploring our written (and spoken) language.

And it all comes down to what is known as Part-of-speech tagging (POS tagging or POST).

Most of us have done it at school by identifying words as nouns, verbs, adjectives, adverbs, etc.

Back in 1967 the Beatles were at their peak, America and its allies were embroiled in the Vietnam War, Dr Christiaan Barnard carried out the world's first human heart transplant, the Six Day War was fought in the Middle East, NASA launched the unmanned Apollo 4 test spacecraft and Britons got their first colour television programmes. But the one development of that year that affects more people today, and will do in the future, is the work of Henry Kucera and W. Nelson Francis. They published their classic work Computational Analysis of Present-Day American English (1967), which provided a basic analysis of the words in the body of texts known today simply as the Brown Corpus.

Henry Kucera and W. Nelson Francis did more complicated analysis than getting computers to find nouns and verbs but the principle is the same. It is a process largely based on relationships with adjacent and related words in a phrase, sentence, or paragraph. 

Once performed by hand, POS tagging is now done by computer within computational linguistics, the field that grew out of work like the Brown Corpus. Algorithms assign each term a part of speech, including ones that are not obvious from the word alone, using either a predefined set of descriptive tags or, more recently, tags that are created ‘on the fly’ as they are found.
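To make the idea concrete, here is a minimal sketch of tagging by context. The tiny lexicon, the tag names and the "what usually follows what" table are all invented for illustration; they are not the Brown Corpus tagset or the method Kucera and Francis used.

```python
# Toy part-of-speech tagger. Everything in these two tables is a
# hand-written assumption for illustration only.
LEXICON = {
    "the": ["DET"],
    "we": ["PRON"],
    "search": ["NOUN", "VERB"],   # ambiguous without context
    "tag": ["NOUN", "VERB"],      # ambiguous without context
    "words": ["NOUN"],
    "texts": ["NOUN"],
    "hidden": ["ADJ"],
}

# Which tag is preferred after a given previous tag.
PREFERRED_AFTER = {"DET": "NOUN", "ADJ": "NOUN", "PRON": "VERB"}


def tag_sentence(sentence):
    """Return (word, tag) pairs, using the previous tag to resolve ambiguity."""
    tagged = []
    for word in sentence.lower().split():
        candidates = LEXICON.get(word, ["UNK"])
        previous_tag = tagged[-1][1] if tagged else None
        preferred = PREFERRED_AFTER.get(previous_tag)
        # Use the context preference when the lexicon allows it,
        # otherwise fall back to the first listed tag.
        chosen = preferred if preferred in candidates else candidates[0]
        tagged.append((word, chosen))
    return tagged


print(tag_sentence("we search the texts"))
print(tag_sentence("the search"))
```

The same word, "search", comes out as a verb in the first sentence and a noun in the second, purely because of its neighbour. Real taggers learn these preferences from a tagged corpus rather than from a hand-written table.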

The reason that the work of Kucera and Francis is so important is that we have built a whole new form of society on this idea.

Clever scientists have used this idea of extracting hidden meaning to develop a new form of internet.

One of these ideas came from five academics: Scott Deerwester, Susan Dumais, George Furnas, Thomas Landauer and Richard Harshman (1990). They outlined how to analyse relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. Called Latent Semantic Analysis (LSA), the idea assumes that words that are close in meaning will occur in similar pieces of text.

This idea is used by all manner of analysis programmes and helps find those hidden meanings.
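As a rough illustration of the raw material LSA works from, the sketch below builds a term-document count matrix and compares terms by the documents they appear in. It is deliberately not full LSA: the real technique goes on to apply a truncated singular value decomposition to this matrix, which is omitted here. The three documents are made up.

```python
import math
from collections import Counter

# Three tiny made-up documents (illustrative only).
DOCS = [
    "twitter news spread fast",
    "news spread on twitter",
    "tomb discovery in egypt",
]


def term_vectors(docs):
    """Map each term to its row of the term-document count matrix."""
    counts = [Counter(doc.split()) for doc in docs]
    vocabulary = sorted(set(word for c in counts for word in c))
    return {term: [c[term] for c in counts] for term in vocabulary}


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vectors = term_vectors(DOCS)
print(cosine(vectors["twitter"], vectors["news"]))  # terms sharing documents
print(cosine(vectors["twitter"], vectors["tomb"]))  # terms that never co-occur
```

Terms that turn up in the same documents score close to 1 and terms that never share a document score 0. The decomposition step in LSA is what lets it also relate terms that never co-occur directly but keep similar company.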

In their paper they say “...Thus while LSA’s potential knowledge is surely imperfect, we believe it can offer a close enough approximation to people’s knowledge to underwrite theories and tests of theories of cognition.” Since 1990, academics have come a long way and accuracy is getting ever closer to social reality.

Today, the use of semantics makes the Google and Bing web search algorithms more accurate, helps newspaper journalists find the most authoritative sources for information and informs the top companies about events and their drivers to optimise financial, marketing and communication decisions.

Remember Kristen Urbahn’s story I blogged about three weeks ago? It has lots of hidden meanings. Using Extractive.com’s special search engine Kristen can find out about the relationships between different parts of the story (using automated Part of Speech tagging).

The results show the nature of some of the significant words:

PERSON (46)
│├SCREEN ACTOR (4)
││└ Kathy Griffin (4)
││  she
││  her
│├US CABINET MEMBER (1)
││└ Donald Rumsfeld
│├US PRESIDENT (3)
││└ Obama (3)
││  Obama
││  Obama
││  his
│├ Brian Williams
│├ Dan Pfeiffer
│├ Jill Jackson (2)
││  Jill Jackson
│├ Keith (6)
││  Keith Urbahn
││  He
│├ Kristen Urbahn (13)
││  Kristen Urbahn
││  her
││  Kristen Urbahn
││  Kristen
│├ Maggie Fox
│├ Osama Bin Laden (6)
││  Osama Bin Laden
││  Bin Laden
│├ Osama Bin Ladin (5)
││  Osama Bin Ladin
││  He
││  Osama
││  he
│└ Sohaib Athar



LOCATION (14)
│├GPE (13)
││├COUNTRY (5)
│││├ Afghanistan
│││├ Pakistan (2)
│││  Pakistan 

│││└ US (2)
││├CITY (4)
│││├ Abbottabad
│││├ Denver
│││├ Guardian
│││└ San Francisco
││└US STATE (4)
││  Kansas
││  South Carolina
││  Washington (2)
││   Washington
││   Washington 
│└ Wiltshire
ORGANIZATION (21)
│├COMMERCIAL ORG (16)
││├MEDIA ORG (7)
│││├BROADCAST NETWORK (5)
││││└TV NETWORK (5)
││││  BBC
││││  CBS
││││  CNN (2)
││││  NBC
│││├ New York Times
│││└ Washington Times
││├ Defence
││├ Google
││├ Social Media Group

││└ Twitter (5)
││  Twitter 

│├NON GOVERNMENT ORG (2)
││├ Al Qaeda
││└ Republican Leaders Office
│└UNIVERSITY (3)
  Preston University
  University of Kentucky
  Yale
CONTACT INFO (1)
│└URL (1)
 HTTP (1)
  http://goo.gl/qHnFH

OTHER (18)
│├FACILITY (4)
││└BUILDING (4)
││  White House (4)
││   White House Communication Director
││   White House 

│├LINKED OTHER (11)
││├ Capitol Hill
││├ Christian
││├ Creative Commons
││├ Dachshunds
││├ Internet
││├ Internet
││├ Mobile
││├ POTUS
││├ President Obama
││├ Royal Wedding
││└ The New York Times
│└SOFTWARE (3)
  Facebook (3)
   Facebook 

DATE-TIME (16)
│├DATE GENERAL (8)
││├DATE (2)
│││├ Aug. 18, 2009
│││└ May 1 2011
││├DAY OF MONTH (1)
│││└ 1 May
││├MONTH NAME (1)
│││└ May
││├RELATIVE DATE (2)
│││├ months ago
│││└ the evening
││└YEAR (2)
││  2006
││  2011

│└TIME (8)
  10:30 p.m. Eastern Time
  10:40 p.m.
  10:53
  11 p.m.
  11:35
  4pm EST
  9:45 p.m.
  from 10:45 p.m.-2:20 a.m.
NUMERIC (20)
│├MEASUREMENT (4)
││└DURATION (4)
││  Five years
││  days
││  former
││  the hours
│├NUMBER (11)
││├ 2.0
││├ 3,000
││├ 5,000
││├ 7.24
││├ millions
││├ more than 185
││├ one
││├ one
││├ six
││├ three
││└ two
│└ORDINAL (5)
   Third
   first
   first
   second
   third


Here, then, are the key elements that can be extracted from the blog post.

Two people from the 18th and 19th centuries now star in this story.

Thomas Bayes (1702–1761) was the son of a London Presbyterian minister and had a clever mathematical brain. He came up with what can be described as a way to look at these hidden parts of text and other content and find out how probable it is that a particular inference is true. For example, Twitter is a big part of the Kristen Urbahn story but it is by no means the focus of the events in Pakistan. It was just an (important) means by which information was shared across the globe. Thomas's clever mathematics is the means by which computers can make decisions about the probability that information can be relied on and, in this case, about the role of Twitter in news distribution.

With enough information and generous computing power, of which modern man has plenty, Bayesian probability offers something like a partial belief, rather than a frequency. This allows the application of probability to all sorts of propositions rather than just ones that come with a known structure. "Bayesian" has been used in this sense since about 1950. Advancements in computing technology have allowed scientists from many disciplines to pair traditional Bayesian statistics with other techniques to greatly increase the use of Bayes' theorem in science. Now, computers can learn from experience and are beginning to be good at prediction.

Twitter was important for the Urbahn story and so, the software might tell us, Twitter will be significant for other stories in the future.
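A small sketch of how such a partial belief could be updated with Bayes' theorem. The hypothesis, the starting belief of 0.5 and the two likelihoods (0.8 and 0.2) are invented numbers chosen only to show the mechanics; they are not measurements of anything.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(H | E) from Bayes' theorem for a yes/no hypothesis H."""
    numerator = p_evidence_if_true * prior
    evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / evidence


# Hypothetical example: H = "Twitter matters to how breaking stories spread".
belief = 0.5  # assumed starting belief
for _ in range(3):  # three stories in which Twitter featured prominently
    # Assumed likelihoods: such a story is seen 80% of the time if H is
    # true and 20% of the time if H is false.
    belief = bayes_update(belief, 0.8, 0.2)

print(round(belief, 3))
```

Each story that fits the hypothesis pushes the belief higher, which is the sense in which software can "learn from experience": today's posterior becomes tomorrow's prior.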

It is such techniques that modern managers need to have to hand, if only to be able to discover emerging trends in communication, news and events.

Some fifty years after Thomas's death, George Boole (1815–1864) came into this world to give us all a great way of discovering information. George (who was married to the equally mathematically brilliant Mary, niece of the man who gave Mount Everest its name) gave us Boolean algebra (1854). Today most people know it because it is useful when searching for information using search engines. The Boolean operations AND, OR and NOT help narrow down searches to get closer to the facts we seek (Kristen AND Urbahn OR Forcht).
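For readers who want to see the mechanics, here is a minimal sketch of Boolean search over an inverted index using Python sets. The four documents are made up, and reading the example query as (Kristen AND Urbahn) OR Forcht is an assumption about how the operators group.

```python
# Hypothetical mini-collection: document id -> text.
DOCUMENTS = {
    1: "Kristen Urbahn tweeted the news",
    2: "Keith Urbahn broke the story",
    3: "Kristen Forcht was interviewed",
    4: "Twitter spread the story",
}


def build_index(documents):
    """Inverted index: lower-cased word -> set of ids of documents containing it."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCUMENTS)


def docs_with(word):
    """Ids of documents containing the word (empty set if none)."""
    return INDEX.get(word.lower(), set())


# AND is set intersection, OR is union, NOT is difference from all documents.
kristen_and_urbahn = docs_with("Kristen") & docs_with("Urbahn")
query = kristen_and_urbahn | docs_with("Forcht")
not_twitter = set(DOCUMENTS) - docs_with("Twitter")

print(sorted(query))
print(sorted(not_twitter))
```

AND narrows the result, OR widens it and NOT excludes, which is exactly why the three together are enough to home in on a topic.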

But the use of AND, OR and NOT in mathematics and computing has other applications. When combined with Bayesian probability (and other similar maths), it means that computers can be used to make accurate, predictive and related inferences and to learn, for themselves, from the results.

In practice, we find useful tools to give us insights into events.

For example http://twitris.knoesis.org/  (created at Kno.e.sis at the College of Engineering and Computer Science at Wright State University) provides us with an ontology, related Tweets, links to highly relevant web pages, a chart of Tweet rates and much more.


In practice, a manager can keep a close eye on mentions of a company, brand or product and the reputation drivers behind the Twitter stream.

No one is pretending that business managers need to understand all the technologies. There is a need, however, to know that using such advances is now becoming central to modern management and communication.


Bibliography
Kucera, H. and Francis, W.N. (1967). Computational Analysis of Present-Day American English. Providence, RI: Brown University Press.
Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, Richard Harshman (1990). "Indexing by Latent Semantic Analysis". Journal of the American Society for Information Science 41 (6): 391–407
Boole, George (2003) [1854]. An Investigation of the Laws of Thought. Prometheus Books. ISBN 978-1-59102-089-9.
Gruber, Thomas R. (June 1993). "A Translation Approach to Portable Ontology Specifications". Knowledge Acquisition 5 (2): 199–220.


Further Reading:
Introduction to LSA http://lsa.colorado.edu/papers/dp1.LSAintro.pdf
Semantic Inference in the Human-Machine Communication http://www.springerlink.com/content/ju71rcn9pq0wcmy3/
Continuous Semantics to Analyze Real-Time Data http://wiki.knoesis.org/index.php/Continuous_Semantics_to_Analyze_Real_Time_Data
Web semantics and ontology By Johanna Wenny Rahayu http://books.google.com/books?id=K7yFJVu8NDYC


Twitter, Facebook, and dozens more sources come through Gnip's API, normalized and enriched with metadata. http://gnip.com/


Tuesday, April 26, 2011

Public Relations - can it be a science in its own right?

Among the many duties of the Chartered Institute of Public Relations, the association of individual practitioners in the UK, is a responsibility to be involved in research.

Indeed, under its Charter it is mandated to do so.

The objects for which the Institute is incorporated shall be:
to promote the study, research and development of the practice of public
relations and publish or otherwise make available the useful results of
such study and research;

The Institute has established a Research and Development Unit to create a hub for industry and academic research.

The CIPR Research and Development Unit Working Party includes among its members Dr Sandra Oliver (Emeritus Professor), Dr Jon White (Visiting Professor), Dr Reginald Watts (Business Consultant) and Jay O'Connor (CIPR Immediate Past President).

The CIPR website page of research resources provides an insight into the extent to which, so far, its research outcomes are promoted by the CIPR and the extent to which the Institute promotes or makes available studies and commissions research on behalf of its members.

Then there is a mass of  information about Measurement and evaluation including the Measurement & Evaluation Fellowship Award in the UK (more information here); the Measurement and Evaluations toolkit and the social media version; the Valid Metrics guidelines and a page offering links to resources for measuring different sectors of PR accompanied by case studies relevant to the sector.

The Local Public Services Group is to provide members with inspiration, know-how and reassurance to actively participate in public relations activities, by exposing them to the experiences and good practice of key practitioners in the field. This group signposts third party case studies and research and papers.

There does not seem to be a reference to the Alan Rawel Academic Conference at any time in the future.

It will be interesting to see what transpires from the deliberations of Dr Sandra Oliver, Dr Jon White, Dr Reginald Watts and Jay O'Connor but I have some concerns.

Jay O'Connor suggests that the committee will 'bring together what is a significant body of knowledge about PR practice' and Reginald Watts, Chair of the Unit, says: "Together with those practitioners, consultancies and research organisations that are active in PR research, there are many practitioners and researchers with PhDs in subjects directly related to communications. My hope is that we can mobilise such members, along with others, to shape future practice and to help us to understand the changing communications environment. This is an exciting and timely undertaking by the CIPR. We are committed to bridging the gap between professional and academic research in a way that will be both creative and highly relevant to practice."

There is some need for the PR profession to acquire the confidence in its own right to work on blue sky research.

Many pure-play public relations areas of interest have huge economic, social and political significance and deserve the kind of attention to research that medicine has in the minds of research funders.

Some examples include:

The wider nature of communication as the ubiquitous internet and new forms of human/machine communicative interaction (like, for example, body/avatar languages using Kinect-type technologies) become the norm in human and human/machine relationships.

The drivers of relationships, and the extent to which relationships affect matters such as reputation and recognition of entities (e.g. brands, companies, other institutions and machines), are poorly understood. To date, our understanding is based on research that accepts that relationships exist, not how and why they form (social sciences), or that treats them as evolutionary (psychology/evolutionary sciences) or as robotic. None of this is truly helpful in the reality of organisational relationship management.

In my line of interest, the significance of semantics, personal data (and the relationship between control of institutions in some form of digital democracy to control the emerging internet executive/s) are becoming significant for the profession. People offer a cloud of data about themselves and yet there is no means by which a form of vox populi democracy can challenge the owners of such data (governments, utilities and service vendors).

Value-based relationship issues, where everything from corporate objectives to website meta tags affects the capability of organisations to operate without creating inherent dissonance with organisational constituents, are poorly understood.

Diversity and ethics in relationships are also major areas of emerging concern, where we depend on education and social grooming to release value from human interest and development and yet are amazed at the capability of the dispossessed to invent and provide.

Then, again, there are the issues associated with the nature of trust in relationships. If the world's banking system and the government of huge swathes of the global population break down for want of trust, surely the PR industry should be at the heart of research into trust.

Of course there is a case for having practice based research and there is a case for using and even adopting the better cases of research from other sciences but there is also a case for a public relations science in its own right.

I just hope that the Institute will consider such an ambition and be bold in its considerations.

Wednesday, March 23, 2011

Online Public Relations research tools

It has occurred to me that I have never shown the PR industry, and notably academic researchers, the technologies that I, my students and my commercial partners have used to come to the conclusions we do.

I make them available to you here and now.

Some are quite old and have been superseded by better technologies and I am very happy to help researchers who want to use these tools in research activities that will give the PR industry better insights into the nature of online communication.


Semantic Web Experiments

We have been working on Latent Semantic Indexing for nearly a decade but now we are looking at a range of other ways the semantic web can offer practitioners insights.

This is an experiment that dynamically identifies an ontology. The objective here will be to allow the practitioner to drill down further and further to find out who is affected and involved with an entity in a web page (e.g. news story).
You can try it out for yourself here http://entitymap.appspot.com/


Reputation Wall

This is a development we have taken a very long way. It searches for pages about a search topic, opens up the web pages, normalises the texts, parses the texts of all the pages for semantic concepts (latent semantic indexing - we have our own software to do this) and then looks for the most powerful concepts month by month going back a year.
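The shape of that pipeline can be sketched in a few lines. This is only an outline under loud assumptions: the page texts are hard-coded stand-ins for fetched web pages, and plain term frequency is used as a stand-in for the latent semantic indexing step, which our own software performs and which is not reproduced here.

```python
from collections import Counter, defaultdict

# Small illustrative stopword list (an assumption, not the tool's list).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}


def normalise(text):
    """Lower-case, strip punctuation from word edges, drop stopwords."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def monthly_top_terms(pages, top_n=3):
    """pages: iterable of (month as 'YYYY-MM', page text).

    Returns month -> most frequent terms. Term frequency here is a
    placeholder for real concept extraction.
    """
    by_month = defaultdict(Counter)
    for month, text in pages:
        by_month[month].update(normalise(text))
    return {
        month: [term for term, _ in counter.most_common(top_n)]
        for month, counter in sorted(by_month.items())
    }


# Hypothetical page texts standing in for fetched web pages.
PAGES = [
    ("2011-01", "Recall of the product. Product recall widens."),
    ("2011-02", "Award for the brand, and the brand wins praise."),
]
wall = monthly_top_terms(PAGES, top_n=2)
print(wall)
```

Swapping the counting step for a proper semantic analysis changes which concepts surface, but the month-by-month grouping that gives the "wall" its shape stays the same.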

You can create your own 'Reputation Wall' here http://reputationwall.appspot.com






Track This Now

A media story or picture comes to prominence and you want to know where in the world it is popular right now. Well, here is the service that gives you an instant world and regional snapshot.


You can find your news of the moment here



Finding Semantic Concepts

This tool was used to discover relationships between people and organisations in a big research project. You can enter a lot of website URLs into it and it will return the 50 most significant semantic concepts in the corpora.


I find it is more manageable if you remove the URLs and then paste the words into a programme like TagCrowd to generate a semantic word map.

Value Systems Analysis

This software leverages the semantic analysis of pages and looks at bigger corpora, in this case current Google News, blogs and natural search. The analysis shows values in bold in the texts.

The software was developed as part of a series of software developments for academic research, in this case for building the values theory in PR. The outcome was presented at the Bled symposium in 2009:


Web Page Text Analysis

One of the hard things to do is to re-construct web pages to extract the text and then find the semantic concepts and much more.
This tool is really clever because it shows the steps involved. You can extract the text on web pages with this tool too.
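For anyone curious about the first of those steps, here is a minimal sketch of pulling the visible text out of a page with Python's standard html.parser module. It is not the code behind the tool; the class name, function name and sample page are invented for illustration, and real pages need more care (navigation menus, adverts, broken markup).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the page's visible text as one space-separated string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up sample page.
SAMPLE = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>News</h1><p>Twitter broke the story.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(extract_text(SAMPLE))
```

Once the text is out of the markup it can be handed on to whatever semantic analysis comes next.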


Video News
Finding the latest video is harder than you think. There are so many channels.

We thought that it would be a good idea to have them all in one place and this was the first part of developing a special type of search which you can see in NewsRokit.

You can play with the software here http://crowdmint.appspot.com/


Google Hourly Search to CSV

Everyone wants to get a spreadsheet of the latest pages indexed by Google. This toy allows you to get just the last hour's worth of pages indexed by Google.

To try it yourself here is the URL http://search2csv.appspot.com/



Summariser

Did you want to make a quick summary of a web page?

This may help.


Throughout, these experimental tools do not use word counts. The approach is always to use latent semantic indexing as the basis for experimentation.

Have fun with the technologies.

Tuesday, November 09, 2010

Now we have another (practical) form of communication

Microsoft launched Kinect this week. It represents a new, practical and already available form of communication. It is part of what I have described as the Experiential Web.

Forget cave paintings, scrolls, books, letters, newspapers, telegraph, radio, phones, television, the web and social media. They all require a mechanical interface. Now communication is possible just through movement.

Adopted and used by the PR industry, it can help the industry begin to achieve its full potential in this technology-led era and should add a new revenue stream.

I imagine that in two years there will be in excess of £250 million worth of PR fees connected with this single product alone and that it will grow the industry significantly. I am not so sure that this will be the PR industry as we know it but it will definitely be PR, and the first Kinect-trained PR students will emerge into the market within a year.

Kinect's capability and application has a role in communicating with almost every organisational constituency.

No overalls, gloves or wands are needed: just a person going about daily activities can be involved in this form of communication.

How this is achieved has been talked about a lot already and it requires only a tiny imagination to see applications in almost every form of PR activity.

Mostly, Kinect is aimed at games and rather standard communication like conference calling but that need not deter the PR practitioner.

Just by having a constituent move in front of a sensor, the practitioner can now deliver a message, interact in a real or virtual way and engage directly. With its face recognition, Kinect can deliver messages to people based on no more than a photographic image (offering a whole new dimension for direct interaction with attendees at the corporate AGM, launch, political rally or press conference, not to mention the retail outlet, sporting event etc).

To do all this the practitioner may want to partner with games developers like Scott Henson at Rare, who says:

You saw our big bold vision for Kinect when we rolled it out last year and now we’re going to enable that. It’s amazing to me how much we’re going to deliver to consumers at launch but we’re just scratching the surface of what we can do....

Kinect is not a game; it is a new form of communication.

Friday, August 27, 2010

Semantic progress

Yesterday Philip Sheldrake gave a talk on the semantic web to the Chartered Institute of Public Relations Social Media gathering (anyone can come - it costs £10 and is at 5pm every Thursday). It was excellent and you can access it here.

Among the things he showed us was the work of Philipp Heim (University of Stuttgart), Steffen Lohmann (Carlos III University of Madrid) and Timo Stegemann (University of Duisburg-Essen).
They have taken existing structured data to allow you to go and find relationships between two entities (I chose to find the relationship between Nick Clegg and David Cameron on this page).

The work they have done is important and uses existing structured data sets.

At Klea Global my colleague Girish has been working towards a way of creating structured data sets using Natural Language Processing, including LSI, to build (RDF) structures on the fly from content derived from newspapers, blogs, Facebook, Twitter etc.

He already has gone quite a long way and you can see an example of how it is possible to create this process with some very new and pretty smart tools. One of which is available for you to try here.

OK, so what is this for, and why is it relevant to Public Relations?

I guess the secret is in the second part of the name of our profession: relations.

Using these capabilities, we can find out all manner of relationships between two entities (subject - object). When, using the Semantic Web, these relationships make sense, all this data will be ever more powerful.
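As a rough sketch of what finding a relationship between two entities involves, the code below stores a few subject-predicate-object triples and searches for the shortest chain linking two of them. The triples are hand-written for illustration rather than drawn from any real RDF data set, and the function is not how the tools mentioned above are built.

```python
from collections import deque

# Hand-written illustrative triples: (subject, predicate, object).
TRIPLES = [
    ("Nick Clegg", "leads", "Liberal Democrats"),
    ("David Cameron", "leads", "Conservative Party"),
    ("Liberal Democrats", "partner in", "Coalition Government"),
    ("Conservative Party", "partner in", "Coalition Government"),
]


def find_path(triples, start, goal):
    """Breadth-first search for the shortest chain of triples linking two entities.

    Returns a list alternating entity, predicate, entity, ... or None.
    """
    neighbours = {}
    for subject, predicate, obj in triples:
        # Treat every relationship as traversable in both directions.
        neighbours.setdefault(subject, []).append((predicate, obj))
        neighbours.setdefault(obj, []).append((predicate, subject))

    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        entity, path = queue.popleft()
        if entity == goal:
            return path
        for predicate, other in neighbours.get(entity, []):
            if other not in seen:
                seen.add(other)
                queue.append((other, path + [predicate, other]))
    return None


print(find_path(TRIPLES, "Nick Clegg", "David Cameron"))
```

With four triples the answer is obvious by eye. The point is that the same search works unchanged when the triples number in the millions and have been built automatically from news, blogs and tweets.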

To get some idea of how much data, here is another 'toy' you can play with from Klea Global labs (and yes, I have started to put it all in one place at last): Track This Now. Using this free 'search and scope presence' software, you will see that an amazing amount of information is accumulating about your company, client, university etc.

Knowing how much there is, and knowing that most of what is said online about an organisation does not come from the company or traditional media, is only half the battle. There is so much accumulating out there that we are overwhelmed.

If only we could find out what the relationships were between all those tweets and press articles, we would have some chance of influencing them, building up huge SEO for clients and lots of other marvellous things. Worry not... salvation is at hand.

These are very early days for these developments to bear fruit for the PR industry but next year they will be quite astonishing. We already know how to do it and in less than a year will be doing it.

This is so exciting for our industry and my only regret is that we don't have a single university in the UK with a capability to do this sort of research.

Saturday, May 29, 2010

A small contribution for the Stockholm Accords

I am delighted at what I have seen of the Stockholm Accords.

The dynamism of Toni Muzi Falconi is breathtaking and I am full of admiration for the efforts of Ronél Rensburg and Anne Gregory in their explication of the change that is taking place in the world today.

But I am not without concerns.

Perhaps, as we look to the next two or three years of PR practice, it gives us a clue as to the life of the Accord. It is a bold effort but, in my experience, will have a struggle to survive or have any impact.

My interest is in how the internet affects the world and PR in particular. I did predict its significance to the CIPR in 1995 and was involved in some of the papers for the now long forgotten 1999 CIPR/PRCA Internet Commission (some of the papers are here and some are here Journal of Communication Management; Volume: 5; Issue: 2; 2000 ).

I am a practitioner, researcher and teacher and so am part of this industry. Part of me is aghast at how little we regard the future. Students leave university with scant understanding of internet implications for their future work. At best they are told about something called 'Social Media' (a module that could equally be called etiquette). I see some agencies 'slimming down' because of the 'recession'. They don't recognise that they are being by-passed. There is some form of belief in this industry of ours that the internet is, progressively, having a greater effect on our lives and has effects that mediate everyone's life. The big thinking concerns online reputation developments, convergence in marketing communications and best practice social media measurement. This is a linear view, a straight line graph of change.

The reality is much more potent. The influences brought about by the internet are not straight line, they are exponential. According to an IBM study, by 2010 the amount of digital information in the world would be doubling every 11 hours. Some years ago Kevin Kelly explained the effects of the exponential growth of hyperlinks in a network rather well when he told of the prior and future 5,000 days.
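A quick piece of arithmetic shows how different the two views are. The sketch below takes the 11-hour doubling figure at face value and compares it with a straight line that starts at the same pace (the starting amount again every 11 hours); the linear comparison is my own assumption, added only for contrast.

```python
def exponential_growth(doubling_hours, elapsed_hours):
    """Growth factor after elapsed_hours if the quantity doubles every doubling_hours."""
    return 2 ** (elapsed_hours / doubling_hours)


def linear_growth(rate_per_hour, elapsed_hours):
    """Growth factor if a fixed fraction of the starting amount is added each hour."""
    return 1 + rate_per_hour * elapsed_hours


for days in (1, 7):
    hours = 24 * days
    print(
        days,
        "day(s): exponential x%.1f" % exponential_growth(11, hours),
        "vs linear x%.1f" % linear_growth(1 / 11, hours),
    )
```

After a single day the two are still in the same ballpark. After a week the straight line has grown by a factor in the teens while the doubling curve has grown by tens of thousands, which is why linear planning fails so badly here.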

Some clue to this change can be seen in the consumer/tech cell phone in our pocket or handbag. The move from phone/text to email to hand-held mobile computer has been quite quick and as quickly has become passé. Another clue may be found in changed consumer habits and annual growth of online retail sales of 25% plus every year. The biggest development is from, effectively, no cloud computing four years ago to commonplace corporate application with, in the UK, companies like Rentokil Initial moving all their email into the cloud in two years, insurance giant Aviva, logistics firm Pall-Ex and Universal Music already implementing mass internal and external communication in the cloud, and tiny organisations like mine with mega computing power for pennies.

Should it care to use it, the Centre for PR Studies at Leeds Met now has unlimited computing power available without making the lights dim. In the last month, the capability for my research into semantic public relations has moved from being stalled by the high levels of media coverage for the general election to being able to provide both semantic analysis of text and an automated taxonomy to find inferred links. This is not a mega university research institute; it is, literally, in a shed at the end of my garden.

In three years we will have both inference of relationships and predictability of discourse at very high levels of accuracy routinely using massive cloud computing power.

These capabilities will change how governments and societies operate because they will provide near complete radical transparency of every organisation. You and I will be able to find out the precise nature of the common values that hold disparate organisations, their financial backers, customers and other stakeholders together in their networks.

As for companies, so too for terrorists, wayward governments and so forth.

As the leading thinkers in the world explain in this video, we very nearly have the knowledge and we do have the computing power.

It may possibly be that it is the PR industry that benefits from these developments, but linear thinking, however ambitious the growth projection may be, is not enough.

To go from the values lecture I gave in Lincoln four years ago, to Bruno Amaral's Euprera discourse this year, to cloud capacity for semantic PR development in the last month is pretty impressive.

But this thinking has drawbacks. It is not a conversation one can have with practitioners. They can neither understand it nor have the inclination to stare so much change in the face. Equally, I know of only one Masters course worldwide which is prepared to entertain such radical thought (I don't know of a PhD doing such work - but would be thrilled to find one).

It is for these reasons that I think the Accord, like the CIPR Internet Commission, will need re-thinking from scratch in three years.

But it is a great start that can be developed in June.