I could spend a lot of time writing a definition of semantics or the semantic web. I could show how Tim Berners-Lee, the inventor of the web, finds it all-absorbing, why Google thinks it is essential to its future survival, and how some serious thinkers believe it is important for the future of society.
It's much more fun to put on a practical demonstration, and that is what I am going to do.
The demonstration will seek to show that it is possible to identify, at a moment in time, the key semantic notions that define a genre and the individuals within it.
The methodology I shall apply is set out in this post, together with the tools that allow practitioners and researchers to replicate the findings.
To ensure that this is a relevant case study, I shall take as my example a major set of competitive public relations campaigns: the UK General Election. Specifically, I shall look at the semantic similarities and differences between the three party leaders: David Cameron, Conservative; Gordon Brown, Labour; and Nick Clegg, LibDem.
This is a big project, and we are limited (by the technological challenge I face) to sampling the corpus. In future, we need not be limited by such constraints.
The methodology I am able to use is as follows.
- Every 40 minutes I shall use an automated bot to interrogate the internet and identify new web pages published that day which mention each of the three major party leaders. I anticipate this will be of the order of 200,000 to 300,000 every day (or more). Of these I will select 1,000 pages (citations) on the basis of the number of views and mentions of the leaders in headlines and first paragraphs (a sketch of this selection step appears after the list). This content will include publicly available items: news media pages in online newspapers, magazines and other news outlets (offering news that is not hidden behind robot blocks and paywalls); blog posts, Twitter tweets, social network contributions, wiki pages, bulletin boards, discussion lists, listservs, Sidewikis, comments about photographs and videos, slideshows and other web-based pages.
- Each of these selected citations will be parsed (software available here) to extract the contiguous text, which will be retained for further analysis together with an audit trail giving the date found and the URL (a minimal parsing sketch follows the list).
- Each citation will then be parsed using latent semantic indexing software, which will identify the semantic concepts in each citation (here is software that you can use to extract concepts from web pages; a toy concept-extraction sketch also follows the list).
- I will then rank the concepts in order of frequency of use in the citations for each day. This will provide a rather boring list of words and their daily counts.
- To make it easy to see the results and to compare the three party leaders, I will use a wordwall for visualisation, so that you can compare the most significant semantic concepts for each of the three selected leaders (see the final sketch below).
- These will be posted on this blog every day until polling day.
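To make the selection step concrete, here is a minimal Python sketch. The field names and the weighting are entirely hypothetical; the real selection uses Alexa view data and editorial judgement that this toy scoring does not capture.

```python
# A hypothetical sketch of the daily selection step: score each candidate
# page by estimated views plus mentions of the leader in the headline and
# first paragraph, then keep the top 1,000. Field names and the weight
# are illustrative assumptions, not the actual selection criteria.
def score(page, leader):
    mentions = (page["headline"].count(leader)
                + page["first_paragraph"].count(leader))
    # Weighting mentions against views is a guess; tune to taste.
    return page["estimated_views"] + 1000 * mentions

def select_citations(pages, leader, limit=1000):
    return sorted(pages, key=lambda p: score(p, leader), reverse=True)[:limit]
```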
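The parsing step might look something like this minimal sketch, assuming the requests and beautifulsoup4 libraries (the software linked above may well work differently):

```python
# A minimal sketch of the citation-parsing step: fetch a page, strip the
# markup, and keep the contiguous text plus an audit trail (date found
# and URL). Assumes the requests and beautifulsoup4 packages.
from datetime import date

import requests
from bs4 import BeautifulSoup

def parse_citation(url):
    """Return the visible text of a page with a simple audit trail."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Remove scripts and styles so only readable text remains.
    for tag in soup(["script", "style"]):
        tag.decompose()
    text = " ".join(soup.get_text(separator=" ").split())
    return {"url": url, "date_found": date.today().isoformat(), "text": text}
```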
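For the concept-extraction and daily-ranking steps, here is a toy sketch using the open-source gensim library's latent semantic indexing model, plus a simple frequency count for the daily list; the software linked in the list may take a different approach:

```python
# A toy sketch of the concept-extraction and ranking steps, using the
# open-source gensim library's latent semantic indexing (LSI) model.
from collections import Counter

from gensim import corpora, models

# In practice, citations would be the parsed texts from the previous step.
citations = ["economy tax budget deficit", "health nhs reform budget"]
texts = [doc.lower().split() for doc in citations]

dictionary = corpora.Dictionary(texts)
bow_corpus = [dictionary.doc2bow(text) for text in texts]

# Fit an LSI model; each topic groups terms that co-occur across citations.
lsi = models.LsiModel(bow_corpus, id2word=dictionary, num_topics=2)
for topic_id, topic in lsi.print_topics():
    print(topic_id, topic)

# Rank raw term frequency across the day's citations for the daily list.
daily_counts = Counter(word for text in texts for word in text)
print(daily_counts.most_common(10))
```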
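Finally, a sketch of the wordwall visualisation, assuming the daily counts feed a word cloud (here using the wordcloud and matplotlib packages as stand-ins for whatever the real wordwall tool uses):

```python
# A sketch of the visualisation step: turn the day's concept counts into
# a word-cloud image. Assumes the wordcloud and matplotlib packages;
# the counts below are made-up illustrative data.
import matplotlib.pyplot as plt
from wordcloud import WordCloud

daily_counts = {"economy": 42, "nhs": 30, "tax": 25, "debate": 18}

cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate_from_frequencies(daily_counts)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.savefig("leader_wordwall.png")
```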
What do we anticipate this is going to show?
- This is a proof of concept demonstration showing the semantic differences between the three competitors.
- This will show how the web reports the three campaigns, using a sample of online content selected for its reach and readership.
- The analysis will show how these citations represent an online view of the competitors' similarities and differences.
- It will show how all manner of online influences can represent the three candidates.
As the evidence appears day by day, it will be interesting to see whether there is any advice that a PR professional would propose to a candidate based on what the online community is 'thinking'.
Of course, some of the PR response will be based on the relationships at play, the values that attach to the candidates, the extent to which these responses are driven by people who are motivated to do things (like post comments or vote?) and other factors.
Then, we have Semantic Public Relations.
I suspect that this demonstration will show that the online community is driving the agenda, and I think we will find that the competitors are ignoring a large part of that agenda.
I suspect that the PR responses I hope you will provide will come in near real time, and will interpret the results as part of a process of working out what future, internet-mediated, ubiquitous interactive communication will look like for effective PR practice.
Enjoy.
It should be remembered that the methodology has not been fully tested (mostly so that it can be available quickly for the CIPR SM committee to see how the internet is moving on, and in support of Philip Sheldrake's work). If this were a research project to provide a research base for PR practice, it would be conducted differently. But this is a nice demo (and, of course, I am very happy to help anyone who wants to do this work for the PR sector).
I think I'm getting your drift, David, but the output will no doubt clarify it for me. I'm most interested in how your approach differs from the extant services of the social web analytics (aka social media analytics) vendors, some of which are highly capable, and some very much less so, of course!
I think you might be alluding to the potential for so-called clustering à la Crimson Hexagon? An ability that appears to be inherent to SAS' new SM analytics service too, although I have yet to get my mitts on that one.
But like I say, I look forward to your output to really get to grips with your perspective here.
And thanks for the namecheck too!
How do you measure the number of views a page is getting?
Hi Derek, I am using Alexa data (http://www.alexa.com) in this instance.
So not a measurement of the total number of views, but a measurement of the number of views from a small subset of internet users who have volunteered to have their online activities monitored and recorded.
Hmmm... that is not quite what Alexa does, and this is, as the methodology shows, an analysis of a sample. Over the election I think there will be about 800,000 citations per day - it's a guess, but based on what we have seen so far.
That's almost exactly what Alexa does:
ReplyDelete"Alexa's traffic rankings are based on the usage patterns of Alexa Toolbar users and data collected from other, diverse sources over a rolling 3 month period. A site's ranking is based on a combined measure of reach and pageviews. Reach is determined by the number of unique Alexa users who visit a site on a given day. Pageviews are the total number of Alexa user URL requests for a site. However, multiple requests for the same URL on the same day by the same user are counted as a single pageview. The site with the highest combination of users and pageviews is ranked #1."
http://www.alexa.com/help/viewtopic.php?f=6&t=17&sid=dc01936ebb7aff817f3be0964fe62040
Its rankings have to be taken with a very large pinch of salt.