Friday, April 07, 2006

Faults and bugs in the new BBC Archive

Gents

I invite comments regarding your own experiences in using the SEARCH facilities on the new BBC WW2 archives.

Speaking for myself, I find the present situation is lamentable.

I give a simple example.

Imagine if you will, that you are on the 'new' site and wish to find articles written by Frank.
You key into the SEARCH THE ARCHIVES box the keywords 'Frank Mee', enclosed, you will note, in apostrophes. You get the following results.

Your page of search results for "'Frank Mee'+ww2"
BBC - Tees Features - Your stories of war
FRANK MEE Frank Mee was just 10 years old when war was declared.
www.bbc.co.uk/tees/features/war/
BBC - WW2 People's War - The Man Who Never Was, chapter 1 - A2160415
My only real buddy so far had been a chap called Harry Mee, who stood next to me, in the rear rank, when we were on parade in Bulford basic training camp.
BBC - WW2 People's War - Bournemouth, Bombs, and My Two Brothers - A3526535
I think we had just been given Arthur Mee's Children's Encyclopaedia, all 10 volumes, and I have it still. www.bbc.co.uk/dna/ww2/A3526535
(Note that only the first item is applicable)

By contrast, if you were to do the same search procedure on the 'old' site you would get the following 20 items on one page, with a further five or six on the following page:

ARTICLE ID TITLE STATUS
A2097867 It's Over: VE Day in Stockton-On-Tees Edited
A2553761 The Dance Hall, Wartime Escape Edited
A8906376 Farewell to the BBC WW2 People's War Website -
A2110465 Flaming Barrage Balloons, Teeside Edited
A1300465 Flaming Balloons -
A1316288 A Night at the pictures -
A1930079 Active Army Cadets -
A1361981 The Beginning. -
A1361963 The beginning. -
A1365419 Early Days -
A2267615 HMAS Sydney. -
A2521270 Introduction -
A2132029 Christmas, Worlds Apart Edited
A1934958 More Active Army Cadet's -
A1901819 Reluctant Private Evacuee. -
A2041589 More Wartime Schooling -
A4126961 Starting work in war time part five -
A2010213 Wartime Schooling in Stockton-on-Tees -
A2638550 A peaceful Sunday Morning. -
A1324090 The sky was black with planes -
--------------------------------------------------------------------------------
No Previous Results | Next results >> (which would give you another five or six articles)

I rest my case

20 Comments:

Blogger Frank Mee said...

Ron well done, I had to go to Books to find my own stuff. When I put my name in the box all I got was a few bits and pieces on Google and they were saying they could not find WW2 articles.
I followed the path you gave in the e-mail the other day and after reading yours went walk about from the links.
It appears there is not a direct path to anywhere. Peter will be laughing now as he will have found the go directly to jail route last week some time.

Friday, 07 April, 2006  
Blogger Peter G said...

Initially, you will recall, I emailed the Team to let me have details of any bugs so that they could all be returned to Debré in one list. Ron is looking at the Archive in depth and has already come up with an impressive list of weaknesses and faults. In the light of that, it seems to me that we should all now report any further quirks directly to Ron.

The BBC has done a great job and I do like the layout. But once the Archive is handed over for permanent storage (as rumour has it) it will be too late to get things sorted out. Now that Ron has pointed out these serious shortcomings, surely a decent programmer could be spared for a couple of hours to sort things out? It amazes me that the BBC, with all its vast resources, did not test the site thoroughly before releasing it. As it stands few links work. They seem to have lost sight of the fact that the World Wide Web is called a web precisely because it depends entirely on LINKS to form a web. Links are the be-all and end-all of the web.

The search facility was abysmal on the old site, and the expectation was that it would be put right in the new Archive but, as Ron has very clearly demonstrated, it hasn't. [pace, Frank. Google is a superb search tool and will ferret out all your input to the archive]. Since the BBC 'in house' search tool is pretty useless, why not team up with Google?

Friday, 07 April, 2006  
Blogger Tomcann said...

I too tried for some time to find my own input into the BBC and it was Ron who put me right as I had given up - there was absolutely no record of anything concerning Troopertom.

Friday, 07 April, 2006  
Blogger Ron Goldstein said...

Me again

The cardinal sin that the BBC have committed is the one relating to article ID numbers.

The SEARCH THE ARCHIVE box clearly invites you to search using the ID number.

My Personal Page number is U520216.

Keying this in, from anywhere on the site, either with or without apostrophes, produces nothing at all. Try this with your own number or indeed with any article number.

Debré ... are you reading me?

Saturday, 08 April, 2006  
Blogger Steve Wright said...

The Search 'facility' is something I meant to comment on as soon as I'd experienced it. Granted, the old system used to return all manner of stories, containing your search parameters in the title or body, but at least they were on the website. Now, we have the 'Results from all of the BBC'. Who's programmed this 'facility'? It would appear that it is someone who has no knowledge of specific website-based search engines.

Saturday, 08 April, 2006  
Blogger Ron Goldstein said...

Hi Steve

It really is a shocking situation.

Did you realise that when you are forced to go back to the old site in order to find any data there is not a single link that will take you back to the new?

Please feel free to add further comment and then I will e-mail Debré again inviting her to join in the thread.

Saturday, 08 April, 2006  
Blogger Peter G said...

Because of this unsatisfactory state of affairs I've reinstalled the link to the old People's War site. I've programmed it so that the new and the old open in two separate windows so that you can have them conveniently side-by-side. You will find the link to the old site directly under the Archive link.

At the same time, I've corrected the link to Ron's superb Army Album.

Saturday, 08 April, 2006  
Blogger Ron Goldstein said...

Peter, you clever young man !

It never occurred to me to put both sites on the screen together !

Anyway, the additional links will make life easier for all of us so thanks for that

Saturday, 08 April, 2006  
Blogger ritsonvaljos said...

Hello folks,

My principal comment about the new BBC "People's War" Archive is that it is a "Curate's Egg", if you pardon the expression.

These are some of what I feel are the "good in parts" aspects: the categorisation, the layout, the photo gallery for different categories.

These are some of the "not so good in parts": some stories are not in what I would deem the 'correct' category and could be missed, and - as everyone else has obviously found - the search engine does not seem as good as the old site.

Any chance the BBC might take on board the views of the 'Site Helpers' who did such a sterling job for them for a couple of years? Once the site is completely archived then it is unlikely to get sorted in subsequent years because the expertise will no longer be to hand.

It's a pity the site has closed down for new entries as I am still getting a lot of additional information passed to me about the war. I'll write some of these up and put them on this site shortly. They may be of general interest even if I can't get them on to the "People's War" website.

Sunday, 09 April, 2006  
Blogger Peter G said...

See also a Comment made by Ritsonvaljos in this thread The BBC WW2 Archive posted Sunday 9 April.

Please reply to that comment here and keep all comments regarding Archive faults in this thread. For clarity, I've taken the liberty of changing Ron's initial Post title.

Thank you.

Monday, 10 April, 2006  
Blogger ritsonvaljos said...

Thanks Peter, Ron et al!

I hope somebody takes notice of the various comments you have all made. You would have thought the WW2 Site Team would have asked those of you who were Site Helpers to have another look through the Archived Site. A project on this scale needs a lot of good help. Eliminating problems before the site went 'live' would have been better.

On the whole, the Archived Site has a lot going for it. Hopefully the various suggestions made will help sort things out. It would be a pity not to get the Site 'spot on' after all this time.

Here's hoping .....!

Monday, 10 April, 2006  
Blogger ritsonvaljos said...

Hello again folks,

I've been checking through some of the other accounts I contributed to the BBC "People's War" site to see if any other accounts have been placed in an 'incorrect' category. An account I wrote about the 1st Battalion The Border Regiment in the Battle of Arnhem (Airborne Division) has been allocated to the 'France' category rather than 'Netherlands' or 'Arnhem 1944'.

The original Article ID is A5887461 and its title is "We took you forever into our hearts" in case anybody wishes to look at it and see what I mean. All the events referred to take place in the Netherlands in this article. I doubt anybody would ever look for an article about Arnhem under 'France' so this account would be in a limbo.

I have sent the WW2 team a message via the 'Contact Us' link on the Archived Site. I'm not sure how many people are still working on the "People's War" Site, but I hope the categories can get corrected. Perhaps if all the various faults and bugs are passed to the Team in a block it may be possible for them to arrange another check on the site?

There are quite a lot of good things about the Archived Site. I hope nobody feels I am being too pedantic about this?

Anyway, the best of luck to you all.

Tuesday, 11 April, 2006  
Blogger ritsonvaljos said...

Re: 'Technical overview of the Archive build' and 'Bayesian analysis'.

Has anybody read, or perhaps even had explained to them, the automated system for classifying contributed articles to the "People's War" website? If not, this link should lead you to that page:

http://www.bbc.co.uk/ww2peopleswar/about/project_12.shtml#bayes

To be perfectly truthful, I have had some difficulty understanding what has been written on this page of the "People's War" Archived site. I have studied both Mathematics and Modern Languages at University, including Bayesian analysis, so I think most people are going to have to think hard about the meaning of what is written.

So far as I read it, according to the technical page many of the contributed stories have been automatically categorised by a computer scan. This categorisation is apparently based on predetermined key words or terms, and as the technical page goes on to state, the categorisation "... is accurate to within 85 % to 90%".

Remembering my mathematical training, this means that 10% to 15% of accounts will not have the correct categorisation. Presumably the reason an article I contributed to the "People's War" site about the Battle of Arnhem (Netherlands) appears under the 'France' category is because my User profile has picked up that much of my WW2 research has been about wartime France? Does anybody know if this is the likely reason? Why other accounts I have contributed for a relative growing up in Preston, Lancashire should be placed under London I am not too sure.
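For anyone else puzzling over the technical page, here is a rough sketch of how a naive Bayesian text classifier works in general. It is not the BBC's code: the four training snippets, the category names and the function names `train` and `classify` are all invented for illustration, and the real system's key words and training data are not described in this thread. The idea is that the program counts which words turn up in stories already sorted by hand, then puts a new story in whichever category makes its words most probable.

```python
import math
from collections import Counter, defaultdict

# Invented hand-sorted examples; the real training stories are not known.
EXAMPLES = [
    ("France", "landing on the beaches of normandy in france"),
    ("France", "the resistance in occupied france"),
    ("Netherlands", "airborne troops dropped near arnhem in the netherlands"),
    ("Netherlands", "the bridge at arnhem"),
]


def train(examples):
    """Count words per category from (category, text) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of examples
    for category, text in examples:
        doc_counts[category] += 1
        word_counts[category].update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the category with the highest naive Bayes score."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Start from how common the category is overall.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero the score.
            p = (word_counts[category][word] + 1) / (total_words + len(vocabulary))
            score += math.log(p)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


word_counts, doc_counts = train(EXAMPLES)
print(classify("glider troops at arnhem bridge", word_counts, doc_counts))
# prints "Netherlands"
print(classify("we left france and the resistance behind before arnhem",
               word_counts, doc_counts))
# prints "France", although the sentence ends at Arnhem
```

The second example shows how a story can land in the wrong place: a few mentions of words the program associates with France can outweigh a single mention of Arnhem. Whether that is what happened to any particular article on the Archive is only a guess.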

Hence, while it is recognised these accounts may be in the wrong category, it does not seem to be easy having them recategorised. Having sent a message to the WW2 team via the 'Contact Us' route, this statement was included in the reply:

"I shall forward your query on to the technical team. If they are able to move these entries, they will - but we do not have any maintenance time scheduled for some time."


Perhaps this additional information will help Ron to compile a list of weaknesses and faults to pass on to the WW2 Team. The message I received also says that they expect most people will look for accounts by 'search' rather than category, and so people will find the accounts and read them. Perhaps this may be so ....? Personally, I still would not think of looking for an article about the Battle of Arnhem under a category headed 'France'.

Possibly those of you who have had more direct contact with the WW2 Team and the "People's War" Project may understand the process somewhat better? I'm sure it has all been done on good faith and for the best of reasons.

Thanks

Saturday, 15 April, 2006  
Blogger Ron Goldstein said...

Hi all

Some further thoughts on the subject.

We all know how frustrating the poor search facility has been, starting with the 'old' WW2 site and continuing, it would seem, on the 'new' archives.

As someone who had submitted almost 100 articles I felt duty bound to provide at least a decent index to my postings and with the help and encouragement of Peter I created a Chronological Index that provided links to all my articles.
(If you go to h2g2 and look at my personal page you will see this in its original format).

Now consider what the BBC has managed to achieve for me (and any poor future reader of the archives).

If you START at the archives and my personal page none of the links work on the Chronological Index.

If you start with h2g2 (again at my Personal Page) the links take you BACK to the 'old' site where you are then promptly advised to go BACK to the 'new site' to read the story !

Re-reading this thread from the beginning one can pick up the angst from all concerned and one can only hope that the BBC is concerned for its image and will get its proverbial finger out to solve the various problems we have raised.

Having said that, it would be churlish of us not to show our appreciation for the truly excellent new Archives that have arisen from the ashes of the old site and for this, many thanks.

Looking forward to a response from one of our old friends.

Sunday, 16 April, 2006  
Blogger ritsonvaljos said...

Hi Ron and others,

Yes, I've looked at the 'old' and 'new' sites for your stories and see what you mean. I fully agree with what you say that, on the whole the Archived "People's War" is an excellent site for a record about WW2.

Hopefully, as you say the BBC WW2 Team will realise everybody is trying to be helpful in flagging up the 'faults and bugs'. Perhaps if things are not ironed out now, when there is still a lot of knowledge, experience and enthusiasm around from contributors and Site Helpers then they may never be ironed out.

For example, as I mentioned in an earlier posting to this thread one article I contributed to the "People's War" was about the 1st Battalion the Border Regiment at Arnhem (Netherlands). As this has been categorised under 'France' people finding the site might be left with the impression that the Battle of Arnhem was in France. Even now, and among people who are a similar age to me (in their 40s), there are such a lot who have the idea that 'Dunkirk' and 'D-Day' are somehow the same thing. I have a feeling that people in future could think that because the BBC have put an article about Arnhem in 'France' then in France it must be!

Overall I am sure the Archived "People's War" Site will be a fine resource for many years to come. The BBC and the WW2 Team should be saluted for their efforts in doing such a fine job. Of course, the Site Helpers did a brilliant job in advising others in their particular specialist areas. Saying 'Well done' and 'Thank You' does not cost anything and should be used in this instance. It will be an even better resource if the minor faults mentioned by the various contributors to this thread can be sorted out at an early stage.

Monday, 17 April, 2006  
Anonymous Debré said...

Hello all,

Thanks for all your comments on the archive site. I'm pleased to see the many nice things you have all said! Here follows a response to some of the bugs you have identified:

1. Re the Search facility. You should find that this now works as it should. Searching for 'Frank Mee' produces 10 pages of results from the archives. Searching for a U or A number will also bring up the correct page. The reason this didn't initially work was because the live site needed to be indexed by Search, something that can only be done once the site is live.

2. Re categorisation. Some stories will be categorised in the wrong places, but we are confident that the vast majority of the 47,000 stories are in relevant categories. Just over 17,000 stories were manually categorised by two very hard-working humans, the rest were categorised with a naive Bayesian classifier. Read more about this process here: http://www.bbc.co.uk/ww2peopleswar/about/project_12.shtml. This method of classifying stories was by far the best of the options that were available to us.

3. Re the existence of the 'old' DNA site, and the links from there. We decided to keep this site, with links from each DNA page to the relevant archive page, because many people will have bookmarked their own contributions. This gives them the opportunity to simply click on a link on the DNA site to go to their story on the archive site - rather than arriving at a broken link, which is what would have happened if we had replaced the old site with the new.

4. Re Peter's comment that 'few links work' (7 April). Peter, if you are still experiencing this problem, could you let me have details of the links that aren't working? We are not aware of any.
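As a back-of-envelope check on point 2, the figures quoted in this thread can be combined. This is only a sketch under two assumptions that nobody here has confirmed: that the "85 % to 90%" accuracy quoted from the technical page applies only to the automatically classified stories, and that every hand-sorted story is in the right place. The function name `misplaced_range` is made up for the example.

```python
TOTAL_STORIES = 47_000          # "the 47,000 stories"
MANUALLY_CATEGORISED = 17_000   # "just over 17,000", rounded down here


def misplaced_range(total, manual, accuracy_low=0.85, accuracy_high=0.90):
    """Estimate how many automatically classified stories are misplaced.

    Assumes the accuracy range covers only the automatic stories.
    """
    automatic = total - manual
    fewest = round(automatic * (1 - accuracy_high))
    most = round(automatic * (1 - accuracy_low))
    return automatic, fewest, most


automatic, fewest, most = misplaced_range(TOTAL_STORIES, MANUALLY_CATEGORISED)
print(automatic, fewest, most)  # prints: 30000 3000 4500
```

On those assumptions, somewhere between 3,000 and 4,500 stories could be sitting in the wrong category, which is a small share of the whole but still a lot of individual accounts.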

I hope this has addressed all your comments, do let me know if there are further issues.

Best wishes,
Debré

Tuesday, 18 April, 2006  
Anonymous Debré said...

Here's a proper link to the Bayesian classification info page.

Tuesday, 18 April, 2006  
Blogger Ron Goldstein said...

Dear Debré

Many thanks for bearding the lions in their den and seeking to address our complaints :)

I do think the BBC have gone a fair way towards a simpler 'search' system but in all honesty I am still mystified with regards to the 'links'.

If my Personal Page on h2g2 can give me live links to my articles why can't I get the same result on my Archive Personal Page?

At this point I cheerfully hand over the thread to Saint Peter who will explain my problems much better than I can, simply because he was responsible for me creating the links in the first place !

I'm sure a lot more will be said on this thread and your input is much appreciated

Best wishes

Tuesday, 18 April, 2006  
Blogger ritsonvaljos said...

Hello Debré and all,

Thanks for your time in looking at the various points raised and your message. As previously stated, on the whole the Archived Site is really good.

I had previously read the page on the link you give about Bayesian analysis on the Archived Site. I must admit I had to read it through 3 times to fully understand what it was saying. Most of my contributed accounts were on behalf of relatives / friends / veterans groups etc who have been really enthusiastic about what has happened to their own accounts, or the accounts of other folk that they know. Not many of them are into the intricacies and the terminology of a computerised analysis. It is not easy trying to explain why one or two are in what is evidently the 'wrong' category. Fortunately, as you say, and as is written on the Archived Site link, most of the stories have ended up in an appropriate category.

I have tried the Search facility for a few different things. That also seems to work fairly well most of the time. Sometimes it doesn't work. Perhaps we are all perfectionists on this Blog?

Thanks for taking it all on board and your efforts over the Project life to all the WW2 Team.

Tuesday, 18 April, 2006  
Blogger Peter G said...

Hi Debré

4. Re Peter's comment that 'few links work' (7 April). Peter, if you are still experiencing this problem, could you let me have details of the links that aren't working? We are not aware of any.

To start go here: U520216. You will find yourself at Ron's Personal Space in h2g2, a mirror image of Ron's old Personal Page in the original People's War.

See all the links? All working. Now scroll down to SECTION 4 ARTICLES POSTED, CHRONOLOGICALLY.

There you will see that all his stories' ID numbers act as links. Here is the first paragraph with links duplicated for your convenience:

Sep 3, 1939
(03) Five Sons, all serving in H.M.Forces (A2025028)
(04) Waiting to be called up (A2416268)
.

All the links still work, albeit indirectly.

Now go here to Ron's new u520216 Personal Page. None of the old links work; these are just a few of the hundreds of links that no longer work. The text is exactly the same, but if you now scroll down to that same section SECTION 4 ARTICLES POSTED, CHRONOLOGICALLY you get this:

Sep 3, 1939
(03) Five Sons, all serving in H.M.Forces (A2025028)
(04) Waiting to be called up (A2416268).


Identical text but now the article ID numbers serve no purpose whatsoever. Presumably you copy them and paste them into the Search facility.

This is what I meant, Debré, when I said that few links work. Perhaps I should have said few of our old links work, since logically any new links that have been put in, work.
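To show that restoring the old links need not be a big job, here is a minimal sketch of how plain-text ID numbers could be turned back into links automatically. It is only an illustration, not anything the BBC uses: the function name `linkify` is invented, and the default address pattern is simply the one visible in the search result quoted in Ron's original post (www.bbc.co.uk/dna/ww2/A3526535). The address scheme of the new Archive is not given anywhere in this thread, so it is left as a parameter.

```python
import re

# An article or personal-page ID: a capital A or U followed by digits.
ID_PATTERN = re.compile(r"\b([AU]\d+)\b")


def linkify(text, url_template="http://www.bbc.co.uk/dna/ww2/{id}"):
    """Wrap every A/U number in the text in an HTML link.

    url_template is an assumption; swap in the real Archive pattern.
    """
    def to_anchor(match):
        ident = match.group(1)
        url = url_template.format(id=ident)
        return '<a href="{}">{}</a>'.format(url, ident)

    return ID_PATTERN.sub(to_anchor, text)


print(linkify("(04) Waiting to be called up (A2416268)"))
# prints: (04) Waiting to be called up (<a href="http://www.bbc.co.uk/dna/ww2/A2416268">A2416268</a>)
```

Run over a whole Personal Page, something along these lines would make every ID number clickable again without anyone having to retype a single link.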

------------

P.S. Welcome to our blog! :)

Tuesday, 18 April, 2006  
