Wednesday, October 18, 2006

FRBRising with the folks

During last semester's advanced cataloguing class I spent a great deal of time thinking about the relationship between traditional cataloguing and modern collective systems like "tagging". Eventually I began to think about FRBR and the changes coming out of OCLC. In particular, I focused on the various attempts being made to identify the "work" under the new system. OCLC has begun to test catalogue FRBRisation with what they call the "work set" algorithm and has met with some success, though certainly far short of 100%. The problem that I just don't see them getting around is that often the "work" is simply not represented in the traditional bibliographic record, not even as a combination of elements. If this is the case, no amount of processing by computer or librarian will be able to accurately and consistently identify and group "works". What the FRBRisation process needs is just a little added information about each record. This seems like a perfect task for a social bookmarking application. I'm not suggesting that social bookmarking or tagging should replace the more traditional details of the cataloguing process. Any information that already exists in the bibliographic record or can be found on the chief source of information should still be dealt with in the traditional fashion. However, the "work" to which an item belongs is neither currently found in any pre-FRBR databases nor easily derived from the chief sources of information.

I do recognize that the devil is in the details with these things. Among other problems, multiple overlapping "works" would undoubtedly arise in the absence of a predefined set. However, Google and others have had a great deal of success devising algorithms that use user input, but don't do so literally. User input is filtered and analyzed to identify trends, commonalities, etc. This is how the Google spell checker works: Google suggests a word based on the aggregate misspellings of the searching community. The idea behind a social FRBRisation project would not be to let the community definitively define the work of an item the way they might tag a picture on Flickr. It would be to generate the information lacking in a traditional bibliographic record so that something like the "work set" algorithm might perform more efficiently.
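To make the "filtered, not literal" idea a little more concrete, here is a minimal sketch in Python. It is only an illustration, not how Google or OCLC actually do it: the function names, the record ids, and the two thresholds (`min_votes`, `min_share`) are all my own assumptions. A suggested "work" label is kept only when enough users independently agree on it.

```python
from collections import Counter, defaultdict


def normalize(label):
    """Lowercase and collapse whitespace so trivially different labels match."""
    return " ".join(label.lower().split())


def consensus_works(suggestions, min_votes=3, min_share=0.6):
    """Return {record_id: work_label} for records where users clearly agree.

    suggestions: iterable of (record_id, user_supplied_work_label) pairs.
    min_votes and min_share are hypothetical thresholds chosen for illustration.
    """
    votes = defaultdict(Counter)
    for record_id, label in suggestions:
        votes[record_id][normalize(label)] += 1

    agreed = {}
    for record_id, counter in votes.items():
        label, count = counter.most_common(1)[0]
        total = sum(counter.values())
        # Keep the top label only if it has enough votes and a clear majority.
        if count >= min_votes and count / total >= min_share:
            agreed[record_id] = label
    return agreed


# Made-up sample input: four users label rec1, two users label rec2.
suggestions = [
    ("rec1", "Hamlet"), ("rec1", "hamlet"), ("rec1", "Hamlet "),
    ("rec1", "Shakespeare plays"),
    ("rec2", "Hamlet"), ("rec2", "Macbeth"),
]
print(consensus_works(suggestions))  # {'rec1': 'hamlet'}
```

Records without a clear consensus (like rec2 here) are simply left out, so they could still be handled by a cataloguer or by the "work set" algorithm on its own.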

The other problem I foresee is that a library undertaking a FRBRisation project might not have the time or resources to develop and shape the kind of community that would be required to pull something like this off. However, I believe this has a solution as well. I was recently listening to a podcast interview with Jeff Bezos, the founder of Amazon.com. He was addressing the various new non-consumer products that Amazon has begun to offer. All of the services were interesting to hear about, but one caught my attention immediately: Mechanical Turk. I remembered reading about it when Amazon first introduced it but hadn't paid any real attention to its development. The Mechanical Turk allows organizations to programmatically farm out small tasks to large groups of independent contractors. Each task itself is worth only a few cents, but individuals who sign up can, in theory, perform many tasks in a very short span of time, enough to make a reasonable sum of money. Bezos called it artificial artificial intelligence, because from the programmer's point of view the Amazon computer is doing all the work. In reality the Amazon computer is asking a person and then sending the result back to the third party subscribing to the service. My point is that a library that wanted to FRBRise its database quickly could employ the Mechanical Turk instead of waiting to build its own community.

Interestingly, Amazon developed the Mechanical Turk initially for internal use, to do much the same thing as I'm suggesting. Amazon had a problem with duplicate records. They realized that many products were virtually the same and could be sold and inventoried as a single product, but were in their database as two items. It was too large a problem to give to one person, or even a group of people, so they created a task marketplace, which evolved into the Mechanical Turk. A program would identify similar records and then submit them to the marketplace as a task. All the Amazon employee had to do to earn a few extra bucks was glance at each record and answer yes or no to the program. If the answer was yes, the records were merged; if no, the program moved on. All I'm suggesting is that something like the "work set" algorithm replace the Amazon program. Sure, it would cost money, but looking at how things are priced, not as much as one might think.
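The loop described above (a program proposes similar records, a person answers yes or no, and confirmed pairs are merged) can be sketched roughly as follows. This is a toy under stated assumptions: `candidate_pairs` is a crude stand-in for something like the "work set" algorithm and is not OCLC's actual method, the record fields are invented, and `ask` is a hypothetical callback standing in for the human worker rather than a real Mechanical Turk API call.

```python
from collections import defaultdict
from itertools import combinations


def author_key(record):
    """Crude matching key: the author string, lowercased, letters and digits only."""
    return "".join(ch for ch in record["author"].lower() if ch.isalnum())


def candidate_pairs(records):
    """Yield pairs of record ids that share an author and so might be one work."""
    by_author = defaultdict(list)
    for rec_id, rec in records.items():
        by_author[author_key(rec)].append(rec_id)
    for ids in by_author.values():
        yield from combinations(sorted(ids), 2)


def group_works(records, ask):
    """Merge records into work groups using yes/no answers from ask(rec_a, rec_b)."""
    parent = {rec_id: rec_id for rec_id in records}

    def find(x):
        # Follow parent links to the group's representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in candidate_pairs(records):
        if ask(records[a], records[b]):  # "yes": merge the two groups
            parent[find(a)] = find(b)
        # "no": move on to the next pair

    groups = defaultdict(set)
    for rec_id in records:
        groups[find(rec_id)].add(rec_id)
    return sorted(sorted(group) for group in groups.values())


# Made-up sample records.
records = {
    "r1": {"author": "Shakespeare, William", "title": "Hamlet"},
    "r2": {"author": "Shakespeare, William.",
           "title": "The tragedy of Hamlet, Prince of Denmark"},
    "r3": {"author": "Shakespeare, William", "title": "Macbeth"},
    "r4": {"author": "Austen, Jane", "title": "Emma"},
}


def ask(rec_a, rec_b):
    """Stand-in for the human answer; a real task would show both records to a person."""
    return "hamlet" in rec_a["title"].lower() and "hamlet" in rec_b["title"].lower()


print(group_works(records, ask))  # [['r1', 'r2'], ['r3'], ['r4']]
```

The cost scales with the number of candidate pairs, not with the size of the whole catalogue, which is why the quality of the program proposing the pairs matters so much.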

3 comments:

Fiacre said...

Thanks for posting this, Ian. I found it very informative and it gave me plenty to think about.

I believe your suggestion for using the Mechanical Turk as a solution to the definition of the "work" in FRBR is really interesting and certainly viable.

hjbennett said...

Hi Ian and thanks for your post. I hadn't heard of this FRBR stuff and I learned a lot from following your links. That's a big change for cataloging!

maximum_access said...

FRBR does seem quite the change. I think it's the first and best real attempt to bring cataloging into the 21st century. One of the reasons I find it so interesting is the way it seems to open up cataloging the way web 2.0, Ajax and APIs have opened up the web. No longer is cataloging defined as the act of a single person or even the simple process of transcription. I'm not suggesting everything FRBR has to be social, but that "mashups" of tools and techniques are now a real possibility. I also find FRBR exciting because it seems to embrace the idea that with new technology cataloging can begin to address the full content of a work to a far greater degree.