VIVO: Enabling National Networking of Scientists

As we approach the six month mark (on March 25), I thought this would be a good time to give an update on our NIH-funded VIVO project. First, some quick background: VIVO was initially developed by Jon Corson-Rikert at Mann Library in 2003 to help both current and potential faculty, grad students, and others to navigate all the people, programs, departments, and fields that comprise life sciences at Cornell. Brian Lowe and Brian Caruso, also at Mann, then worked with Jon to re-implement VIVO using semantic web technologies. This brought a new level of flexibility, and the potential to integrate VIVO profile information with the larger semantic web. Simultaneously, Medha Devare of Mann served as the VIVO evangelist, encouraging adoption both within and outside of Cornell. The Cornell VIVO site now covers faculty, researchers, departments, and programs across all of Cornell. VIVO has been adopted at a number of institutions around the world, including the Chinese Academy of Sciences and the University of Melbourne.

Last year, a group of seven institutions, led by Mike Conlon of the University of Florida, and including Cornell, Indiana University Bloomington, Weill Cornell Medical College, and three other partners, applied to the NIH’s National Center for Research Resources (NCRR) for funding to take VIVO from a system that can support a single institution to one that can support a national network of scientists, allowing researchers to discover relevant projects, research, and potential collaborators from any participating institution. We received the award in September 2009, and we’ve been running hard ever since. Just within Cornell, we’ve hired seven new developers, and we have a couple more positions that may be filled soon. The overall VIVO effort at the seven institutions has grown to about 75 people, so it’s quite a team.

So, where are we? At the end of January, we released the first version of the new VIVO software and ontology to the seven partner institutions. All the sites are now running VIVO, and beginning to populate it from their own institutional data sources. The initial code release (v0.9) has a very limited distribution while we work to add some more critical features, improve the documentation, and shake out the bugs. We’re planning our next point release for around the end of this month, and it will include the critical ability to make VIVO data available as RDF for use by outside tools and systems. This is an open-source project, so once public releases are available, you’ll be able to use VIVO for any purpose under the terms of the BSD License. You can keep track of our ongoing progress and find out more details about various aspects of the project at our public web site: http://www.vivoweb.org. You can also subscribe directly to the VIVO blog to get the latest announcements.
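For readers wondering what “VIVO data available as RDF” might look like to an outside tool, here is a minimal Python sketch that writes a profile as N-Triples, a simple line-based RDF serialization. It is an illustration under stated assumptions, not VIVO’s actual output: the vivo.example.edu URIs and the function names are hypothetical, and it uses the real FOAF and RDFS vocabularies rather than the VIVO ontology.

```python
# Sketch only: the example.edu URIs are hypothetical. The FOAF, RDF, and RDFS
# namespaces are real, but VIVO's own ontology terms are NOT modeled here.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes, quotes, and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def profile_triples(person_uri: str, name: str, group_uri: str) -> list[str]:
    """Return N-Triples lines describing one person and a group they belong to."""
    return [
        f"<{person_uri}> <{RDF_TYPE}> <{FOAF}Person> .",
        f"<{person_uri}> <{RDFS_LABEL}> {literal(name)} .",
        f"<{group_uri}> <{RDF_TYPE}> <{FOAF}Group> .",
        f"<{group_uri}> <{FOAF}member> <{person_uri}> .",
    ]


if __name__ == "__main__":
    for line in profile_triples(
        "http://vivo.example.edu/individual/n123",  # hypothetical person URI
        "Jane Doe",                                  # hypothetical name
        "http://vivo.example.edu/individual/d42",   # hypothetical department URI
    ):
        print(line)
```

N-Triples is used here only because it needs no third-party library; each line is one complete statement, so files from different sources can be concatenated (blank nodes aside) to merge their graphs.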

Here are some of the things to look forward to in the months ahead:

  • We’ve just met with our Technical Advisory Board to understand how to fully integrate VIVO with the evolving tools and ontologies that make up the Semantic Web.
  • Katy Börner and her colleagues at Indiana University Bloomington are applying their visualization skills to the VIVO national network, allowing users to see and explore the web of scientific relationships.
  • We’re working with publishers and other sources to get authoritative publication information for researchers at VIVO institutions, information that can enable both visualizations and recommendation systems for finding potential collaborators.
  • We’ll be opening up the data in the national network to anyone who wants to build new capabilities into the network.

The entire VIVO team is also getting out, presenting on the project and its capabilities, and looking for potential collaborators. If you are a university, company, or other institution interested in developing tools for or being part of the network, please fill out our contact form. In the upcoming months, I will be giving a project briefing on VIVO at the CNI Spring Forum in Baltimore, April 12-13, and I’ll also be presenting a VIVO paper at the Web Science Conference 2010 in Raleigh, NC, April 26-27. Other VIVO team members will also be presenting at a number of upcoming events; please check out our events calendar for more information.

After our first six months, we’re very excited by the progress so far, and now that we are fully “up to strength”, we’re looking forward to all that we can accomplish over the next 18 months.

There’s No Place Like Home

Uris Library, Cornell University

One of the reports on higher ed and technology that I make a point of reading every year when it comes out is The ECAR (EDUCAUSE Center for Applied Research) Study of Undergraduate Students and Information Technology. I read it because it never fails to give me at least one “aha” moment that shifts my understanding of how undergraduates use technology in their academic and personal lives. The 2009 Study didn’t disappoint in that regard, hence this post.

As I read through the report’s Key Findings synopsis (at 13 pages, a much less daunting task than the 130 pages of the full report), the first statistic that caught my eye was the result on p. 4 that 94.6% of the 30,000-plus students surveyed used their college or university library website, with a median frequency of weekly use. That level of use struck me as higher than I remembered from earlier reports, so I checked the full 2009 report and found on p. 46 that, indeed, student use of their library sites has stayed at about 95% for the last four years.

What I saw next in the full report, a statistic that didn’t make the cut for the Key Findings, was this year’s game changer for me: “the percentage of students who reported using the library website daily has increased from 7.1% in 2006 to 16.9% in 2009.” Daily. Huh. This trend, a more than twofold increase in undergraduates’ daily use of their library websites, is significant because it adds an important nuance to a near-axiomatic assumption that underlies the library technology mantra of “we need to bring our networked services to places where our users already are.” When we talk about places where our users already are, we almost automatically think of sites other than the library’s own website. The ECAR Undergrads and IT finding reminds us as library service providers that, though our student users may spend a lot of their web time elsewhere, they are right here in our own web spaces, too, and in fact are here with increasing frequency. That calls on us to do our best to meet their needs while they’re here.
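The two percentages quoted from the report are enough to size the change. A short calculation in Python shows that daily use grew to roughly 2.4 times its 2006 level, a gain of about ten percentage points:

```python
# Percentages of ECAR respondents reporting daily use of the library website.
daily_2006 = 7.1
daily_2009 = 16.9

ratio = daily_2009 / daily_2006        # relative growth, roughly 2.4x
point_gain = daily_2009 - daily_2006   # absolute growth in percentage points

print(f"{ratio:.2f}x, +{point_gain:.1f} points")  # prints "2.38x, +9.8 points"
```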

The ECAR finding on students’ increased daily use of library sites resonated for me, too, because it supports the considerable commitment that the Cornell Library has made to its own website. Over the past year, Cornell Library staff thoroughly overhauled the site, and the redesign was marked by extensive efforts to better understand how our users engage with our networked services and to let what we learned about that engagement drive the redesign. Further, the Library has earmarked resources for the ongoing improvement of the site, which also will be driven by direct interaction with our users. With the Library’s investments in its website in mind, it’s all the more reassuring for me to see that students throughout academia are more and more often turning to their library websites as a resource.

Google Books Settlement: Who’s Right?

Education is right (Ben McLeod on Flickr - CC BY-NC-SA 2.0)

Discussion on the Google Books Settlement is getting very hot and heavy, with strong words from both supporters and opponents of the current settlement. As you may know, the Cornell University Library submitted a letter to the court in support of the settlement, although with a request for court oversight of some of the terms. If you are interested in the gory details of all the current arguments pro and con, and who is in which camp, then I will point you to The Public Index, which is doing a great job of accumulating information on the many aspects of the settlement.

In conjunction with the Cornell Library’s letter of support for the settlement, I have put together a set of questions and answers to address some of the issues that have been raised by the opponents of the settlement, and particularly some of the issues raised by the Open Book Alliance. I should make it very clear that I am not a lawyer, and I am certainly not an expert in the technical legal issues surrounding monopolies, so please keep that in mind as you read these points.

Q: Doesn’t the settlement give Google a monopoly over making orphan works available?

A: Due to copyright law, without the settlement or legislation, no one can make the orphan works available at all. By setting a floor for the terms under which orphan works will be available, the settlement may well make it easier to pass legislation that would allow anyone to scan and provide orphan works under the same (or better) terms as Google. In any case, the choice in the settlement is not between Google providing access and more general access to orphan works; it is between Google providing access and no access at all.

Q: Doesn’t the settlement give Google a monopoly on distributing digitized books?

A: For all but the orphan works, anyone else can negotiate with the copyright holders to arrange whatever terms they can for distributing digitized books. They do have to be willing to invest, as Google was, in doing the digitization.

Q: Under the settlement, Google can sell institutional subscriptions to universities and others. What keeps Google from gouging universities with extraordinarily high prices for this essential subscription?

A: The subscription grants access to the full text of all the books in Google Book Search, but even without the subscription, users can freely search the entire full-text content of all the books and view amounts ranging up to 20% of the book. In particular, they can use the free services to identify the books and content they need. Instead of the institutional subscription, universities could then either deliver the physical book to the user (as they do now), or pay the one-off purchase price to give the user access to the book (an amount that is almost always lower than the current cost of delivering books to users through Inter-Library Loan). These alternatives put an effective cap on the amount Google can charge for the institutional subscription and still have anyone subscribe.
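The cap argument can be made concrete with a toy model. This is a minimal sketch with made-up numbers and hypothetical function names, not a description of Google’s actual pricing: a cost-minimizing library serves each identified request the cheaper way (one-off purchase or physical delivery) and subscribes only if the fee beats that total.

```python
# Toy model of the price cap described above. All numbers are hypothetical and
# the function names are illustrative; this does not describe Google's pricing.

def cost_without_subscription(requests: int, purchase_price: float,
                              delivery_cost: float) -> float:
    """Serve each identified book request the cheaper way: buy the one-off
    digital copy or deliver the physical book (own stacks or ILL)."""
    return requests * min(purchase_price, delivery_cost)


def would_subscribe(subscription_fee: float, requests: int,
                    purchase_price: float, delivery_cost: float) -> bool:
    """A cost-minimizing library subscribes only if the fee beats the alternatives."""
    return subscription_fee < cost_without_subscription(
        requests, purchase_price, delivery_cost)


# Hypothetical campus: 2,000 full-text requests a year, $25 purchase, $30 ILL.
cap = cost_without_subscription(2000, 25.0, 30.0)
print(cap)                                         # 50000.0
print(would_subscribe(40000.0, 2000, 25.0, 30.0))  # True
print(would_subscribe(60000.0, 2000, 25.0, 30.0))  # False
```

The free search and preview are what make the model work: they let a library count only the books its users actually need, so the per-request alternatives are realistic ones.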

Q: Aren’t there serious privacy concerns with Google Books usage?

A: Cornell already licenses a large number of research databases for use by the Cornell community. The privacy issues with Google Books access are no different from the privacy issues that research libraries have addressed through agreements with database providers for many years.

Q: Aren’t there serious concerns with the quality of the books and the metadata in Google Books?

A: As with web search, Google is continuously working to improve the quality and accessibility of the books. Where scans are currently bad, they are working to identify and rescan as needed. Generally, Google metadata is as good as that provided by existing research libraries – because it is the metadata provided by the research libraries. In some cases, Google Book Search is making existing metadata problems visible, and providing an opportunity to correct this metadata.

Q: Doesn’t the settlement thwart competition in the emerging e-books market?

A: Google generally gains an advantage only for books that are currently out of print. In-print books are available through Google only if the author or publisher explicitly gives Google permission, optionally setting a direct price for the book. In the market for books that are actually selling (the ones currently in print), Google has no advantage over Amazon, Barnes & Noble, or many other e-book sellers.

Q: Doesn’t the settlement widen the digital divide by limiting access to digital books in financially hard-hit communities that have budget-constrained libraries?

A: Currently, these communities have no access to these in-copyright digital books. The settlement opens up the possibility for any institution, no matter how small, to have access to a digital library larger than Cornell’s current physical library of close to eight million volumes. This access can be either through a free public terminal available in any library, or through per-FTE licensing that Google is not going to be able to raise too high (or no one will subscribe).

In closing, I’d like to provide a personal perspective on the settlement. My grandparents wrote and held copyright on four different books, originally published in 1929, 1931, 1965, and 1968, all long out of print. I now hold the copyright myself, and I am very interested in making these works available to anyone who might want to read or make use of them. I actually have the original publisher’s contracts for three of these books and, based on those, it is not clear to me who currently holds the publication rights on any of them. All three had reversion clauses, under which the author can request that publication rights revert from the publisher if the book goes out of print. All of them require that the author explicitly request reversion. Moreover, reversion is only possible in cases where the sales of the books had already covered the original advance to the author. I strongly believe that my grandparents would have requested reversion, but I have no actual evidence to support that position. I also have no idea whether the publishers themselves still have any of the paperwork that would definitively settle the issue. The settlement would provide a straightforward procedure to sort out the reversion, control, and distribution issues, without anyone going to court. Moreover, for out-of-print books, the burden of proof is on the publisher to show that rights in the work have not reverted. I strongly suspect that there are many works that are not actually orphan works, but where there is significant uncertainty over the rights. Since these works typically have very low economic value, there is no incentive for anyone to spend the money to sort things out. The settlement will cut through the fog and answer the question of whose right it really is to distribute these books.
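The rights question in that paragraph is essentially a conjunction of contract conditions, several of which may simply be unknown. As an illustrative simplification only (not legal analysis, and not how the settlement’s procedure actually works), the hypothetical sketch below models the reversion clauses with three-valued (Kleene) logic, where None stands for “no surviving evidence either way”:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReversionFacts:
    """What is known about one out-of-print title; None means unknown."""
    out_of_print: Optional[bool]
    author_requested_reversion: Optional[bool]
    advance_earned_out: Optional[bool]  # had sales covered the original advance?


def rights_reverted(facts: ReversionFacts) -> Optional[bool]:
    """Kleene AND over the clause conditions.

    False if any condition is known to be false, True if all are known
    to be true, and None (undetermined) otherwise.
    """
    conditions = (facts.out_of_print,
                  facts.author_requested_reversion,
                  facts.advance_earned_out)
    if any(c is False for c in conditions):
        return False
    if all(c is True for c in conditions):
        return True
    return None


# Out of print, but no paperwork on the request or the advance: undetermined.
print(rights_reverted(ReversionFacts(True, None, None)))  # None
```

Every None in the result is a title stuck in the fog described above; a procedural default, such as placing the burden of proof on the publisher, is what turns those Nones into answers.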

The Disruption of Universities – and Libraries

The week before last I both attended and gave a talk at the Institute for Computer Policy and Law. There was an interesting thread on how IT is disrupting universities, and university libraries, that ran through a number of the presentations. For this post, I’m going to draw on material from three of those presentations: Randy Katz on The Other Policy Challenge: Scholars, Scholarship, and the Scholarly Enterprise in the Digital Age, James Hilton on IT Policy for CIOs: What They Want and Need to Know, and my own talk on Wrenching Change: Why a Research Library Needs a Chief Technology Strategist. The first two talks are available online from Cornell’s streaming media server – just click on the links.

James Hilton did a nice job of framing the disruptive impact of IT, and he identified three critical effects:

  1. Unbundling: This is the disruption of long established business practice. Newspapers, music, and publishing are three industries where IT has destroyed the vertically integrated business models that sustained them. The Internet, where the marginal distribution cost of digital media is essentially zero, means that payments for paper newspaper subscriptions and the physical distribution of music albums can no longer support the large fixed costs inherent in the existing models. As Hilton pointed out, there is nothing more bundled than higher education.
  2. Commoditization: This is where price becomes the only significant factor in a product. Here, new technologies have been able to ride huge growth curves. Email service is already a commodity, and cloud services such as Amazon EC2 and S3, along with Software as a Service (SaaS) offerings, are rapidly becoming commodities. Music companies and musicians are trying to get away from the commodity 99¢ MP3 through concerts, limited edition albums, and other high-margin products. In the higher education space, both Kaplan and the University of Phoenix are seeking to commoditize the delivery of instruction. To compete, universities must build on and emphasize their unique advantages in face-to-face learning, social collaboration, and social/physical interaction. They also need to get the basics done much more efficiently.
  3. Consumerization: This is the process of bringing technology to consumers at a level that everyone can use. Examples abound: Facebook, YouTube, Gmail, and Flickr all efficiently and effectively deliver services to very large numbers of consumers. None of these services provide user training, helpdesk support, or consulting. People just use them. Universities and libraries both need to look at how to leverage existing consumer services and how to provide consumer-level services that are not expensive to use or to support.

Randy Katz posed the provocative question “Will the digital revolution mean the end of traditional higher education?” He pointed out that technology was already transforming scholarship: it can now be done anytime, anywhere, and any way you want. Digital resources can end the “busy-ness” of scholarship: going places, searching for records, transcribing, assembling, and so on. Scholarly communication used to be slow, and now it’s instantaneous. Technology has truly liberated scholarship. When it comes to education, there is also a huge transformation. Randy also cited both the University of Phoenix and the death of traditional newspapers as warning shots across the university’s bow. Riffing on a question from Tracy Mitrano, he asked “In the face of the cloud, what is the glue that holds a campus, a scholarly enterprise, together?” He suggested that the successful university of the future may not be a place, but rather an idea instantiated in a physical and virtual architecture. Randy also emphasized the need for universities to identify and build on their unique characters and operational philosophies.

In my own talk, I looked at some areas of disruption that are particular to the research library. The most obvious is the non-locality of digital information resources. It’s no longer necessary for library users to come to the physical library for access to information – now, almost everyone starts with Google. We’re faced with the commoditization and consumerization of information discovery and delivery. In this environment, how do we make sure that all our unique resources are actually discoverable on the net, and that the Cornell community can discover and obtain all the resources that we hold and pay for?

The second challenge I focused on was how to manage academic technologies in the face of tumultuous change in IT. Almost every institution has a different allocation of responsibility for academic technologies among central IT, the library, and other campus organizations. Typically, this allocation is the result of personalities and historical circumstance. To get the right division of effort, it is critical to identify each group’s core competencies and match those with the necessary support functions. In general, central IT and the library will divide responsibility somewhere near the middle of the axis running from IT to scholarship, although that can be a hard line to draw. For the library to address its academic technology responsibilities it must: 1) provide disciplinary understanding and expertise; 2) become fully aware of and engaged with modern IT tools and services; 3) ensure access to and the long-term preservation of information resources; and 4) collaborate closely with campus experts in IT.

The third area I discussed was the preservation of the scholarly and cultural record. Digital technology brings with it some huge preservation challenges: the diffusion of information resources; the need to manage research datasets; and the need to preserve information within and out of the cloud. As a library, we need to embrace the curation of web information resources, support scholarly communication, and collaborate with other research libraries to ensure the efficient provision of services. Like universities, research libraries will need to focus on their own unique areas of strength, and look to increasing efficiency or outsourcing commodity services. In the face of hugely disruptive forces, we must figure out how to deliver, and pay for, the critical services that research libraries provide:

  • Preserving the scholarly and cultural record
  • Ensuring open access to all knowledge – even when it is controversial
  • Enabling users to discover the information that they seek
  • Supporting creativity, scholarship, and critical thought

In his talk, James Hilton drew a nice distinction between services that are essential and services that are strategic. Running the university payroll is essential, but it is not strategic. We all need to look at our activities and figure out how to provide what is essential for the lowest cost and at the greatest efficiency, and how to invest our attention, energy, and resources in the things that are strategic.

CUL on Flickr (not quite Commons)

The Nunnery (East Façade of East Wing), Chichén Itzá

Through the efforts of CUL’s Library Outside the Library group (proprietors of CULLabs), the Library has now established a presence on Flickr. Thanks to Susette Newberry, Dianne Dietrich, Baseema Krkoska, and the rest of the LOL team, over 1,000 images from the A.D. White Architectural Photograph Collection are now up on Flickr. The plan is that these photos will eventually go in the Flickr Commons, joining collections from the Library of Congress, the Smithsonian, and a number of other institutions. Unfortunately, Yahoo has put a temporary hold on adding new institutions to the Flickr Commons, so the decision was made to release the photos now rather than wait. All these images have no known copyright restrictions in the United States (your mileage may vary in foreign countries). Under the Library’s new public domain image policy, you are free to do as you wish with any of these images, although the Library does request credit so people will know where to find the originals.

The images come with some useful metadata, including things like the date of the building, date of the image, architect (where known), and the location. You can use Flickr’s map view to see the distribution of the buildings across the world, and the individual sets of photos are divided up by country. We strongly encourage comments on these images, particularly if you can provide additional information about the subject matter of the photograph. We’ve got a couple of examples of comments that show modern-day images alongside the originals: Paris–Hôtel de Cluny and New Louvre, Pavillon de Rohan (just scroll down to the comments to see the comparison). We’re hoping that our move to the Flickr Commons happens soon, since that will greatly increase the visibility of these materials, and it should increase the comments and tags. For some of the Library of Congress images, the public has added significant information and links to sources: one example is the image titled “Jones Barn where dynamite was found”. You can read more about the LoC experience with their Flickr pilot on their site.

The Cornell University Library has a significant number of digital collections, and a wide range of public domain materials. The LOL group will be looking at other collections that we can add to Flickr, and there are a number of efforts at CUL to use non-traditional channels to make all our materials more accessible and available to researchers, students, and the general public. You’ll find a large and growing number of books from CUL available at the Internet Archive, as print-on-demand copies through Amazon, and on Google Books.

You can find CUL on Flickr at the easy-to-remember web address http://flickr.com/photos/cornelluniversitylibrary. I’ll leave you with a few of my favorite images from the A.D. White Collection:


Collection of Grotesques from Reims, France

Interior, Saint Sophia (Hagia Sofia)

Salisbury Cathedral, West Façade

e-Textbooks and the Amazon Kindle

Kindle DX via engadget.com

This morning, I was forwarded a query from a Cornell undergraduate, noting the impending announcement tomorrow of a new, larger screen Amazon Kindle and linking to a Wall Street Journal story on its potential use as an electronic textbook. He suggested that Cornell should consider signing up as one of the universities making this device available to their students. I thought that my reply might be broadly interesting, so here is a slightly revised and expanded version.

This is an area of very active development and of great interest for the Cornell University Library and Cornell as a whole. In addition to the larger Amazon Kindle, expected to be announced tomorrow, active competitors in this area include the Sony e-Reader, Plastic Logic, and potentially a larger tablet-style iPhone/iPod. At this point, it is not clear which devices are most likely to succeed both as valuable options for students and faculty and in the marketplace.

When the Library recently reviewed the e-Textbook issue, we had the following concerns with currently available devices:

1) More and more textbooks are using color illustrations. The e-Ink technology used by the Kindle and others will eventually be able to display color, but it’s likely still a ways off. Ideally, the devices should also support animations, videos, and other A/V content. Again, the expectation is that e-Ink refresh rates will increase, but they’re not very fast now.

2) Licensing issues and availability: Textbooks come from a wide range of publishers. Amazon has done well at licensing popular books for the Kindle, but available titles are still a small fraction of the total books available. At this point, it’s not clear which textbooks will be available on the Kindle. There are also concerns about the openness of the Kindle, and its ability to deal with non-Amazon materials (for example, some of the new open textbook initiatives such as http://www.flatworldknowledge.com/). Tim O’Reilly discussed the openness issue in a Forbes article.

3) Digital rights management: Unlike physical textbooks, purchased eBooks typically cannot be resold or transferred to another user’s device. This means that students can’t recoup costs by selling books at the end of the semester, and can’t save money by purchasing a used text. It’s possible that the pricing will reflect this, but it’s certainly not clear at this point that it will.

In the long term, I believe that e-Textbooks will make sense economically, practically, and academically. After a decade of false starts (I still have both a SoftBook and a Rocket eBook lying around somewhere), we’re getting very close to a tipping point. The constant network connectivity of the Kindle, and the availability of the Kindle iPhone app, make it a very strong contender, but the field is still open.

Once the new Kindle is announced and the details are clear, we will certainly look into whether a pilot project here at Cornell might make sense. The Library is constantly reviewing and discussing the opportunities in e-Textbooks, e-Books, and mobile devices. I’ve posted on related issues previously on this blog, both on Mobile Devices and CUL and on eBooks Heat Up.

Since my reply to the student this morning, ReadWriteWeb has posted some similar questions and observations: Would Students Even Want a Kindle for Textbooks?. In addition to some of the issues above, they also point out that students already have laptops, which provide most of the network, storage, and display capabilities to handle e-Textbooks. The Library also considered this earlier, but I believe (and it’s echoed in the ReadWriteWeb article comments) that the “closed garden” aspect of devices like the Kindle and iPhone makes them a much more comfortable place for publishers. It’s hard to say whether or not the Kindle will cross the threshold for students to haul around yet another expensive electronic device.

I’ll plan to post an update on the Kindle DX tomorrow if the actual news raises any other significant issues.

Mobile Devices and CUL

iPhone with eBook and information apps

The use of mobile devices for research, learning, teaching, and creative expression is growing very rapidly. The most recent Horizon Report from EDUCAUSE and the New Media Consortium identifies mobiles as a key educational technology trend with a time-to-adoption of one year or less. We’re exploring this issue for the Library and figuring out where we should be investing effort to support our mobile device users.

Last Saturday (4/18), the Cornell University Library Advisory Council met here at Cornell. As part of that meeting, Beth Anderson of Audible (now part of Amazon) and I gave a presentation on Mobile Devices and CUL. Here are some quick highlights:

• 29% of current college students read eBooks, as compared to 25% of 18-29 year-olds; 19% of 30-44 year-olds; and 14% of 45-64 year-olds [Source: Simba Information: Trade E-Book Publishing 2009].

• At Cornell, the use of powerful mobiles is growing, but it’s still early. Students mostly use laptops for information access.

• Within CUL, our current main Library web page is not very mobile-friendly (but we’re working on that).

• The use of mobile browsers at the Library web site increased 75-fold in the past semester, but it still represents less than 0.2% of Library web hits. Of that mobile use, 55% is from iPhones/iPods and the rest from other mobile phones and devices.

• There is a new beta “Text a Librarian” service that allows users to send text messages to get answers to quick Library logistics and reference questions. You can find the details at the CULLabs web site.
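The two traffic figures in these highlights pull in opposite directions, and a quick calculation shows why. It adds one assumption not stated in the presentation: that total Library web traffic stayed roughly flat over the semester, so the mobile share grew by about the same factor as mobile use.

```python
# Figures from the CULAC presentation; the flat-traffic assumption is added here.
current_share_pct = 0.2   # upper bound: mobile hits as % of all Library web hits
growth_factor = 75        # growth in mobile browser use over one semester
iphone_fraction = 0.55    # fraction of mobile use from iPhones/iPods

earlier_share_pct = current_share_pct / growth_factor    # upper bound a semester ago
iphone_share_pct = current_share_pct * iphone_fraction   # upper bound today

print(f"earlier: <{earlier_share_pct:.4f}%  iPhone/iPod now: <{iphone_share_pct:.2f}%")
# prints "earlier: <0.0027%  iPhone/iPod now: <0.11%"
```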

You can also download and view the PDF version of our presentation: Mobile Devices and CUL (2.5MB). In it we gave a quick overview of the mobile landscape outside of and within Cornell, and then posed a series of questions for discussion by CULAC. The actual discussion mostly focused on the immediate importance of this area. Yes, growth is extremely rapid, but it’s still at a fairly low level. Yes, there are a number of possibilities, but is this the best place to spend the Library’s constrained resources? If I could summarize the group consensus, it would be “Take the relatively easy, inexpensive, high-payoff steps to support mobiles now, and actively monitor mobile use and demand for mobile services to decide what to do in the future.”
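Actively monitoring mobile use can start very simply. Here is a minimal sketch, assuming access to the User-Agent strings in the Library web server logs; the marker lists and function names are illustrative and hypothetical, not how CUL actually measured the figures above, and production detection would need a maintained device list.

```python
from collections import Counter

# Illustrative substrings only; real user-agent detection needs a maintained list.
IPHONE_MARKERS = ("iPhone", "iPod")
OTHER_MOBILE_MARKERS = ("Android", "BlackBerry", "Opera Mini",
                        "Windows CE", "Symbian", "webOS")
BUCKETS = ("iphone", "other_mobile", "desktop")


def classify(user_agent: str) -> str:
    """Bucket one User-Agent header as 'iphone', 'other_mobile', or 'desktop'."""
    if any(m in user_agent for m in IPHONE_MARKERS):
        return "iphone"
    if any(m in user_agent for m in OTHER_MOBILE_MARKERS):
        return "other_mobile"
    return "desktop"


def mobile_report(user_agents: list[str]) -> dict[str, float]:
    """Return the percentage of hits falling in each bucket."""
    counts = Counter(classify(ua) for ua in user_agents)
    total = sum(counts.values()) or 1   # avoid dividing by zero on empty logs
    return {bucket: 100.0 * counts[bucket] / total for bucket in BUCKETS}
```

Run over a week of logs at a time, a report like this gives a trend line for the decision CULAC deferred, without committing to any particular mobile platform.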

So, what is the Library doing now? The Library web group in cooperation with the Library Outside the Library team (LOL) is looking into making the main Library web site (based on Drupal) more friendly to mobiles. The New York Public Library mobile site is a great example of what can be done. LOL is also looking at a possible student project to build a simple iPhone app to access the Library, along the lines of the one created by the DC Public Library.

Another interesting approach is to build on the mobile-friendly environments that are being built by others out on the web. Flickr has a very nice mobile site, and the LOL group is looking at making Cornell images available on Flickr Commons. As another example, if we link from the catalog to a public domain book at Google Books, then a user can view that book using the Google Books Mobile web interface.

If we’re lucky, and the resource is sufficiently important, we may not need to do anything at all. The arXiv.org physics/astronomy/CS pre-print archive already has a mobile site available. There are also three different iPhone apps to access arXiv – here’s the author’s description of one of them – arXiview. If we have sufficiently valuable resources and we make the APIs available, our user community may just develop the mobile-friendly applications themselves.

There may still be a question of how soon mobiles will hit and how great their impact will be, but there is no question that it’s going to happen. Perhaps the iPhone is the tipping point, with a range of applications and capabilities that make it a totally compelling information tool. I certainly use mine for many purposes: constant awareness and updates (Twitterfon, Facebook, Google Reader Mobile), information seeking (Wikipanion, Google), news (WSJ, NYTimes, AP, USAToday), eBooks (Kindle, Stanza, Classics), as well as web browsing, email, and even (occasionally) as a phone.

This past January, Lorcan Dempsey of OCLC wrote an excellent article in First Monday: “Always on: Libraries in a world of permanent connectivity”. Here’s a quote that captures the challenge that libraries face: “[T]he library will have to meaningfully synthesize a range of products and services from multiple sources, specialize them for particular users and uses, and then mobilize them into a personalized, socialized individual user experience.” This is already happening out there in the big, wide world of the web, and if CUL is to remain relevant and useful to our scholarly community, then it will have to happen here as well.

IT Policy and Security

Safeguarding Your Computer

This post will be a bit less “fun” than most, and it’s also pretty specifically focused on Cornell Library issues. If you are not that interested in the IT policy part, I would still encourage you to check out the new handbook on Computer Security at Cornell, since it will be useful to all Cornell computer users.

Part of my job as CTS is to represent the Library on the university-wide IT Managers Council. This makes me the primary point of contact between Cornell Information Technology (CIT) and the Library on a range of both operational and policy issues. As many readers will know, CUL is a rather diverse organization, with a number of units that operate pretty independently. It seemed to me that it would be valuable to clarify who has both responsibility and authority to act on IT policy and security issues for the Library. I also wanted to make sure that there were clear channels of communication between me and IT staff at the Library who have operational responsibilities.

To that end, with the assistance of senior IT managers at the Library, I authored a document on IT Policy and Security Responsibilities for CUL. Here are some key points:

On IT Policy Decisions: A committee of three people, the Library IT Policy Group (LITPG), will make IT policy decisions for the Library: the head of DLIT (currently Oya Rieger), the head of Mann ITS (currently Jon Corson-Rikert), and the CTS (currently Dean Krafft). If there is any policy issue on which these three do not agree, then that issue will be referred directly to the Library Executive Group for discussion and a decision. In particular, the LITPG accepts the delegation of responsibility from the University Librarian to fulfill the unit head responsibilities for the Library under University Policies Volume 5 on Information Technologies (currently Policies 5.1-5.10).

On communications between CIT and CUL: The Chief Technology Strategist will serve as the Library spokesperson on the IT Managers Council and as the initial point of contact for communication of IT policy and operational issues between CIT and the Library. The head of DLIT will identify a DLIT staff member who will serve as the primary communications conduit between the CTS and the IT staff throughout the Library. This staff member will maintain mailing lists, wikis, and other tools to ensure that information from the ITMC and CIT is reaching appropriate Library staff, as well as to assist the CTS in representing the Library’s needs and interests to CIT and the ITMC. Currently, the head of DLIT has identified the Unit Security Liaison, Oliver Habicht, to serve in this communications role.

On the IT security side, Cornell has been working hard to improve its procedures and policies. Several years ago, under the guidance of Steve Schuster, the Director of IT Security, a Security Council was set up, with a Security Liaison from each unit. In my former life as Director of IT for Computing and Information Science, I served as the CIS Security Liaison. For CUL, Oliver Habicht currently serves as the Security Liaison. The Security Council has worked to develop new security requirements, including both Baseline Requirements and Requirements for Confidential Data. This past January, the University officially approved a new policy on the Security of Electronic University Administrative Information, which incorporates these requirements.

IT Security issues are of great importance for Library staff. New York State Law guarantees the confidentiality of the records of Library users, and preserving our users’ privacy and freedom to read, study, and research has always been a core library value. It is important that you do your part to ensure that confidentiality by helping to maintain the security of the IT systems you use in your work. The CIT Security Office has released a new handbook Computer Security at Cornell: Secure Your Computer On and Off Campus, and I encourage Library staff to read it. It is a bit long, but it’s very thorough and quite readable. It will help you protect yourself from IT security problems both at work and at home.

Newspaper Armageddon

Along with many others, I have to point to Clay Shirky’s great post “Newspapers and Thinking the Unthinkable”. Here are some key quotes:

When someone demands to know how we are going to replace newspapers, they are really demanding to be told that we are not living through a revolution. They are demanding to be told that old systems won’t break before new systems are in place. They are demanding to be told that ancient social bargains aren’t in peril, that core institutions will be spared, that new methods of spreading information will improve previous practice rather than upending it. They are demanding to be lied to.

Society doesn’t need newspapers. What we need is journalism. For a century, the imperatives to strengthen journalism and to strengthen newspapers have been so tightly wound as to be indistinguishable. That’s been a fine accident to have, but when that accident stops, as it is stopping before our eyes, we’re going to need lots of other ways to strengthen journalism instead.

For the next few decades, journalism will be made up of overlapping special cases. Many of these models will rely on amateurs as researchers and writers. Many of these models will rely on sponsorship or grants or endowments instead of revenues. Many of these models will rely on excitable 14 year olds distributing the results. Many of these models will fail. No one experiment is going to replace what we are now losing with the demise of news on paper, but over time, the collection of new experiments that do work might give us the journalism we need.

Clay isn’t the only one making this observation. LibraryThing’s Tim Spalding cites some of the evidence, and hopes it won’t be true. The San Francisco Chronicle hovers on the brink. As newspapers start vanishing in print, or vanishing altogether, we all need to think about what happens next, and what we should do right now. How do research libraries preserve news, journalism and culture when it is fragmented and scattered in a million pieces across the web? What do we do when we can’t just subscribe to a nice simple set of newspapers, whether paper or electronic? How do we avoid facing future scholars with a gaping hole in the cultural record?

How Many Orphans?

There has been a great deal of recent discussion about the role of orphan works in the Google Book Settlement. In particular, it was a major element of the symposium on “The Google Books Settlement: What Will It Mean for the Long Term?” at Columbia Law School last Friday. There have been a number of good summaries and reports from that meeting. Our own Peter Hirtle reported on his observations and notes on the LibraryLaw blog: Part 1 on the keynotes (Mary Beth Peters, Register of Copyrights, and Randal C. Picker of the University of Chicago Law School – his slides are available online), and Part 2 on the afternoon panels (“The Future of ‘Books’”, “Authors and Incentives”, and “The Public Interest”). Peter Brantley twittered the symposium using the tag #gbslaw and has now posted “The Orphan Monopoly” on his blog. And Paul Courant has a post up on “Orphan Works Legislation and the Google Settlement”.

To very briefly summarize the issues: By getting the suit between Google and the authors and publishers (represented by the Association of American Publishers and the Authors Guild) certified as a class action, the proposed settlement will bind any author or publisher of a work who does not explicitly “opt out” of the settlement. The books covered include basically every non-public-domain book published before January 2009. “Orphan works” are those books where the rights holder is unknown and presumably either unaware of or uninterested in any rights they have in the work. By definition, these rights holders will not “opt out”, nor will they register in response to the notification now taking place as part of the class action process. The settlement will create a “Book Rights Registry” (BRR), which will collect and distribute royalties for the use of books that fall under the settlement. Google gets specific rights to make use of all the books that fall under the settlement, in return for paying royalties to the BRR. The BRR can also negotiate with other groups to give them access to books on behalf of all the authors and publishers who have registered with it. However, the BRR can neither negotiate on behalf of, nor license rights for, those who have not registered with it – the orphan works.

So, what are the issues here?

  • It is a brilliant use of a legal mechanism, the class action lawsuit, to accomplish a purpose for which it was not really intended: rather than compensating people for past injuries, it instead binds them going forward, including all the impossible-to-identify rights holders for orphan works.
  • A major point raised at the Columbia symposium, and by a number of others, is that this bypasses the legislative process for dealing with orphan works. There was previous proposed legislation to deal with orphan works, which involved public hearings with input from many stakeholders. Arguably, a private lawsuit is the wrong way to set public policy.
  • As Randal Picker put it (as quoted by Peter Hirtle): “[T]he settlement allows Google and the Registry to turn orphan works into a private public domain.” Essentially, Google gains a monopoly in making orphan works available, since the BRR has no ability to negotiate on making those works available to others.
  • Moreover, the financial incentives surrounding orphan works are perverse: The BRR can use revenues from the orphans to fund its own operations, and the portion of subscription revenues (e.g. where universities like Cornell subscribe to get access to all the works) attributed to orphans is simply distributed to all the other rights holders. Quoting Peter Brantley, “The essential problem is that the settlement parties have a vested interest in maintaining a monopoly over access to orphan books.”

Getting back to the title of this post, how big a problem is this? How many of the books under the settlement will turn out to truly be orphans? This is really hard to figure out, and the availability of Google Books itself may dramatically change the equation. Under the previously proposed orphan works legislation, anyone wanting to treat a work as an orphan had to conduct some sort of reasonable search for the rights holder. Let’s consider a specific example: Sky Larking: The Romantic Adventure of Flying by Bruce Gould was published by Horace Liveright in 1929. The author is dead, the publisher went out of business in the 1930s, and the book was never reprinted. However, the rights holder, who had no particular incentive to make himself known before, is about to register with the Google Books Settlement. It is true that I have a particular interest in both my grandfather’s book and in the Google Books Settlement, but I suspect that a very large number of “hidden” rights holders will suddenly show up. Instead of asking whether or not a “reasonable search” would have turned up the rights holder, the question now becomes, “Is there anyone in the family who is aware, or may become aware, that their parent, uncle/aunt, grandparent or great-grandparent wrote a book?” As Google Books itself becomes more complete and more visible, a very large number of rights holders may slowly surface. Family history and genealogy is one of the most popular hobbies in America. It seems to me that discovering and registering authorship and ownership for orphan works may be high on the list of many “family historians”. [As an aside, it would probably have been pretty easy to track me down as the rights holder for Sky Larking with a few creative Google searches – I’m pretty visible on the web.]

That said, there are still going to be a lot of orphans out there. Peter Brantley does a calculation based on the $45 million that Google has set aside to compensate rights holders for scanned books and comes up with an estimate of 4.75 million orphan titles. Moreover, that only considers the roughly 7 million volumes that Google has already digitized, and the eventual total will be much higher.

So, what should be done about the settlement and the orphan works? Peter Hirtle reported from the Columbia Law Symposium that while there was a strong sentiment that the class action settlement was the wrong process by which to set public policy, there was a general acceptance that the settlement was better than the status quo. He continues:

No one was saying the court should reject it and tell the parties to start over. Yes, the class may be too large and the mechanism too crude, but we created this problem when we abandoned formalities, lengthened copyrights, and started treating every copyrighted item in the world like it was a Disney movie. Given this procrustean bed we have made for ourselves, the settlement may be our only way out. Yes, Congress should create a compulsory license authorizing the use of out-of-print books – but don’t hold your breath waiting for that. In the interim, the settlement may be the best we can hope for – even though it has the potential to radically alter all of our worlds.

While there are those such as Brewster Kahle who feel that the settlement should be thrown out, and we should try again at legislation, I’m inclined to agree with Paul Courant:

But there is an obvious solution, one that was endorsed at the Columbia meeting by counsel for the Authors Guild, the AAP, and Google: Congress could pass a law, giving access to the same sort of scheme that Google and the BRR have under the Google Settlement to anyone. And they could pass some other law that makes it possible for people to responsibly use orphaned works, while preserving interests for the missing “parents” should they materialize. [...] Given that the parties to the suit, libraries, and the public would all benefit from such legislation, it should be a societal imperative to pass it. I look forward to AAP, the Authors Guild, and Google lobbying and testifying in favor of such legislation. I’d be happy to be there, too.

The settlement itself sets a “floor” on access to these works, and it’s one that the AAP and the Authors Guild have already agreed to. In theory, that should make it easier to agree on and pass legislation that would grant groups other than Google access to the orphan works. Hopefully, it will be possible to overcome the perverse financial incentives that the settlement provides to the BRR and rights holders to keep the orphans locked up.

The Google Book Settlement will make an amazing collection of works available in digital form. The settlement is not perfect, but it is a huge improvement over the current situation where almost all of these books remain locked up in physical form, hard to discover and hard to access, on the shelves of our libraries. The opportunity to find new loving readers for all those orphans, however many there are, is too good to pass up.