Friday, December 16, 2011

How to Dig for Book Data Treasure

To me, the surest indicator of impending doom for book publishing is hearing a publisher cite the advertising of Attributor, an anti-piracy solutions company, as if it were science. It's not the attitude towards piracy that bothers me; that's entirely sensible. It's the implied devaluation of honest data that depresses me.

There's hope though. I've gotten to know quite a number of people throughout the reading ecosystem with whom I can use the word "data" as high praise, roughly equivalent to the word "gold". If you're reading this, chances are you're a member of this secret society, and what follows is a sketch of a treasure map.

In a recent post, I promised to suggest ways that we might measure the effects of library ebook lending on book sales. If you think about it, there are many parallels between attempting such a measurement and previous studies that have tried to measure the effect of ebook piracy on book sales. Unfortunately, the only objective study I know of was a small study done by Brian O'Leary, and the effects observed in that study were small and in a direction counter to popular narratives (and thus rarely noted in the sort of presentations that cite Attributor advertising).

In that study, O'Leary looked for time-domain correlations between sales figures for books from two publishers and the appearance of the same books on BitTorrent. A similar study focused on library lending could be much more compelling, because library circulation data is a much more direct measure of distribution than any sort of torrent tracking, and librarians are much better than pirates at sharing data.

With the cooperation of booksellers, library circulation and holdings could be compared and correlated to store-by-store sales. For example, you could look at a book that's held in a significant fraction of libraries and look for correlations (positive AND negative) between areas where a library is circulating the book and stores where the book is selling. You'd have to remove regional and demographic variance, of course, but with enough data, almost anything is possible.
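As a rough sketch of what such a comparison might look like, the snippet below computes a Pearson correlation between per-region library checkouts and store sales for a single title. The figures are invented for illustration, the per-capita normalization is assumed to have been done already, and the function name `pearson` is just a label chosen here; a real study would need far more careful controls.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)

# Hypothetical figures for one title in five regions, per 1,000 residents.
# These numbers are made up; they are not from any real data set.
library_checkouts = [12.0, 8.5, 15.2, 3.1, 9.8]
store_sales = [4.1, 3.0, 5.5, 1.2, 3.6]

r = pearson(library_checkouts, store_sales)
print(f"correlation between lending and sales: {r:+.2f}")
```

A value near +1 would suggest lending and sales rise together across regions, a value near -1 that they move in opposite directions. Neither would say anything about cause and effect on its own.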

With the cooperation of a large publisher, rigorous experiments could be done. Scientific experiments derive rigor from the use of controls. To prove that lending influences sales, it's not enough to do lending and look for sales. A rigorous experiment would have both a trial where books are lent and an identical trial where the same books are not lent.

One way to control a lending experiment would be to make a random selection of a publisher's catalog available for lending. Imagine if Penguin had worked with the library community on an experimental withholding of a random part of its catalog from Overdrive. The sales could be analyzed for patterns and trends.
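A minimal sketch of that kind of random assignment follows, assuming a hypothetical catalog of title identifiers and invented sales figures. The helper names `assign_groups` and `mean_sales_difference` are illustrative only, and a real analysis would add significance testing and many more titles.

```python
import random
from statistics import mean

def assign_groups(titles, seed=None):
    """Randomly split titles into a lending group and a withheld (control) group."""
    rng = random.Random(seed)
    shuffled = list(titles)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def mean_sales_difference(sales, lent, withheld):
    """Mean sales of lent titles minus mean sales of withheld titles."""
    return mean(sales[t] for t in lent) - mean(sales[t] for t in withheld)

# Hypothetical catalog and invented unit sales, for illustration only.
catalog = [f"title-{i}" for i in range(1, 9)]
sales = {title: 100 + 10 * i for i, title in enumerate(catalog)}

lent, withheld = assign_groups(catalog, seed=2011)
diff = mean_sales_difference(sales, lent, withheld)
print(f"lent: {lent}")
print(f"withheld: {withheld}")
print(f"mean sales difference (lent - withheld): {diff:+.1f}")
```

Because the split is random, any systematic difference in sales between the two groups is harder to blame on how the titles were chosen.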

It's important that data analysis of this sort be done objectively by researchers with integrity. In any large collection of data, it's possible to focus on data which supports one narrative over another. If lending-sales studies were done, my guess is that some types of books would show correlations very different from others.

I've used the word "cooperation" several times already. I'm not so naïve as to think that data sharing will materialize out of thin air. Perhaps the sort of ecosystem-wide organization envisaged by the same Brian O'Leary could be the vehicle to make data treasure digging possible. Opportunity in Abundance for the win!


Monday, December 12, 2011

SOPA Could Put Common Library Software in the Soup

The "Stop Online Piracy Act", or SOPA, is promoted as something that will... stop online piracy. So I was a bit surprised when I learned how it's supposed to work. A key provision of SOPA will shut down "notorious" websites by setting up a national web filter based on domain names. I'm sure the pirates had a great laugh about that one. They'll be the ones benefiting while the rest of us figure out how to avoid collateral damage. Members of Congress should consult the nearest available 14-year-old on the ease of web filter evasion: school teachers in my town routinely access their filter-blocked Facebook accounts by asking students to show them how it's done.

Rerouting domain names to alternate IP addresses is pretty easy to do, and can be very useful as well. One type of software used to accomplish this is called a "proxy server". It's called that because it acts as your web browser's proxy in requesting files from a web site. For example, after connecting to a proxy server in Stockholm, my requests for web pages would appear to issue from a computer in Sweden instead of from my computer in New Jersey.

Libraries often use proxy servers to simplify IP authentication of their networks to digital information providers. When an academic library buys access to a database, for example, they'll give the IP address of their proxy server to the database provider, which then puts the IP address on an "allow" list. Then everyone at the school accesses the database through the address of the proxy server. In effect, those proxy-authenticated users circumvent the IP address-based filter that blocks unauthorized users.
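Here is a small sketch of how a provider-side "allow" list check could work, using Python's standard `ipaddress` module. The addresses come from ranges reserved for documentation, and the names `ALLOWED` and `is_allowed` are assumptions for illustration, not any vendor's actual implementation.

```python
import ipaddress

# Hypothetical allow list kept by a database provider for one subscribing library.
# 192.0.2.0/24 and 198.51.100.0/24 are address ranges reserved for documentation.
ALLOWED = [ipaddress.ip_network("192.0.2.10/32")]  # the library's proxy server

def is_allowed(client_ip, allowed=ALLOWED):
    """Return True if the requesting address falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in network for network in allowed)

print(is_allowed("192.0.2.10"))    # request arriving via the library's proxy
print(is_allowed("198.51.100.7"))  # request from some other address
```

The provider only ever sees the proxy's address, which is why everyone routed through the proxy passes the check regardless of where they actually sit.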

Passage of SOPA would inevitably spawn the creation of a network of proxy servers hosted in countries that reject filtering of the internet. Users in the US could then connect transparently to blocked sites by connecting through a constantly shifting network of proxy servers. The key to that connection would be a Proxy Auto-config, or PAC, file: essentially a mini DNS file installed in the user's web browser software.

SOPA contains provisions that allow the US Attorney General to

"bring an action for injunctive relief against any entity that knowingly and willfully provides or offers to provide a product or service designed or marketed for the circumvention or bypassing of [domain name blocking] and taken in response to a court order issued pursuant to this subsection, to enjoin such entity from interfering with the order by continuing to provide or offer to provide such product or service."

Proxy servers meet the condition of being designed to route around filters and therefore fall into the category of services that could be subject to injunctive action under SOPA. The proxy servers most frequently used in libraries are OCLC's EZProxy and the open-source software known as SQUID, but there are many others in use.

In particular, SQUID makes use of PAC files, and thus could be vulnerable if the Justice Department decides that PAC files make it too easy to evade SOPA blockages. Conceivably, the Justice Department could force browser developers to omit support for PAC files, or perhaps to restrict their transmission.

Similar concerns about important software have been raised by Jim Fruchterman on behalf of Benetech, a non-profit that, among other things, provides ebooks to the reading disabled. Benetech is also one of the largest developers of software for human rights activists around the world. They operate TOR servers designed to foster anonymous communications. On Beneblog, Fruchterman worries that Benetech services could be impacted by SOPA. In response, a commenter signing in as "Copyright Alliance" argues that such action would be unlikely because "The State Department is strongly committed to advancing both Internet freedom and the protection and enforcement of intellectual property rights on the Internet." Too bad it's the Justice Department that gets to decide which services constitute circumvention.

I don't think that libraries will have their proxy servers taken away anytime soon, even if SOPA is enacted. But it's likely that the widespread development of SOPA-circumventing infrastructure would degrade the ability of rights holders to find and prosecute copyright violators. Knowledge of the actual locations of unauthorized files would be hidden offshore in distributed proxy servers, completely out of the reach of US law enforcement. The "file lockers" of today would dissolve into ungraspable bit vapors, and the online piracy problem would just get worse and worse.

There are many ways to address the online piracy problem, too many to list in this post. My own company is working on a piracy-neutering business model for ebooks. I don't know enough to evaluate the possible effectiveness of the payment and advertising network components of SOPA. But it appears to me that from the technical point of view, the internet filter component of SOPA will be a charm of powerful trouble, like a hell-broth, boil and bubble.

  1. @amac has a good post on SOPA's scope issues, as well as links to other articles.
  2. I focus here on SOPA, but there are similar issues with PROTECT IP, as described by Steve Crocker and 4 other prominent internet engineers.
  3. The Crocker paper describes a number of other ways that domain name filtering might be circumvented. These include replacing hosts files on the user's computer (similar to PAC file installation) and switching the user to a non-filtered DNS server. Apparently this is done transparently by some types of computer malware. This can only end badly.


Friday, December 9, 2011

Book Lending Ignorance

To what degree does library book lending complement book sales, and to what degree does library lending substitute for book sales? I don't think anyone knows for sure. (Well maybe Amazon, but they're not telling.)

With over 40 billion dollars per year of sales at stake, you would think that the US book publishing industry would want to know as much as possible about how those sales are generated. Since US public libraries circulate more items than US bookstores sell, the industry needs to understand the role of libraries in getting people to read and purchase books. Is it small or big? Does the existence of libraries promote sales or hurt sales? How do the equations change when books become digital?

Publishers do a pretty good job of compiling sales data, and they spend a lot of money to figure out what books are selling and who's buying them. According to BookStats, a cooperative study by the AAP and BISG, Americans bought an average of 7.32 books in 2010.

On the library side, there's a bunch of interesting data. IMLS has been compiling a wealth of data about the footprint of public libraries, which is why I can tell you that the average American borrowed 8.1 items from public libraries in 2009. Library Journal has recently published the first installment of results from a fascinating survey of library patrons. (Aside: this study should be made available in every library!) They find that 46% of respondents use the public library less than 2 times per year.

The LJ Patron Profiles survey shows a strong relationship between library use and book purchasing. For example, over half of survey respondents report buying a book by an author whose works they'd previously borrowed from the library. That's a huge number, considering that 20% of respondents never go to the library, period. At the same time the survey indicates a competition between buying and borrowing. Respondents who report that they've decreased their use of libraries buy 12.18 books per year, while those who've increased their library usage buy only 10.9 books per year. What we can't tell from the data is cause and effect. With the recession having a wide impact, who's to know whether the folks showing up more at libraries might buy even fewer books if the libraries weren't around!

It costs about 11 billion dollars a year to run public libraries in the US, and libraries work hard to demonstrate their value to the communities that support them. They compile data to measure their activity and the community's return on their investment in libraries. These studies assign much of the benefit of library spending to substitutional activity. For example, a survey by Denver Public Library determined in 2009 that it saved its community $105 million based on the cost to use alternative sources of information, and delivered an additional $5 million by avoiding "lost use", activity that wouldn't have occurred if the library did not exist. (See Public Libraries- A Wise Investment (PDF, 1.4 MB) from Library Research Service)

Do libraries really believe that 91% of their circulations would have resulted in purchases if they didn't exist? There's no hard evidence anywhere that that's true. Every librarian can tell you about patrons who loved a book so much they went and bought the whole series, but there are also users who never buy a book they can get in the library. And what about those readers who never go to the library? Surveys are a cheap way to collect data, but they often don't reflect the real behavior of the people surveyed.

So much is unknown, and so much is to be gained by knowing more. What hasn't been done, as far as I know, is to try to compare and correlate hard data on book sales and library lending in any meaningful way. In my next post, I'll describe how a cross-industry cooperative approach to book data collection and analysis might provide some light amid the gloom of the reading industry's winter solstice of understanding.

Friday, November 25, 2011

It's Not About Libraries, It's About Amazon

When Douglas County (Colorado) Libraries decided to put "Buy this book" buttons on their online catalog pages (example), the response was strong. In just 11 days, the buy buttons had garnered almost 700 clickthroughs. According to Library Director Jamie LaRue, the library is putting buy links direct to publisher-supplied URLs when they are provided (often to Barnes and Noble). Of the 700 clickthroughs, 389 went to Amazon and 262 to Tattered Cover, the independent bookstore with 3 locations in the Denver area. In isolation, this data seems to be strong support for the notion that a digital presence in libraries can support sales of books. The withdrawal this week by Penguin from library ebook lending platforms (such as Overdrive) would seem to be a profoundly shortsighted move.

Viewed from a big six publisher's point of view, the situation looks different. If Douglas County's book buying rates match the rest of the country, its residents would purchase 2.1 million books per year, almost 6,000 books per day. The 7.1 million items circulated by Douglas County Libraries in 2008 would present as an attractive market opportunity.
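The arithmetic behind those figures can be checked in a few lines. This sketch assumes the national buying rate is the BookStats average of 7.32 books per person cited elsewhere on this blog; the post itself does not say which rate or population figure was used, so the implied population below is a back-calculation, not a census number.

```python
# Figures quoted in the post.
COUNTY_PURCHASES = 2_100_000   # estimated books purchased per year
CIRCULATION_2008 = 7_100_000   # items circulated by the library in 2008

# Assumption: the BookStats national average for 2010.
BOOKS_PER_PERSON = 7.32

books_per_day = COUNTY_PURCHASES / 365
implied_population = COUNTY_PURCHASES / BOOKS_PER_PERSON
loans_per_purchase = CIRCULATION_2008 / COUNTY_PURCHASES

print(f"books purchased per day: {books_per_day:,.0f}")
print(f"implied county population: {implied_population:,.0f}")
print(f"library loans per book purchased: {loans_per_purchase:.1f}")
```

On these numbers the library circulates more than three items for every book its residents are estimated to buy, which is the scale of the "market opportunity" a publisher would see.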

It's hard to know what the bookselling environment will look like 10 years from now, after a transition to digital reading platforms. While some publishers hold out hope that they could play a much larger role in servicing the demand that libraries meet in today's market, it's not libraries that worry them today, it's Amazon. Today's big six publisher sees the Douglas County clickthrough numbers and worries that those 389 library patrons are being captured by Amazon. Amazon is pushing $79 Kindles to those patrons and then effectively owns their book consumption.

The casual observer might not imagine how much of a threat Amazon presents to a big six publisher. After all, Amazon is sending them huge amounts of money. But think about how this might play out. If Amazon, with its proprietary e-reading ecosystem, grows to dominate book sales the way it currently dominates ebook sales, then it will be easy for Amazon to squeeze out the big publishers. Amazon can acquire exclusive content by dealing directly with authors, and is already doing so. They will be able to demand that publishers reduce their margins so that they really are marginal. Publishers would have no choice but to surrender and perhaps die.

The Penguin move should be seen not as a corporate verdict on libraries, but as a reaction to Amazon's entry into the library market. When Overdrive was distributing content to libraries on their own platform, the publishers were able to view Overdrive, and libraries in general, as a counterweight to Amazon. But the extension of Overdrive lending to the Kindle flipped libraries into the Amazon column. That's the best way to understand the Penguin decision, though you won't see them saying that.

The recently announced Kindle Owner's Lending Library demonstrates that Amazon, blessed with its trove of marketing data, understands the power of libraries to promote sales. But it also demonstrates that Amazon is not content to leave libraries to libraries. Amazon wants in on the lending action, too.

Bookstore closings and bankruptcies are just the first set of casualties in the war for dominance in the ebook industry, which has only just begun. Institutions with footprints as large as libraries won't be able to avoid cross-fire, or even direct attack. Neutrality won't be an option. The advance of technology doesn't respect the innocence of bystanders.

What's clear to me, at least, is that libraries could do worse than to follow the lead of Douglas County, stepping into the marketplace for ebooks without fear, with eyes open and with server logs studied.


Sunday, November 13, 2011

eBook Markets Need eBook Quality Standards

Yes, the Kindle is UL rated!

Underwriters Laboratories (UL) issued its first standard, covering "tin clad fire doors", in 1903. It then became easier for architects to specify fire-resistant doors for new buildings, which no doubt was a boon to tin clad door manufacturers, who no longer had to compete with doors made with too-thin tin. The UL® labels now let consumers buy all sorts of electrical products without thinking about whether their new Amazon Kindle will burst into flames in the middle of Maharaja's Mistress.

Think about all the things you didn't have to think about today. If you nuked a mug of water for tea this morning, you probably didn't consider whether the microwave's magnetron would fry you. You probably don't even know that your microwave oven has a magnetron. Our modern civilization is built on being able to not think about these things. Quality standards such as those developed by UL help us to think less, and help marketplaces sell more.

Unfortunately, if you're an avid ebook reader, in 2011 you have to think more than you want to about ebook quality. When Neal Stephenson's new novel, Reamde, came out, early purchasers of the book were dismayed to find that it was rife with typographical errors. (But not the title. That "typo", for ReadMe, is intentional!) Amazon was forced to suspend sales.

I've been watching an important effort on ebook quality. It's worth supporting. The entry deadline for the Publishing Innovation Awards is this week, November 15. Entrants submit ebook files which are evaluated for quality, innovation and design. New this year is the "QED" seal, which is awarded to entrants that satisfy a checklist of basic ebook quality no-brainers:
  1. Front matter: the title does not open on a blank page.
  2. Information hierarchy: content is arranged in such a way that the relative importance of the content (heads, text, sidebars, etc) are visually presented clearly.
  3. Order of content: check of the content to be sure that none of it is missing or rearranged.
  4. Consistency of font treatment: consistent application of styles and white space.
  5. Links: hyperlinks to the web, cross references to other sections in the book, and the table of contents all work and point to the right areas. If the title has an index, it should be linked.
  6. Cover: The cover does not refer to any print edition only related content.
  7. Consumable Content: The title does not contain any fill-in content, such as workbooks and puzzle books, unless the content has been re-crafted to direct the reader on how to approach using the fill-in content.
  8. Print References: Content does not contain cross references to un-hyperlinked, static print page numbers (unless the ebook is intentionally mimicking its print counterpart for reference).
  9. Breaks: New sections break and/or start at logical places.
  10. Images: Art is appropriately sized, is in color where appropriate, loads relatively quickly, and if it contains text is legible. If images are removed for rights reasons, that portion is disclaimed or all references to that image are removed.
  11. Tables: Table text fits the screen comfortably, and if rendered as art is legible.
  12. Symbols: Text does not contain odd characters.
  13. Metadata: Basic metadata for the title (author, title, etc.) is in place and accurate.
Next year, I hope they add a checklist item for typographical errors. If a publisher can produce print with minimal errors, there's no excuse to allow them in digital books. As the Reamde debacle showed, even typos can create significant customer service expenses for retailers.

A few years from now, it's likely that any ebook that doesn't meet these standards will be unsaleable; for now, a QED seal is a great way for publishers to realize the value of making a good digital product, and for readers to be able to think less.


  1. In building, we've realized that we need to give book lovers some assurance that the ebooks they support for ungluing will be of a quality that they will be proud to have contributed to. We'll point to QED as a reference point for the quality we expect from unglued ebooks.
  2. I read the print version of Reamde. I thought the spin-up was Stephenson's best, but there was a lot of carnage as things spun globally out of control.


Tuesday, November 8, 2011

Creative Commons Media Neutrality and eBook Rights after Rosetta v. Random

Here's where it gets complicated.

Not so long ago, book publishers had no idea that there would be such things as digital books. Publishing contracts mentioned nothing about ebooks. Literary agents made sure to keep derivative rights separate, so that translation rights, film rights, stage adaptation rights, etc. for a successful book could be separately monetized.

When ebooks started to become important, the digital publisher Rosetta Books took advantage of the situation, and started acquiring ebook rights to well-known books such as Kurt Vonnegut’s Slaughterhouse Five. This did not please the print publishers at all. Random House, one of the "Big Six" US publishers, took Rosetta to court, saying that their publishing contracts gave them exclusive rights to distribute books, and ebooks were books.

And here’s where it gets REALLY complicated. The District Court ruled against Random House, but narrowly, and the Appeals Court upheld the ruling. It wasn't that the ebook's bookness was obvious one way or another. The courts only refused Random House’s request for an injunction. Random House had asked the court to order Rosetta Books to immediately “cease and desist” selling ebook editions of Random House books. Without an injunction, Random House would have had to continue with a lengthy legal proceeding to assert its publication rights.

Instead, Random House negotiated a settlement with Rosetta Books, one which allowed some older books to be issued in ebook format by Rosetta. Rosetta got their ebook rights and Random got an undisclosed revenue share, according to Publishers Weekly. The details are not public, but the practical result seems to be that if Random House does not want to reissue an ebook of a book based on an older contract, they will allow the author to contract separately with a third party, such as Rosetta. Open Road Media is a more recent, and more aggressive, ebook "reprinter," and they have also contracted separately with authors and estates for ebook editions, such as the "enhanced" From Here to Eternity; Random House retains print rights only.

Despite this uncertainty, the book publishing industry has managed, for the most part, to avoid destructive legal battles. It seems to be understood by literary agents that ebook rights for works under pre-Rosetta print contracts are to be offered first to the publisher with print rights. While Random House will often waive ebook rights, Harper, S&S, Penguin, Macmillan, and Hachette seem to block 3rd party licenses, slowly adding the backlist ebooks to their ebook catalogs, and only if they can get authors to accept the current standard ebook royalties, 25% of net. If no agreement can be reached on royalties, no ebook is published.

For other publishers, the situation is confusing. According to Rosetta, "in England, the agent and author community has been clear for ten years that these backlist electronic rights are owned and controlled by the authors". Smaller publishers will often revert ebook rights because conversion and distribution costs for backlist books make it too expensive to create an ebook only to keep ebook rights.

The effect of the Rosetta v Random non-decision has been that a large number of works whose print rights remain with a publisher have ebook rights which may be subject to dispute. Often these books are scholarly works or trade books with little commercial value.

Our goal in building is to work with rights holders to re-license books such as these with the financial backing of book lovers everywhere. These "unglued ebooks" would be "given to the world" under something like a Creative Commons (CC) license. But how can such a license be applied when there is such uncertainty around ebook rights?

One problem is that the Creative Commons licenses are media neutral. If I release a print book under a CC license, there's nothing in the license to stop anyone from scanning it, turning it into an ebook, and distributing it on their website. Similarly, a CC ebook can be printed and bound, and redistributed with the same license, so long as the other license terms are obeyed.

It's not enough to have ebook rights to release a Creative Commons ebook; you need to have print rights cleared as well! [1] (If you thought this article had reached the zenith of complicationness, you thought wrong.)

If what you're really interested in is ebook rights, then why use a Creative Commons license? With a CC BY-NC-ND license, the allowed noncommercial print uses are probably not very valuable.

Looking at this issue with our legal counsel, we considered the option of creating our own " eBook License" which would be similar to Creative Commons but which would prohibit even non-commercial printing. Unfortunately, this option would:
  1. require us to establish an entirely new publishing "standard" license;
  2. add legalese and restrictions that supporters and rights holders alike would find unfamiliar and undesirable;
  3. lose the benefit of the universal and clear standards of the Creative Commons licenses. CC licenses, for example, can be recognized and acted on by automated search engines. Precedents exist for what is allowed. For almost a decade, CC licenses have allowed authors such as Lawrence Lessig and Cory Doctorow to publish successful commercial print and ebook editions alongside open access, CC-licensed ebook editions.
A different strategy would be to use a standard CC ND license, but to add a technical obstacle to printing which does not conflict with open access for the digital version.  If we created something inherently digital (e.g. with revisions that include animations throughout that can’t be printed), then printing a version without animation would violate the non-derivative aspect of the license.

We're not enthusiastic about this option either. Just as legalese confuses normal people, the subtleties of media technology are likely to confuse lawyers and judges. If someone wants to object to our interpretation of a “derivative use,” there's no technology that can keep them from suing.

A third option is by far our top choice, and is the one we will pursue. Get the various rights holders to agree among themselves! Since the CC BY-NC-ND license only allows incidental and not-for-profit printing of ebooks, print publishers willing to let an author unglue an eBook using this license should also be willing to waive any conflict with their “exclusive” print rights.

Authors and publishers have mostly managed to get on with business without a clear legal decision on whether an ebook is a book. The possibility of a crowd-funded payoff shared by print and digital rights holders should create a strong incentive for them to work together to unglue the ebook.

How hard could it be?

  1.  Or at least, you need to have any publisher with “exclusive” print rights waive those rights with respect to any “non-commercial” printing of a CC ebook for personal use.
  2. My colleague Amanda Mecke contributed to this article.
  3. Yes, that's the new logo for the service, coming soon!
  4. Standard IANAL disclaimer.

Saturday, October 29, 2011

The United Nations of Reading

The Internet Archive
I had a great time at Books in Browsers, even though I completely lost my voice on the second day. The assembled talent and brainpower made almost every moment a thrill. When my talk from the morning of the first day was given prominent mention in the New York Times' Bits Blog, I got so excited that I couldn't pay attention to an amazing talk on annotation of medieval manuscripts.

But the most important talk of the two days was Brian O'Leary's closing presentation, which prompted the Twitter backchannel to unanimously elect him the "Secretary-General of the United Nations of Publishing".

Here's his abstract:
Although business models have changed, publishers and their intermediaries continue to try to evolve their market roles in ways that typically follow the rules for “two-party, one-issue” negotiations.  In an environment in which the negotiations are better framed using models for “many parties, many issues”, these more limited approaches have made the design of a flawed ecosystem even worse, shifting burdens onto valued intermediaries (libraries and booksellers, among others).

Content abundance, coupled with improvements in available technologies, gives us an opportunity to reshape the competitive framework.  This talk will examine options to apply the principles of effective game design to create a set of new, targeted and evolving business models for content dissemination in an era of abundance.
O'Leary talked about the changes occurring in the entire ecosystem of what used to be called "publishing": authors, agents, publishers, distributors, retailers, libraries, and of course readers. He noted that relationships throughout the ecosystem were being renegotiated without an awareness of the effects of these changes on the rest of the ecosystem. As a result, frameworks, arrangements and processes that could benefit the entire ecosystem were not being given the consideration they deserve.

The future of EPUB
O'Leary pointed to the discussions leading to the United Nations Convention on the Law of the Sea as a possible inspiration for a reading-ecosystem way forward. The breakthrough in those discussions was the introduction of game-theory models that helped the parties see the effects of agreements and provisions on all stakeholders in the Law of the Sea negotiations. If a similar sort of model could be developed for the activities surrounding publishing, it might be possible to do a lot more than to "save publishing". Intelligent, collaborative application of digital technologies should be able to increase the effectiveness of an industry whose purpose is to promote reading, education, culture and knowledge.

According to O'Leary, we need to figure out ways to fund the sort of research that could be the basis of modeling for the reading ecosystem. One possibility would be to create a cross-industry organization to do so.

If such an organization were created, I hope that its membership mirrors the composition of Books in Browsers attendees. Many inhabitants of the reading ecosystem were represented, despite the technology emphasis of the meeting: publishers, librarians, agents, academics, authors, designers. The contrast with last week's DPLA Launch meeting was striking; hardly any publishers or authors were in evidence at DPLA. It seems to me that with everything that's at stake, we could do a lot worse than to listen some more to Brian O'Leary.

Update (10/31/11): The text of O'Leary's talk is posted here.

Sunday, October 23, 2011

Creative Commons - ND (No Derivatives)

When I was a sophomore in high school, I read Catcher in the Rye. To me, the amazing thing about this book was the language. It seemed like every other word was "bastard", "goddam" or "sonofabitch". What were my teachers thinking?

Imagine if the Salinger estate decided to release a Catcher in the Rye ebook with a Creative Commons License so that 10th graders around the world could read it for free. What sort of license would they choose? In particular, would they choose a "No Derivatives" license?

Here's the "legal code" of the No Derivatives (ND) restriction in the CC BY-NC-ND license:
The [granted] rights include the right to make such modifications as are technically necessary to exercise the rights in other media and formats, but otherwise you have no rights to make Adaptations.

"Adaptation" means a work based upon the Work, or upon the Work and other pre-existing works, such as a translation, adaptation, derivative work, arrangement of music or other alterations of a literary or artistic work, or phonogram or performance and includes cinematographic adaptations or any other form in which the Work may be recast, transformed, or adapted including in any form recognizably derived from the original, except that a work that constitutes a Collection will not be considered an Adaptation for the purpose of this License. For the avoidance of doubt, where the Work is a musical work, performance or phonogram, the synchronization of the Work in timed-relation with a moving image ("synching") will be considered an Adaptation for the purpose of this License.
The advantage of allowing derivative works (Adaptations) is that people would be free to use Catcher in the Rye for all sorts of amazing things. There would be a thousand YouTube dramatizations of Catcher in the Rye, free to all. There would be fan fiction. There would be novels about Holden as a homeless person, Holden as a Wall Street tycoon, or as President Caulfield. There would be translations, graphic novels and operettas. Best of all, there would be versions of Catcher that would have all the goddams replaced by gosh darns and bitches replaced by guns, and that's what 10th graders would read in Texas. Imagine what they'd read in North Korea: Brother Ho Gathers Rice.

J. D. Salinger is rolling over in his grave even as we ponder the scenario. I think it's safe to say that Catcher in the Rye will not see a license allowing derivatives in my lifetime or in yours. It's not about generosity at all; it's about the artistic vision of the author. And J. D. Salinger is not alone in wanting to ensure the integrity of his works. That's why Creative Commons offers the "No Derivatives" option for its licenses in the first place.

There are lots of cases in which it's valuable to be able to change a work. As much as it hurts when your edit is reverted, the most amazing feature of Wikipedia is that anybody can change it. For a jazz singer, a song that you can't riff on is not jazz at all. For a teacher, a textbook that you can't adapt to your curriculum is just wrong. In these and many other applications, an ND license seriously reduces the value of a work.

But to date, most books have been written with the expectation that the version that goes out to the printers is more or less the version that will be read. Authors have not incorporated the possibility of remixing and read-write literature into their creative visions. Certainly this will change as new forms and conventions emerge. But for now, most authors want to control the expression of their creations, even if they're willing to set them free. For the purposes of, we have to respect these wishes if we are to convince authors to release their works into the public commons. Money is not the issue.

As Mike Taylor, a long-time friend of this blog, commented on a previous post, the ND aspect of our "standard" license clashes somewhat with the second two bullet points of Creative Commons' "Share, Remix, Reuse" slogan. It's important to recognize that even the CC BY-NC-ND license that will use by default unlocks "Remix" and "Reuse" activity that falls under "Fair Use". The Creative Commons licenses leave untouched the fair use rights of users, and are hostile to Digital Rights Management (DRM) software that in practice impedes these rights. DRM typically blocks many types of fair use, and in the US, the Digital Millennium Copyright Act (DMCA) criminalizes the circumvention of this DRM.

Many of the derivative works that have Salinger spinning are allowed under fair use no matter what the license. But an ND license lets an author keep potentially valuable movie rights and translation rights. The value of these would be enhanced by letting everyone in the world read the book for free through ungluing, and this incentive will benefit the public by reducing the authors' ungluing price.

It's hard to know what sorts of "adaptations" of a work will be possible in the future. However, the Creative Commons licenses, including the ND licenses, make it clear that users have the right to migrate the work to new formats for the purposes of accessibility and compatibility with new media and technology. This is important to all of us, because without this right, it's quite possible that many of the ebooks we use today will be unreadable 50 or a hundred years from now.

  1. As always, don't confuse this blog with legal advice.
  2. According to Wikipedia, Catcher in the Rye continues to sell 250,000 copies a year. 
  3. The Catcher in the Rye is #410 on Amazon's best-seller list. 
  4. A fair "ungluing price" for Catcher in the Rye would be at least $4,000,000.
  5. I've previously posted about the Attribution and Non Commercial attributes of Creative Commons Licenses
  6. It's funny. Don't ever tell anybody anything. If you do, you start missing everybody.

Saturday, October 22, 2011

The DPLA Muster

Battle flags never made sense to me. "Why give your opponents something to shoot at?" was my thinking. As if soldiers with deadly weapons would bother to rally to a flimsy piece of cloth. An idea. What's powerful about that?

Friday, I attended the plenary session that launched the Digital Public Library of America (DPLA) at the National Archives in Washington DC (with 300 others!). When it started early this year, I was pretty skeptical of the DPLA. It had no discernible plan of action, no coherent vision for the future of libraries, no business model, not even an awareness of how impossible its dream really was. All they had was... a battle flag, and a figurative one, at that.

What I saw was the power of a battle flag. John Palfrey and a cabal of Harvard academics have forged a movement from the fire of frustrated librarians, archivists, and information professionals who have recognized that a lot of the present system is broken and going nowhere fast. They sent out a call for help, and amazingly enough, that call was answered.

Although the news of the day centered around $2.5 million grants from the Sloan Foundation and the Arcadia Fund, and promises of cooperation from Europeana and the British Library, I was most encouraged by the presentations in the afternoon, from people who actually build stuff.

At Gluejar, we've been struggling with the difficulty of presenting collections of books to Internet users in a meaningful and effective way. The typical UI of a book site is pretty lame. If the site goes beyond bland lists, it may try for a "bookshelf" view. The problem is that a bookshelf can only present 50 books or so, and a decent library will have 100,000 or even a million things to display.

The most elaborated UI experiment latched on to DPLA was ShelfLife from the Harvard Library Innovation Lab. It presents books as an "infinite bookshelf" arranged vertically to best display titles on the spines. It clings to the physicality of books by using thickness to represent the number of pages in the book, and uses the height of the book to show... the height of a book. It sounds stupid, but works a lot better than you might think. Go try it.

Bookworm was another interesting demonstration, from the people who brought you Google NGram Viewer. Using the less-restricted data from OpenLibrary, Bookworm allows you to examine subject heading occurrence as a function of time, and uses this visualization as a way to expose lists of book records. Very cool, but I felt like it was a fun toy for a job that wants an ear-splitting power tool.

The enthusiastic reception for these and other projects, which seemed to come out of the woodwork in response to the DPLA call to action, convinced me that DPLA is much more than a pitch for foundation funding. The library world, and the academic community that relies on libraries, is hungering for innovation and experimentation to show the way out of ebook purgatory.

So go for it, DPLA. There will be content of all sorts from libraries, museums, and the like for you to organize. Internet Archive, Hathi Trust and others will push the boundaries on book digitization and distribution. Gluejar will do its best to stock your shelves with unglued books that people care about. My advice: do some small things well and the big things will follow. That's what battle flags are all about.

Thursday, October 20, 2011

Creative Commons - NC (Non-Commercial)

worms by Wahj (CC BY-NC-ND)

Real worms don't come in cans. The last time I saw worms offered for sale, they came in paper buckets, the kind that usually hold Chinese take-out. You open these up, and the worms don't jump out at you. Maybe if you left the bucket open for a day or two, the worms would eventually find their way out, but any resulting problem is manageable. So when we say that doing something "opens up a can of worms", the main thing to think about is not the calamities that will emerge from the bucket, it's whether or not you want to go fishing today.

The "Non-Commercial" attribute of Creative Commons "NC" licenses is definitely a can of worms. "Non-commercial" is subject to varied interpretation, and the license is not entirely successful at removing ambiguities. What uses are commercial? For example, is a blog that attracts advertising revenue allowed to post a CC NC licensed ebook? Is a for-profit distributor of ebooks allowed to distribute an NC ebook? Is a non-profit charity allowed to print paper copies of an NC ebook and sell the copies? Is a for-profit copy shop allowed to charge a school to make copies of an NC textbook? Is a for-profit company allowed to use an NC ebook about widget manufacturing to improve its factory? Is it ok for me to show you this picture of worms? Et cetera.

In building, we hope that readers and institutions will financially support Creative Commons relicensing of books that are important to them. In doing so, we need to make sure that the licenses we choose enable the things that the supporters want to do. We need to balance these uses against the rights and concerns of the authors and other rights holders who would be granting these licenses.

To get a better feel for what's allowed and what isn't under NC, we have to look at the "legal code" of the license. Here's what it says in the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND) License:
You may not exercise any of the rights granted to You [...] in any manner that is primarily intended for or directed toward commercial advantage or private monetary compensation. The exchange of the Work for other copyrighted works by means of digital file-sharing or otherwise shall not be considered to be intended for or directed toward commercial advantage or private monetary compensation, provided there is no payment of any monetary compensation in connection with the exchange of copyrighted works.
Let's examine some of our wriggling worms against this "code". Remember that I'm not a lawyer, and you should not rely on my scribblings for legal advice of any kind.
  1. Go ahead and improve your lucrative widget factory. The rights restricted by the NC clause are the rights to reproduce, distribute and publicly perform the work. The Creative Commons licenses do not restrict other uses of the work. If there are million-dollar ideas in the book, your ability to exploit them for commercial gain is not restricted by a CC license.
  2. If your blog is your business, it's not a good idea to build it on NC licensed photos from flickr, even if you don't charge for access. But if you're a book blogger and you make money with advertising, is posting a free ebook "primarily directed towards commercial advantage"? This worm is jiggling a bit! If you're a potential supporter of a book, this is the sort of use you probably want to support. It's not really clear how to apply the NC clause. Similarly, Apple, Amazon and Google are big companies that make a lot of money in the course of distributing ebooks. Distribution of some ebooks for free gives them indirect commercial advantages. To the extent that the uncertainty in the NC provision prevents seamless distribution of the works to their users, it goes counter to what most book lovers would want.
  3. Even if you're a non-profit, you can't print and sell copies of an NC e-book to raise money for starving orphans with cancer.
The areas of uncertainty include some use cases that we think are non-commercial uses. To make it clear that we consider the distribution of unglued ebooks for free to be an allowed activity under NC licenses, rights holders who offer works to the public through will agree to the following:
For purposes of interpreting the CC License, Rights Holder agrees that "non-commercial" use shall include, without limitation, distribution by a commercial entity without charge for access to the Work.
We may also require a statement to the same effect in the front matter of the released ebook; we're still working out the file format details.

Given this clarification, why not go all the way, and require that rights holders agree to commercial distribution of works that get unglued?

It turns out that the alternative to our can of worms harbors some poisonous snakes. Let me introduce you to one of these. Look at the Amazon page for Dance Dance Revolution (Wii Video Game), a 140-page paperback supposedly edited by Lambert M. Surhone, Mariam T. Tennoe, and Susan F. Henssonow. The so-called publisher, "Betascript Publishing", takes Wikipedia articles and turns them into books. So far, so good. Perfectly legal within the scope of Wikipedia's Creative Commons License (BY-SA). But how would you feel if you found a Wikipedia article that you wrote (with minor edits from others) on sale at Amazon for $57.47? When this happened to my brother, he was mostly amused at the audacity of it all. But I think that if the same thing happened to a book I had worked on for a year of my life, it would seriously piss me off. If I had contributed money to "give the book to the world" I would be similarly aggravated. You could argue that Betascript is providing a valuable service by providing attractive formatting and improving the discovery of the article, but please don't.

chinese takeout box by gabrielsaldana (CC BY-SA)

Retention of commercial rights is potentially of significant value to authors, and can reduce their asking price for ungluing books. I've previously written about Cory Doctorow's experience with selling "deluxe bound" versions of his Creative Commons Licensed books. There's also the possibility that authors' prior publishing contracts preclude them from offering commercial Creative Commons licenses (I'll write more about that soon). Since most of the uses we imagine for unglued ebooks, including the uses most important to libraries, are not affected by our use of the NC-flavored license, we've decided to open this "can of worms" in hopes of catching more "fish". We'll allow rights holders to offer non-NC licenses, but we won't expect them to do so.

Notes: I posted yesterday on the "Attribution" in Creative Commons Licenses. Here are some links on Betascript, which has over 350,000 "books" listed in some book directories:

Creative Commons - BY (Attribution)

Have you ever wondered whether Anonymous can use a Creative Commons attribution license? The answer is YES, Attribution licenses ARE useful, even for Anonymous.

In the process of developing the service, we've had to study licenses and decide which ones are best for ungluing ebooks. Since supporters will be putting up real money to relicense the books (making them free to the world), the details of the license need to be spelled out clearly, upfront.

It's a big topic with lots of considerations, so I'm going to write about our choices in three pieces. We'll be using the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND) License for most of the books that we unglue. This post will focus on the easiest choice- the attribution part. Even with attribution, there are some tricky bits.

Here is the text, or "legal code" of the attribution requirement in the CC BY-NC-ND License:
If You Distribute, or Publicly Perform the Work or Collections, You must [...] provide, reasonable to the medium or means You are utilizing:
  1. the name of the Original Author (or pseudonym, if applicable) if supplied, and/or if the Original Author and/or Licensor designate another party or parties (e.g., a sponsor institute, publishing entity, journal) for attribution ("Attribution Parties") in Licensor's copyright notice, terms of service or by other reasonable means, the name of such party or parties;
  2. the title of the Work if supplied;
  3. to the extent reasonably practicable, the URI, if any, that Licensor specifies to be associated with the Work, unless such URI does not refer to the copyright notice or licensing information for the Work.
The credit required by this Section may be implemented in any reasonable manner; provided, however, that in the case of a Collection, at a minimum such credit will appear, if a credit for all contributing authors of Collection appears, then as part of these credits and in a manner at least as prominent as the credits for the other contributing authors. For the avoidance of doubt, You may only use the credit required by this Section for the purpose of attribution in the manner set out above and, by exercising Your rights under this License, You may not implicitly or explicitly assert or imply any connection with, sponsorship or endorsement by the Original Author, Licensor and/or Attribution Parties, as appropriate, of You or Your use of the Work, without the separate, express prior written permission of the Original Author, Licensor and/or Attribution Parties.
In the case of Anonymous, you can't distribute a CC BY licensed work owned by Anonymous unless you provide attribution to Anonymous. You can't say that you wrote it, for example.

For Public Domain works, there's no attribution requirement. It would be perfectly legal for me to take Moby Dick, for example, change the title to Moby Duck, attribute it to "the Gluejar Collective" (me and Herm), and sell it on Amazon for $100 per copy (if I get an ISBN!). It might not be legal in France, though. Unlike the United States, most European countries, and especially France, have strong protections for authors' "moral rights". In France, even if an author had released work under a non-attribution license, I wouldn't be able to use the work in a way that abused the author's name and reputation.

Non-attribution licenses (i.e. CC0) are particularly useful when many people contribute to a work, as in the case of Wikipedia, and the use of the work would be inhibited if attribution of all the contributors was required. CC0 (technically a waiver, not a license) tries to address possible conflict with laws assuring moral rights.

For, we expect that most creators will insist on the attribution requirement. Come to think of it, most readers and supporters would insist on it as well. You wouldn't want to read Primary Colors by Herman Melville, would you?

Note: I'm an engineer, not a lawyer, so please don't use this article as a substitute for legal advice. If you want to build nuclear weapons with it, or generate superluminal neutrinos based on the information it contains, you have my explicit permission to do so.
Update 10/20/2011: Corrected CC0 info. The "NC" option is discussed in another post.

Sunday, October 16, 2011

How Can We Change the Future? The Tomorrow Project

It turns out that Intel, the giant chip maker, employs a full time futurist. His name is Brian David Johnson, and he actually gets paid to go around asking people what the future might be like. Intel says they're the "Sponsors of Tomorrow", so I guess they want to have a clue about what they're sponsoring. When I worked at Intel in the early 80's, we could have sponsored a thousand futurist studies, and not one of them would have predicted that Intel would someday employ a "Chief Futurist" leading a "Tomorrow Project".

None of those futurists would have predicted that over a hundred thousand people would show up at New York Comic-Con, either. But it's happening. The show is completely sold out. Jacob Javits Convention Center is packed to the gills with zombies, otaku, wood nymphs, transformers and girls with blue, purple or red hair; I don't know the word for them.

Many of them packed a very serious session hosted by Johnson featuring Cory Doctorow, the science fiction writer, blogger, and activist. The session was entitled "Sci-Fi Prototyping: Designing the Future Panel". No one in the audience was disappointed not to hear about the future of the panel, and we also did without Doug Rushkoff, whose appearance was scheduled to make the panel a panel, but who failed to predict his future schedule well enough to participate.

Doctorow, Johnson, Rushkoff, and, who was accurately predicted to not be present, have contributed to The Tomorrow Project Anthology, which had its launch today. Nostalgically enough, this is a book. Less nostalgic, but perhaps just as dated, it's a 1.8MB PDF file, made available for free by Intel. Doctorow's contribution is a novella called The Knights of the Rainbow Table, which so far (I'm on p.17) is a fun read. It's about the nano-apocalypse that will occur in the near future when it's easy for a group of grad student low-lifes to crack everybody's website password security.

Johnson framed the session as a discussion about the ways in which science fiction can provide a narrative to steer the future. I'm a bit skeptical. I don't think that the "narrative" of Star Trek communicators caused Motorola engineers to create the flip-phone, even if they were fans of the show while growing up. Doctorow had a really interesting analogy, though. He said a science fiction story was like a Petri dish that lets a microscopic idea grow into a huge colony of micro-organisms visible to the naked eye. That strikes me as a really useful way to think of how fiction influences the world.

The problem with ascribing power to narrative is evident if you look at the world around us. Narratives compete with other narratives, and their relative power derives not from their truth or their skill, but rather from their fit. Narratives warning about the death of privacy, for example, have scant power compared to the offer of a free movie, or even a free PDF download. No one pays attention to a narrative unless it fits with what they want to do today.

Before the panel, Doctorow expressed to me his strong commitment to making his works available with Creative Commons licenses; he'll certainly release The Knights of the Rainbow Table that way. But let's work on Intel to change the future a bit. Why can't they release The Tomorrow Project Anthology with a similar license? (The current license is all rights reserved: you can download it, but you can't redistribute it.) You CAN help change the future: file a request to post the whole ebook using this form.
In yesterday's tomorrow, androids dream of electric sheep, cars fly around LA, and in Blade Runner, people read newspapers on paper. What will today's tomorrow look like tomorrow? I wonder how much of today's best writing about the future will be available to people ten years from now. Unfortunately, the answer depends on licensing details that most creators don't think much about. Doctorow is an exception.