What happens to libraries and librarians when machines can read all the books?

Revised text of a talk I gave for the Harvard Library Leadership in a Digital Age program.

The description of this course promises that “you will identify fundamental changes occurring in the field of knowledge management and consider their implications for libraries, information services, and library leadership.”

I think my session maybe breaks the rules a bit (which is my first leadership tip for you: when it feels like the right thing to do, break the damn rules!).

One of the things I think is important for library leaders is that we look at fundamental changes outside of knowledge management and consider their implications for libraries and the work we do.

I think looking outside of changes in our own field is essential if we want to be active, effective leaders who don’t merely respond to change, but who create and shape the change we believe is needed in libraries and archives.

So, I want to talk about AI and libraries in at least 2 ways:

  1. Substantively, I want to share with you some of my thoughts and speculations about the potential implications of AI and machine learning for libraries and librarianship.
  2. I also want to talk a bit about AI on a more meta-level – that is to say, I want to use my own commitment to learning about and thinking about AI and its implications for libraries as an example of the more general tension leaders face between tending to immediate, local challenges and thinking about, preparing for and creating the future.

So let’s start with why I’m interested in machine learning and AI.

Basically, it is because I think that it is past time for us to take digital libraries to the next level; and I think the next level is likely to involve machine learning and optimizing our collections, services, and spaces for machine learning applications.

Where are we in digital libraries right now? We are still in the midst of the initial digital shift in libraries (really from the mid-to-late 1990s to now).

In this shift, we have gone from libraries being a place where individuals came to find physical books and journal articles (and manuscripts, and images, and lots of other stuff) so that they could read those books and articles themselves, to libraries being a service that individuals use to gain online access to journal articles, and e-books, and digital images and manuscripts and more so that they can read and use those things on their own digital devices.

This ongoing digital evolution of libraries and of how students and faculty use scholarly content is significant and has arguably made research and teaching more efficient and more productive.  The advent of online and digital libraries has also made more information more accessible to more people than ever could have been possible when scholarly materials were available only in tangible, physical formats.

But if this switch, from individuals reading books and articles one at a time in print to individuals reading books and articles one at a time on their own digital devices, is all we get from the digital revolution, then it won’t have been much of a revolution.

In the title of the talk, I ask “what happens to libraries & librarians when machines can read all the books?” But the truth is that we are already there – or at least the machines are. So it behooves us to be ready for it – intellectually, strategically, and operationally.

I think an important part of leadership is not just responding to changes, but actually getting in front of those changes when we can.

Let’s start with some definitions.

From the MIT Press Essential Knowledge book Machine Learning:

AI is “programming computers to do things, which, if done by humans, would be said to require ‘intelligence.’”

Machine learning is a kind of AI, where the computer is programmed to optimize a performance criterion using examples or past experience. The machine does what the data tell it to, not what a program tells it to.

In describing the advent of machine learning, Ethem Alpaydin says:

“nowadays, more and more, we see computer programs that learn – that is, software that can adapt their behavior automatically to better match the requirements of the task. We now have programs that learn to recognize people from their faces, understand speech, drive a car, or recommend which movie to watch … once it used to be the programmer who defined what the computer had to do … now, for some tasks, we do not write programs but collect data.”
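For readers who like to see an idea in code, here is a minimal sketch of what “optimizing a performance criterion using examples” can look like. Everything in it is an illustrative assumption of mine and not something from Alpaydin’s book: the toy page counts, the labels, and the function names are all hypothetical. The point is only that nobody writes the rule by hand; the program picks the rule that makes the fewest mistakes on the examples it is given.

```python
# Hypothetical toy examples: (page count, is this item a book?).
# The values and labels are invented purely for illustration.
examples = [(12, False), (30, False), (45, False),
            (180, True), (250, True), (400, True)]


def error_count(threshold, data):
    """Performance criterion: how many examples the rule
    'pages >= threshold means book' gets wrong."""
    return sum((pages >= threshold) != is_book for pages, is_book in data)


def learn_threshold(data):
    """Try each observed page count as a threshold and keep the one
    with the fewest errors on the examples."""
    candidates = sorted(pages for pages, _ in data)
    return min(candidates, key=lambda t: error_count(t, data))


# The rule comes from the data, not from a programmer's guess.
threshold = learn_threshold(examples)


def predict(pages):
    """Apply the learned rule to a new, unseen item."""
    return pages >= threshold
```

Change the examples and the learned threshold changes with them, which is the sense in which the machine does what the data tell it to.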

Since I’m not a computer scientist or an engineer, I use the terms in relatively loose ways and often interchangeably.

When I think of AI and machine learning in the context of libraries, I think of computer programs and algorithms that can extract or derive meaning and patterns from data, make predictions and inferences about and with new data, and in doing so, solve problems at scales not possible for humans alone.

At an MIT symposium a few years ago Elon Musk, CEO of Tesla, talked about the existential threat of AI and suggested a need for regulatory oversight. Specifically, he said “With artificial intelligence, we are summoning the demon.”

So let’s talk about the fears and concerns. Maybe ours aren’t as existential as Musk’s (I find librarians tend to be more practical), but I’m sure we have some. I certainly do.

What are the dangers of AI, especially in relation to libraries and the things we support — especially research & learning? Here are some common concerns:

    • Robots will take our jobs. In an article in Library Journal in April 2016, Steven Bell writes about the “Promise and Peril of AI for Academic Librarians,” and he asks “Could artificially intelligent machines eliminate library jobs?”
    • One reason people argue that AI will not replace library or other jobs is that machines can’t replace the deeply human skills of creativity and interaction. That may mean those skills become more valuable, or it could mean that AI will usher in an era where creativity and empathy are devalued and rare.
    • Another fear is that AI will eliminate the relationships between people and books, and between librarians and their community members.
    • And one concern I think is very important to take seriously is the reality that without explicit counter-measures, machine learning & AI will re-inscribe & magnify existing systems of inequality and racism, sexism, homophobia and the like.

Here’s a cautionary tale about that last concern:

Last spring, Microsoft unveiled a Twitter bot named Tay, programmed to tweet like a teen. What could go wrong, right?

Tay was backed by artificial intelligence algorithms that were supposed to help the bot learn how to converse naturally on Twitter. But what happened is that the bot learned quickly from the worst racist, sexist corners of Twitter – and within 24 hours Microsoft had to shut the experiment down because the bot had started tweeting all kinds of sexist, racist, homophobic, anti-Semitic garbage.

Even, or especially, with those concerns in mind, I think we need to think about AI and machine learning and the implications for libraries.

My thinking about AI, machine learning, & libraries is guided by 3 kinds of questions:

  1. What role can libraries play in making sure we don’t summon the demon; or at least that we have the tools to control or tame the demon?
  2. How might we leverage AI in support of our missions? How might AI help us do some of our work better?
  3. How might we support AI and machine learning in ways that are consistent with and natural evolutions of the long-standing missions and functions of libraries as sources of information and the tools, resources, expertise to use that information?

Let me address the 1st issue and offer some thoughts on libraries as demon-slayers in our AI future. First, we need to accept that AI and machine learning are becoming more prevalent in our daily lives, and in many learning and research contexts.

Then we have to think about which concerns around AI libraries and librarians are especially well suited to address, such as privacy, context, authority, and ensuring that the data used to train AI are inclusive, diverse, and of high quality.

This last one seems to me to be especially urgent – as an example, when Apple hired a new Director of AI research, he spoke about the promise of AI as a research tool, imagining: “If I ask you something about a particular thing, can your system basically go to Wikipedia, read a few different articles, learn some facts about the world, and provide you with the right answer?” As much as we all love and use Wikipedia, I suspect that makes some of us cringe. Wouldn’t it be better to have “your system” go to the actual scholarly literature on a topic?

The 2nd area we should think about is how we can leverage AI in our work.

A typical area we think about is reference – this is Steven Bell’s concern that AI chat bots will replace reference librarians.

There is also plenty of potential around using machine learning in search – the 2 articles that were assigned reading for this session cover that ground fairly well (see list of references at end of this post).

We might also imagine leveraging AI for recommendation systems, and for cataloging and organizing our collections.

What if we turned my original question around and asked what we would do if we librarians could read all the books?

If we really could absorb all the information in our collections and make some sense of it, what would we do? What could we do if we had the capacity to read all our books, and maybe all the books in our peer libraries, and derive patterns from them?

What would we do that we can’t do now? What would we do better that we already do?

Can thinking about AI and machine learning in that way help us conceive of ways to leverage the fact that machines actually can do that now?

Finally, let’s talk about how machine learning and AI might change or be changing research, and how we might start to think about optimizing our libraries to support new kinds of research made possible by text & data mining, AI and machine learning.

Let me share 2 really interesting examples:

Prof. Regina Barzilay and her students and colleagues at MIT are using machine learning to extract information and predictions from the unstructured data in tens of thousands of pathology reports. It is faster, as accurate as humans, and based on a much larger amount of data than humans have access to.

Another example, which I learned about from my colleague Sara Lester, Engineering Librarian at Stanford, is GeoDeepDive, a tool for geologists that uses machine learning to extract data about rock formations that is buried in the text, tables, and figures of journal articles and web sites, sometimes called dark data.

GeoDeepDive is based on open source code that can be repurposed for other data sources. Should libraries be exploring how tools like this could help us extract even more meaning and information from deep within our collections?

I think it is important not just that we know about these kinds of efforts, but that we proactively ask where AI and machine learning can be leveraged in the service of better science.

And how do libraries leverage our resources and skills to ensure it really works – and is infused with and informed by values we care about (inclusion, privacy, democracy, social justice, authority, etc.)?

Where can we intervene to make sure the research based on AI and machine learning is as good as it can be?

We help students find the best books and articles for their learning; so can we help programmers find the best data for their algorithms to learn on?

Can we help them think about the questions they want their machine learning applications to answer? Can we help fit the data to the question?

A final string of thoughts, provocations, and questions that keep me up at night:

As I begin to fully appreciate the fact that machines really can read all the books, and can “learn” from them, I am convinced that we need to think more rigorously about reading.

What are the different ways of reading, and what are the various goals of reading?

What can we learn best, as individuals and/or as society, through human reading? What can we learn best through machine reading?

Can we start thinking about how to design libraries to maximize the unique payoffs of many different kinds of reading?

How can texts (and images, and data) be maximized for human discovery and reading? For discovery via algorithms and reading by machine learning applications?

What does it mean to maximize our collections for humans and what does it mean to maximize them for machines and algorithms?

OK – really wrapping it up now:

Machines can already read all the books. Or at least they can read all the books (or articles) that they can read.

(sidebar about how the proliferation of AI should compel us to double-down on mass digitization and on open access)

Trying to understand a little bit about AI and machine learning has taken me way outside my cognitive comfort zone, but I think it is the kind of thinking we need to do to be effective library leaders and to be effective stewards of the future of libraries, of librarianship, and, for those of us in research libraries, of scholarship.

I think it will be crucial that we avoid the temptation to continue to serve primarily individual human readers and let the computer scientists worry about how to apply machine learning and AI to vast libraries of resources.

I think we would be wise to start thinking now about machines and algorithms as a new kind of patron: a patron that doesn’t replace human patrons, but has some different needs and might require a different set of skills and a different way of thinking about how our resources could be used.

For further reading:

Early reading list on machine learning

In the preliminary report from the MIT Task Force on the Future of Libraries, we make several references to the importance of optimizing library content, data, and metadata for machine learning applications.

We imagine a repository of knowledge and data that can be exploited and analyzed by humans, machines, and algorithms. This transformation will accelerate the accumulation and validation of knowledge, and will enable the creation of new knowledge and of solutions to the world’s great challenges. Libraries will no longer be geared primarily to direct readers but instead to content contributors, community curators, text-mining programs, machine-learning algorithms, and visualization tools.

I am convinced that machine learning is going to have a major impact on the advancement of knowledge in lots of ways we can’t anticipate, and I want to understand it better. I am also convinced that without the intervention of folks who understand the biases built into our collections in terms of content, organization, and description, machine learning applications will re-inscribe and reify existing inequalities.

To that end, I’m trying to put together a reading list to get smarter about what machine learning is, what it can do for libraries, and what libraries can do to support and inspire creative, productive, just and inclusive applications of machine learning. Here’s my very incomplete initial list. Additional suggestions welcome in the comments.

