Revised text of talk I gave for the Harvard Library Leadership in a Digital Age program.
The description of this course promises that “you will identify fundamental changes occurring in the field of knowledge management and consider their implications for libraries, information services, and library leadership.”
I think my session maybe breaks the rules a bit (which is my first leadership tip for you: when it feels like the right thing to do, break the damn rules!).
One of the things I think is important for library leaders is that we look at fundamental changes outside of knowledge management and consider their implications for libraries and the work we do.
I think looking outside of changes in our own field is essential if we want to be active, effective leaders who don’t merely respond to change, but who create and shape the change we believe is needed in libraries and archives.
So, I want to talk about AI and libraries in at least 2 ways:
- Substantively, I want to share with you some of my thoughts and speculations about the potential implications of AI and machine learning for libraries and librarianship.
- I also want to talk a bit about AI on a more meta-level – that is to say, I want to use my own commitment to learning about and thinking about AI and its implications for libraries as an example of the more general tension leaders face between tending to immediate, local challenges and thinking about, preparing for and creating the future.
So let’s start with why I’m interested in machine learning and AI.
Basically, it is because I think that it is past time for us to take digital libraries to the next level; and I think the next level is likely to involve machine learning and optimizing our collections, services, and spaces for machine learning applications.
Where are we in digital libraries right now? We are still in the midst of the initial digital shift in libraries (really from the mid-to-late 1990s to now).
In this shift, we have gone from libraries being a place where individuals came to find physical books and journal articles (and manuscripts, and images, and lots of other stuff) so that they could read those books and articles themselves, to libraries being a service that individuals use to gain online access to journal articles, and e-books, and digital images and manuscripts and more so that they can read and use those things on their own digital devices.
This ongoing digital evolution of libraries and of how students and faculty use scholarly content is significant and has arguably made research and teaching more efficient and more productive. The advent of online and digital libraries has also made more information more accessible to more people than ever could have been possible when scholarly materials were available only in tangible, physical formats.
But if this switch, from individuals reading books and articles one at a time in print to individuals reading books and articles one at a time on their own digital devices, is all we get from the digital revolution, then it won’t have been much of a revolution.
In the title of the talk, I ask “what happens to libraries & librarians when machines can read all the books?” But the truth is that we are already there – or at least the machines are. So it behooves us to be ready for it – intellectually, strategically, and operationally.
I think an important part of leadership is not just responding to changes, but actually getting in front of those changes when we can.
Let’s start with some definitions.
From the MIT Press Essential Knowledge book Machine Learning:
AI is “Programming computers to do things, which, if done by humans, would be said to require ‘intelligence’.”
Machine learning is a kind of AI, where the computer is programmed to optimize a performance criterion using examples or past experience. The machine does what the data tell it to, not what a program tells it to.
In describing the advent of machine learning, Ethem Alpaydin says:
“nowadays, more and more, we see computer programs that learn – that is software that can adapt their behavior automatically to better match the requirements of the task. We now have programs that learn to recognize people from their faces, understand speech, drive a car, or recommend which movie to watch … once it used to be that the programmer who defined what the computer had to do … now, for some tasks, we do not write programs but collect data”
Since I’m not a computer scientist or an engineer, I use the terms in relatively loose ways and often interchangeably.
When I think of AI and machine learning in the context of libraries, I think of computer programs and algorithms that can extract or derive meaning and patterns from data, make predictions and inferences about and with new data, and in doing so, solve problems at scales not possible for humans alone.
At an MIT symposium a few years ago Elon Musk, CEO of Tesla, talked about the existential threat of AI and suggested a need for regulatory oversight. Specifically, he said “With artificial intelligence, we are summoning the demon.”
So let’s talk about the fears and concerns. Maybe they aren’t as existential as Musk’s (I find librarians tend to be more practical), but I’m sure we have some. I certainly do.
What are the dangers of AI, especially in relation to libraries and the things we support — especially research & learning? Here are some common concerns:
- Robots will take our jobs – In an article in Library Journal in April 2016, Steven Bell writes about the Promise and Peril of AI for Academic Librarians – and he asks “Could artificially intelligent machines eliminate library jobs?”
- One reason people argue that AI will not replace library or other jobs is that machines can’t replace the deeply human skills of creativity and interaction. That may mean those skills become more valuable, or it could mean that AI will usher in an era where creativity and empathy are devalued and rare.
- Another fear is that AI will eliminate the relationships between people and books, and between librarians and their community members.
- And one concern I think is very important to take seriously is the reality that without explicit counter-measures, machine learning & AI will re-inscribe & magnify existing systems of inequality and racism, sexism, homophobia and the like.
Here’s a cautionary tale about that last concern:
Last spring, Microsoft unveiled a Twitter bot named Tay, programmed to tweet like a teen. What could go wrong, right?
Tay was backed by artificial intelligence algorithms that were supposed to help the bot learn how to converse naturally on Twitter. But what happened is that the bot learned quickly from the worst racist, sexist corners of Twitter – and within 24 hours Microsoft had to shut the experiment down because the bot had started tweeting all kinds of sexist, racist, homophobic, anti-Semitic garbage.
Even, or especially, with those concerns in mind, I think we need to think about AI and machine learning and the implications for libraries.
My thinking about AI, machine learning, and libraries is guided by 3 kinds of questions:
- What role can libraries play in making sure we don’t summon the demon; or at least that we have the tools to control or tame the demon?
- How might we leverage AI in support of our missions? How might AI help us do some of our work better?
- How might we support AI and machine learning in ways that are consistent with and natural evolutions of the long-standing missions and functions of libraries as sources of information and the tools, resources, expertise to use that information?
Let me address the 1st issue and offer some thoughts on libraries as demon-slayers in our AI future. First, we need to accept that AI and machine learning are becoming more prevalent in our daily lives, and in many learning and research contexts.
Then we have to think about which concerns around AI libraries and librarians are especially well suited to address, such as privacy, context, authority, and ensuring that the data used to train AI is inclusive, diverse, and of high quality.
This last one seems to me to be especially urgent. As an example, when Apple hired a new Director of AI research, he spoke about the promise of AI as a research tool, imagining: “If I ask you something about a particular thing, can your system basically go to Wikipedia, read a few different articles, learn some facts about the world, and provide you with the right answer?” As much as we all love and use Wikipedia, I suspect that makes some of us cringe. Wouldn’t it be better to have “your system” go to the actual scholarly literature on a topic?
The 2nd area we should think about is how we can leverage AI in our work.
A typical area we think about is reference – this is Steven Bell’s concern that AI chat bots will replace reference librarians.
There is also plenty of potential around using machine learning in search – the 2 articles that were assigned reading for this session cover that ground fairly well (see list of references at end of this post).
We might also imagine leveraging AI for recommendation systems, and for cataloging and organizing our collections.
What if we turned my original question around and asked what we would do if we librarians could read all the books?
If we really could absorb all the information in our collections and make some sense of it, what would we do? What could we do if we had the capacity to read all our books, and maybe all the books in our peer libraries, and derive patterns from them?
What would we do that we can’t do now? What would we do better that we already do?
Can thinking about AI and machine learning in that way help us conceive of ways to leverage the fact that machines actually can do that now?
Finally, let’s talk about how machine learning and AI might change, or already be changing, research, and how we might start to think about optimizing our libraries to support new kinds of research made possible by text and data mining, AI, and machine learning.
Let me share 2 really interesting examples:
Prof. Regina Barzilay and her students and colleagues at MIT are using machine learning to extract information and predictions from the unstructured data in tens of thousands of pathology reports. The approach is faster than human review, as accurate as humans, and based on a much larger amount of data than humans have access to.
Another example, which I learned about from my colleague Sara Lester, Engineering Librarian at Stanford, is GeoDeepDive, a tool for geologists that uses machine learning to extract data about rock formations that is buried in the text, tables, and figures of journal articles and web sites (sometimes called dark data).
GeoDeepDive is based on open source code that can be repurposed for other data sources. Should libraries be exploring how tools like this could help us extract even more meaning and information from deep within our collections?
I think it is important not just that we know about these kinds of efforts, but that we proactively ask where can AI and machine learning be leveraged in the service of better science?
And how do libraries leverage our resources and skills to ensure it really works – and is infused with and informed by values we care about (inclusion, privacy, democracy, social justice, authority, etc.)?
Where can we intervene to make sure the research based on AI and machine learning is as good as it can be?
We help students find the best books and articles for their learning; so can we help programmers find the best data for their algorithms to learn on?
Can we help them think about the questions they want their machine learning applications to answer? Can we help fit the data to the question?
A final string of thoughts, provocations, and questions that keep me up at night:
As I begin to fully appreciate the fact that machines really can read all the books, and can “learn” from them, I am convinced that we need to think more rigorously about reading.
What are the different ways of reading, and what are the various goals of reading?
What can we learn best, as individuals and/or as a society, through human reading? What can we learn best through machine reading?
Can we start thinking about how to design libraries to maximize the unique payoffs of many different kinds of reading?
How can texts (and images, and data) be maximized for human discovery and reading? For discovery via algorithms and reading by machine learning applications?
What does it mean to maximize our collections for humans and what does it mean to maximize them for machines and algorithms?
OK – really wrapping it up now:
Machines can already read all the books. Or at least they can read all the books (or articles) that they can read.
(sidebar about how the proliferation of AI should compel us to double-down on mass digitization and on open access)
Trying to understand a little bit about AI and machine learning has taken me way outside my cognitive comfort zone, but I think it is the kind of thinking we need to do to be effective library leaders and effective stewards of the future of libraries, of librarianship, and, for those of us in research libraries, of scholarship.
I think it will be crucial that we avoid the temptation to continue to serve primarily individual human readers and let the computer scientists worry about how to apply machine learning and AI to vast libraries of resources.
I think we would be wise to start thinking now about machines and algorithms as a new kind of patron — a patron that doesn’t replace human patrons, but has some different needs and might require a different set of skills and a different way of thinking about how our resources could be used.
For further reading:
- “New AI-Based Search Engines are a ‘Game Changer’ for Science Research,” by Nicola Jones, Nature, November 12, 2016
- “Searching for Lost Knowledge in the Age of Intelligent Machines,” by Adrienne LaFrance, The Atlantic (audio version also available)
- Machine Learning: The New AI, by Ethem Alpaydin (part of the Essential Knowledge series from MIT Press)
- And more
On the weekend I was reading “The Singularity Is Near” by Ray Kurzweil, which takes a look at the wider implications of technological acceleration. If by 2023 personal computers will have the same power as the human brain, and by 2045 the power of all human brains combined, and if these technologies are more and more tightly coupled with us (think desktop -> laptop -> palm -> eye, with a direct brain interface predicted), and we have cognitive access to that level of power, as well as much more direct access to each other, I’m really not sure where that leaves libraries. I’m not sure where it leaves anything (hence the term “technological singularity”: an event horizon past which it’s very hard to predict what life will be like). I work in libraries, but have started to see their current form as a stepping stone for society; libraries, like everything else, will see dramatic change in the next couple of decades. I have some trepidation, but am also excited, to see what comes next.
In terms of “Wouldn’t it be better to have “your system” go to the actual scholarly literature on a topic?” – Here’s the rub: the “actual scholarly literature” is huge, un/disorganized, and unranked. The reason to use Wikipedia (for all its flaws) is that it is curated information. Actually, human-curated information.
I see two potential uses for AI *prior* to answering questions. The first is to create topic maps of the development of information in recorded resources. This should result in hubs around key resources and a kind of ranking based on their recurrence in the literature (e.g. using citations). The top resources resulting from such literature “surveys” may be small enough that they could be reviewed by experts in the field and confirmed or adjusted. Questions would be answered by “reading” in ranked order within clusters. Also, clusters would reveal different schools of thought, and answers should reflect that there is not just one view of the topic.
The other use would be to eliminate/tag redundant sources of information. This is a big problem for scholars – they need to know when to stop accessing additional sources that do not provide any new information. The “publish or perish” results in many articles (primarily) that do not add to the knowledge of the field. Those should be flagged.
Basically, this is “subject bibliography by AI”.
This seems a little like what I’ve been reading for years … it is easy to extrapolate from a few very specific examples to a wider context, but extrapolation isn’t necessarily reality
Excellent. I saw some talks on AI/robots at Computers in Libraries but none were as cogent as this.