Libraries in a computational age

(After a year-long hiatus from external speaking engagements, I accepted an invitation to speak in Madrid at an event celebrating the 20th anniversary of the Madrono Consorcio. Below is the text of my talk.)

It is a privilege to be able to speak with you and to share with you my thoughts on the future of libraries, and some of what we are doing at MIT to reimagine what a research library can and should be and do in a computational age.

I am particularly happy to be talking to you on the 20th anniversary of this consortium, which is committed to the same kind of sharing and collaboration across libraries and universities that we will need to do even more of now and into the future.

I think that the best way for research libraries to meet the challenges of the future, and to support our universities in educating students and producing research that will allow us to solve some of the big global problems that are looming, is to work together and share our experiences.

So let me follow my own advice and share my experience at MIT and tell you a bit about our context.

MIT is probably best known as one of the world’s leading technology and engineering schools. We are ranked #1 in the world, but paradoxically only #3 in the US. MIT has just over 1000 tenure-track faculty, nearly 5000 undergrads (almost ½ women), and 7000 graduate students. Faculty and students are spread out over 5 schools – Architecture and Planning; Engineering; Humanities, Arts and Social Sciences; Management; and Science.

Probably more important than the facts and numbers, MIT is known culturally for at least 3 things: a hands-on approach to learning, openness, and a relentless pursuit of innovation.

The hands-on approach is reflected in the MIT motto – mens et manus; mind and hand. There is a very real emphasis at MIT on learning by doing – reflected both in the project-based teaching approach across the curriculum and by the fact that 90% of MIT undergrads participate in a research project before they graduate.

Openness is also a very important part of our culture and a widely shared value at MIT. We are one of the few private universities in the US with an open campus, including libraries that are open to all visitors. We are also committed to openly sharing our educational and research materials with the world.

MIT created Open Courseware in 2000, “a simple but bold idea that MIT should publish all of our course materials online and make them widely available to everyone.” To date Open Courseware has over 2 million visitors/month, and hosts 2400 courses.

In 2009, MIT adopted one of the first campus-wide open access policies in the US, by a unanimous vote of the faculty. MIT turned to the libraries to implement the policy, and because of a commitment to provide adequate staffing and resources for collecting faculty research, we now share 45% of MIT faculty journal articles written since 2009 openly with the world through our OA repository.

The third important component of MIT culture is that it is a place obsessed with innovation across the curriculum. MIT is where gravitational waves were detected, and where Guitar Hero was invented.

MIT’s most recent innovation was to reinvent itself and how we think about computing across the curriculum and in every discipline. This year, MIT launched a new college of computing – a $1 billion effort that will eventually add 50 new faculty to MIT. The goal of the new college is to address the global opportunities and challenges presented by the prevalence of computing and the rise of artificial intelligence (AI) by infusing computational thinking throughout every department and discipline AND to ensure that computer science, and especially machine learning and AI, are informed by work in other disciplines – especially the social sciences and humanities.

I give you all this background about MIT because to my mind MIT is a perfect place to develop a bold and ambitious vision for the future of research libraries. I came to MIT in 2015, after many years working at the Stanford libraries, and I could see right away that this was a place that was ready to think about libraries as more than books and buildings.

We do have books and buildings of course. We operate 5 libraries on campus, a reading room for distinctive collections, and a storage annex. We have a collection of 2.3 million print items. We get over ½ million visitors to the library spaces per year (that figure has been growing slightly in recent years). 87,000 of our 2.3 million physical items were checked out last year, and like most academic libraries, we are seeing the circulation of print decline.

Use of our vast digital collections is significantly higher and growing. There were over 80 million searches on our online databases last year, and 2.3 million downloads of the open access articles we disseminate via DSpace@MIT.

While I know that the size of our library at MIT may seem large to many of you, compared to the US research universities that are widely considered our peers (Harvard, Yale, Princeton, and Stanford), MIT has a small library – at least by the traditional measures of size of print collection, or budget, or staff. Harvard has a physical collection 10 times ours (22 million) and a staff of 700, compared to our 160. Yale, Princeton, and Stanford all have collections over 10 million and staffs of over 300. (Note: I am well aware that size is relative, and that by almost any measure, MIT and the MIT libraries are extremely well resourced. I add the HYPS comparison because I find many folks assume MIT Libraries are roughly the same size as those peers.)

We are small(ish) but mighty, and we are mighty in ways that are relevant to current and future research and that align with MIT’s core missions: open scholarship, and computational and algorithmic access to collections.

Our vision is to be an open, interactive and computational library.

Let me back up a bit now, and tell you how we arrived at that vision, and hopefully convince you that a focus on openness, interaction, and machine access to collections is a good direction for the future of research libraries more generally.

Shortly after I arrived at MIT, the Provost asked me to lead a conversation across campus about what the future of libraries should be. So I convinced him to create a task force on the Future of Libraries. I volunteered to co-chair the task force, and we ended up with 30 members, mostly faculty from all across MIT – engineers, computer scientists, business school faculty, historians, biologists, etc. Importantly, the task force also included staff from the libraries, the MIT Press and central information technology.

Membership ranged from folks who relied heavily on library collections and librarians in their teaching and research, to faculty who claimed at the start that they “never used the library”.

I asked this group to think about the future of libraries as a kind of research question. I wasn’t interested in how well the current library was serving their needs, and how we might improve or expand that a little bit. I was asking them to think critically about what a research library can and should be in a digital age, and now a computational or algorithmic age.

The report from the group is online, but I want to share some highlights here.

The first conclusion was that although the initial digital turn in libraries was not yet complete, we were already on the cusp of a second, potentially more profound one. The first digital shift in libraries was from print to print plus digital, and was brought about by the internet, Google, and e-books and e-journals.

In that first digital turn, the library went from being a place where individuals came to find physical books and journal articles (and manuscripts, and images, and lots of other stuff) so that they could read those books and articles themselves, to libraries being a service that individuals use to gain online access to journal articles, and e-books, and digital images and manuscripts and more so that they can read and use those things on their own digital devices.

Slide with text “print … to digital” and image of person taking book off full bookshelves, and image of a tablet device showing a digital bookshelf

Although this was a HUGE shift, it did not open up access to scholarly content the way many of us hoped it would. In large part because of the market power of many large commercial publishers, the advent of online journals did not democratize access to knowledge, and the potential for the rise of the internet and of online information and scholarship to create information equality has been stunted. Nonetheless, the first digital turn in libraries and scholarly communication arguably did make research and reading more efficient for those who had access.

In describing the next evolution of libraries, the MIT future of libraries task force emphasized not only the technological shift, but also the importance of combining this shift with a renewed commitment to open science and open scholarship. What is the next shift? It is an evolution of libraries from service to platform, and from being just physical and digital to being computational as well.

The Future of Libraries task force described this by calling for the libraries to operate as an open global platform. A platform is something scholars and patrons build on, and it is a way of thinking about libraries not just as physical and digital repositories of content, but as vehicles for interacting with content, tools, and expertise, both to consume information and to create new knowledge in many forms – text, images, data, maps, multi-media, interactive and dynamic. And for a library as platform to be truly effective for current and future patrons, it must be committed to openness and to serving a global community.

“The MIT Libraries must operate as an open, trusted, durable, interdisciplinary, interoperable content platform that provides a foundation for the entire life cycle of information for collaborative global research and education.”

Future of Libraries TF, 2016

One of the key features of the library of the future – the library in a computational age — is that it should be a library accessible by machines and algorithms, not just by people. In a computational age, we have to realize that humans are not our only patrons. In fact, I have argued before that we would be wise to start thinking now about machines and algorithms as a new kind of patron — a patron that doesn’t replace human patrons, but has some different needs and might require a different set of skills and a different way of thinking about how our resources could be used.

Drawing of a robot, holding a book, with thought bubble of “I love reading”

When I think of AI and machine learning in the context of libraries, I think of computer programs and algorithms that can extract and derive meaning and patterns from data, make predictions and inferences about and with new data, and in doing so, solve problems at scales not possible for humans alone.

I said earlier that the MIT Libraries vision is to be an open, interactive and computational library. Let me explain those a bit:

Open is about more than open access – it is about being a library that is open and inclusive of a range of ideas, and types of knowledge.
Interactive means that we collaborate with scholars as partners, because in a computational world our understanding of how information is organized, how data is managed, and where and how bias creeps in becomes more important than ever.
Computational is about ensuring our collections are accessible in formats optimized for text mining and other computer analyses; and that patrons can design and code their own ways to access and analyze our collections.

The computational part of this vision is what I think is really interesting, and where libraries can play a unique and important role in this age of AI. There are several ways to think about the roles of research libraries in a computational age:

We all know that there are problems with bias in algorithms, and especially in the data used to train algorithms. When the data used in computational research is not diverse, not inclusive, and/or is described in ways that reflect societal prejudices and inequalities, then those problems and biases will be reflected and amplified in the findings, conclusions and applications of that research. Librarians know better than most people how information is collected, assembled and organized, so we know where things can go wrong. Library folks who understand data and metadata can help ensure scholars are aware of the shortcomings of the data they use, and can help mitigate those impacts.

We can also use machine learning in our own work. One of the most interesting applications of machine learning is in assigning subjects to books. MIT Press is using a machine learning tool called Unearth to ‘read’ all of the books it publishes and extract subjects that human readers might miss.

We can also do what libraries have always done and be a centralized, accessible, and inclusive resource for our communities. We can provide centralized access to computational tools and resources as a way to equalize access to machine learning across our campuses. Some libraries might do this by providing AI labs in the library, with access to hardware, software and training tools to get students started using machine learning and AI. We might also maintain online libraries of training data and basic algorithms that students can use and modify as they learn.

But IMO, the most important thing that libraries can do is work to ensure that the knowledge and research products we already collect, curate, and disseminate are openly available and that the scholarly record is as diverse and as inclusive as possible. It is when truly open access to lots and lots of content – text, data, code, images – is combined with powerful computational tools and methods that really interesting things can happen. New findings and understandings and new discoveries in sciences and humanities have thus far mostly occurred when we build on prior knowledge, and make new and creative connections between facts, data, knowledge and insights. Computational access to open collections of knowledge means that this can happen at a speed and a scale that most of us can’t even really imagine. Certainly, the choice of topics and problems and the interpretation and application of results require human imagination – but machine learning tools can speed up the process and, when combined with open access, equalize the ability of people to make use of the knowledge we have already accumulated.

The connection between open science and computational analysis is why this topic is so exciting and important to me. One of the recommendations of the MIT task force on the future of libraries was that MIT convene another task force – this one on open access and open science. I am co-chairing that task force now; we released draft recommendations in March 2019, and expect to release final recommendations and a report this fall.

As our task force has engaged faculty and students across MIT in the cause of open science, we have emphasized that open access to research is good for research and is critical to enhancing our ability to collectively solve big global challenges. That is a compelling and true argument, but the argument that seems to resonate most strongly at MIT is the one we make when we explain to scholars that locking their work behind publisher paywalls means that their research will likely be left out of the scholarly conversation and the progress of science, because those conversations and that progress are increasingly computational. Researchers who are looking for data to use for machine learning and computational analysis are looking for data that is easily and openly available. Publishers want to sell not just reading access but also computational access to scholarly content; but I believe the integrity of science depends on educational institutions maintaining control over their own scholarly output – disseminating and preserving it in institutionally owned and operated repositories.

Imagine the progress we could make as a society if the output of researchers were openly and computationally available in interoperable repository platforms operated by and for the academy. (Here is a good place to put in a hearty personal endorsement of the Invest in Open Infrastructure initiative.)

It is possible that the most important thing libraries can do in a computational age is to continue to fight for open science and open scholarship – based on academic values and served via academy-owned infrastructure.

The combination of open + computational + academy-owned is the future that I think libraries are uniquely well suited to pursue, and that I think our universities and our communities need us to pursue.

Feral Librarian