Big Data and its Impact on Academic Publishing
As academic publishing has made the transition from print editions to electronic editions, librarians, facing leaner budgets and an inability to deselect specific low-interest journals from massive packages, have begun to push back with demands for packages of full subscriptions to high-value journals. They pair these demands with pay-per-view download pricing for the more esoteric journals, and they can be this specific because they are armed with more data about their users than ever before. Utilization of specific articles from specific journals can now be tracked down to the individual user, enabling reasonably accurate forecasting of future use when the data is aligned with the respective assignments on course syllabi.
Big Data’s Impact
The technological capability to access multiple databases and the vast troves of information they contain appears to have been widely accepted as a tremendous step forward for academic research. Searches can be measured in seconds rather than hours, and as long as your keyword matrix is solid, you can be overwhelmed with data in mere minutes. However, this foundational premise of “more must be better” overlooks many of the practical challenges of actually putting all of this data to use. For data to be mined, it must first be tagged or categorized so that it links to the relevant search criteria when a search algorithm scans the database and returns an appropriate list of results for your request.
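The tagging requirement described above can be illustrated with a minimal sketch of a tag index. This is not how any particular publisher's database works; the article IDs, the tags, and the function names `build_index` and `search` are all invented for illustration, and the matching rule (an article must carry every keyword) is an assumption.

```python
from collections import defaultdict

# Hypothetical article records: the IDs and tags are invented for illustration.
ARTICLES = {
    "a1": {"open access", "peer review"},
    "a2": {"peer review", "citation analysis"},
    "a3": {"open access", "citation analysis", "library budgets"},
}


def build_index(articles):
    """Map each lowercased tag to the set of article IDs that carry it."""
    index = defaultdict(set)
    for article_id, tags in articles.items():
        for tag in tags:
            index[tag.lower()].add(article_id)
    return index


def search(index, keywords):
    """Return the sorted IDs of articles tagged with every keyword."""
    matches = None
    for keyword in keywords:
        # .get avoids creating empty entries for unknown keywords.
        ids = index.get(keyword.lower(), set())
        matches = ids if matches is None else matches & ids
    return sorted(matches or [])


INDEX = build_index(ARTICLES)
print(search(INDEX, ["open access", "citation analysis"]))  # ['a3']
print(search(INDEX, ["text mining"]))  # []
```

The second query returns nothing because no article carries that tag. In a scheme like this one, an article that was never tagged with a term stays invisible to a search on that term, however relevant its content may be.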
Adapting to Big Data
Frank Pasquale’s book “The Black Box Society: The Secret Algorithms that Control Money and Information” calls attention to the fact that Big Data has created a world in which we use search algorithms that we no longer understand to generate data that drives mission-critical decisions involving hundreds of millions of dollars and, in the case of medical science, human lives. To put this into perspective for academic research, it is akin to stepping into a driverless car and trusting the computers on board to get us to our destination safely, a possibility that is actually being implemented by various technology firms.
If we are lax in our comprehension of the changes in this technology, how do we prepare new graduate students and faculty for the appropriate use of this data? How does a research librarian offer guidance to a student researcher on the use of a search algorithm if the librarian isn’t fully trained to understand the impact of these algorithms? Even worse, if there are different algorithms for different databases, where will the resources come from to learn them all to a basic level of competence? The greater concern here is that the easier option will be to outsource the search function to the providers of the database, at which point a dependency is created that will be hard to challenge. Although publishers and journals have implemented effective mechanisms to track usage of articles and publications, tracking the impact of Big Data and sharing the knowledge gained through it are becoming increasingly important at the university level.