What is Data Aggregation?
As the volume of published academic research continues to grow, the prospect of navigating all of that data to compile a literature review or research proposal becomes ever more daunting for new researchers. Whether data aggregation will provide the solution remains to be seen.
The collation, curation, and presentation of data in summary form have been a recognized practice in the commercial world for decades now. Experian, one of the providers of your credit score, has managed to amass data points on individual consumers that run into the thousands. Beyond the provision of a simple three-digit score, that data can now be mined for some very targeted marketing campaigns.
Aggregation of data serves only one half of a prospective user’s needs. Having so much information available in one large database can save a considerable amount of time compared with working through multiple individual databases. However, that time is saved only if the collated data can be searched or mined quickly and accurately for the information you are looking for.
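To make the two halves concrete, here is a minimal sketch in Python of merging records from several sources into one collection and then searching it. Everything in it is an assumption for illustration: the `Article` record, the `aggregate` and `search` helpers, and the sample entries are hypothetical and do not describe how any real database works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Hypothetical record for one article in an aggregated database."""
    title: str
    year: int
    source: str
    open_access: bool


def aggregate(*sources):
    """Merge several article lists into one, keeping the first copy of each title."""
    seen = set()
    merged = []
    for source in sources:
        for article in source:
            key = article.title.casefold()  # case-insensitive duplicate check
            if key not in seen:
                seen.add(key)
                merged.append(article)
    return merged


def search(articles, keyword, open_only=False):
    """Return articles whose title contains the keyword, optionally open access only."""
    keyword = keyword.casefold()
    return [
        a for a in articles
        if keyword in a.title.casefold() and (a.open_access or not open_only)
    ]


# Made-up sample data standing in for two separate databases.
db_a = [
    Article("Data Aggregation in Practice", 2014, "db_a", True),
    Article("Peer Review Standards", 2012, "db_a", False),
]
db_b = [
    Article("data aggregation in practice", 2014, "db_b", False),
    Article("Mining Aggregated Data", 2015, "db_b", False),
]

combined = aggregate(db_a, db_b)
hits = search(combined, "aggregat")
open_hits = search(combined, "aggregat", open_only=True)
```

The merge step alone only produces a bigger pile of records. It is the search step, including the option to exclude restricted articles, that turns the pile into saved time.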
Garbage in, Garbage out (GIGO)
The old maxim that the information you get out of a computer is only as good as the information put into it applies equally to these big data repositories. Restricted-access databases such as Academic Search Premier can be expensive if your library doesn’t have access, but they represent your best hope for the latest research. JSTOR (short for Journal Storage) offers limited free access with other subscription options for current journal publications.
Open Access Databases
Open Access Publishing is committed to the free availability of research data as a stand against the alleged elitism of research journals with expensive subscription rates. Databases such as the Public Library of Science (PLoS) and Stanford’s HighWire have grown dramatically as a result of this campaign. HighWire, for example, lists over 2.5 million free full-text articles and a total article count of almost eight million. These exemplars, however, set a standard that many other open access databases do not achieve:
- The Social Science Research Network (SSRN) offers total article counts in the hundreds of thousands, but many of them are pre-publication “working papers” that may require further follow-up with the original authors.
- PubMed from the National Institutes of Health (NIH) contains over 24 million articles, but many of them have restricted access.
- Google Scholar leverages the power of the ‘Mother Google’ search algorithm, but many of the search results will include restricted articles, as well as journals that have been flagged as questionable for lacking a rigorous peer review process.
A Mixed Blessing
Data aggregation performs a valuable service in casting a wide net to compile relevant data into one big database. However, in these days of open access journals that charge article processing fees instead of journal subscription fees, the quality of the research material that gets captured in that net has become increasingly unpredictable. Sophisticated search algorithms may help you identify relevant material by topic and date, but some of those results may be restricted rather than full access, and some of them may be of questionable authorship. Proceed cautiously!