PubliHM – A Research Information System
DTLab Challenge with FORWIN
Problem
The publication rate of a university is a central indicator of the extent and quality of its scientific work. Among other things, it is the basis for acquiring third-party funding, for university rankings, and for the right to award doctorates, which gives it central strategic importance. In addition, the responsible Ministry of Science and Research regularly surveys the overall picture of university performance.
The problem addressed in this project was that the research information of the different disciplines and researchers was available only in separate systems. As a result, reports on publication performance could not be produced quickly, easily, and completely; they had to be compiled through time-consuming manual work. Since report requests often came at short notice, the experts responsible for compiling them in the relevant departments faced an additional, unplannable burden.
Procedure
The staff unit Center for Research Promotion and Young Scientists (FORWIN) is entrusted with the regular preparation of reports on the publication performance of HM. It had already developed a concept for a system that would ensure central access to HM's complete publication data. To ensure that the concept met all customer needs and requirements, it was translated into a press release following Amazon's "Working Backwards" innovation process. The team then examined which data sources could supply the data.
The investigation revealed that a large part of the common publisher offerings, as well as portals such as ResearchGate or Google Scholar, were not suitable because the providers do not allow data retrieval, or allow it only to a limited extent. Against this background, the freely available ORCID data was chosen as the main data source. ORCID records are keyed by ORCID iDs, i.e. alphanumeric codes for the unique identification of authors and contributors to scientific communication. The design of PubliHM was based on various AWS services. The basis is to be a data lake, i.e. a central repository, which is to be filled with information on publications that can be assigned to HM.
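Since ORCID iDs identify the authors whose publications are to be collected, a minimal sketch of how such an identifier could be checked before it is used may be helpful. The final character of an ORCID iD is a check character computed with the ISO 7064 MOD 11-2 algorithm. The function names below are hypothetical, and the use of the ORCID public API endpoint in works_url is an assumption; the project description does not state how the data is actually retrieved.

```python
import re

# 16 characters in four hyphen-separated groups; the last one may be "X".
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid_id: str) -> bool:
    """Return True if the iD has the expected shape and a matching check character."""
    if not ORCID_PATTERN.fullmatch(orcid_id):
        return False
    digits = orcid_id.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


def works_url(orcid_id: str) -> str:
    """Build the URL of a researcher's works list (assumed: ORCID public API v3.0)."""
    if not is_valid_orcid(orcid_id):
        raise ValueError(f"not a valid ORCID iD: {orcid_id!r}")
    return f"https://pub.orcid.org/v3.0/{orcid_id}/works"
```

Validating identifiers before any request is made keeps mistyped iDs out of the data lake; for example, is_valid_orcid("0000-0002-1825-0097") accepts the sample iD used in ORCID's own documentation.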
Innovation in action
With the help of AWS, the following concept was created for PubliHM:
In the first step, AWS Step Functions acts as the control component and triggers an AWS Lambda function at defined points in time. Lambda executes code with a manageable runtime without an additional system having to be provisioned for this purpose. The resulting data is then stored in Amazon S3 buckets. The AWS Glue service reads the data stored in the S3 buckets and merges it into virtual tables, where this is advantageous for the respective expansion stage of the project. In addition, access control is handled by this component. Subsequently, the corresponding data sets are analyzed and presented according to demand.
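The Lambda-to-S3 step of this concept can be illustrated with a short sketch. It is not the project's actual code: the event layout, the bucket name passed in the event, and the key scheme (raw/orcid/dt=...) are assumptions chosen for illustration. Only put_object is a real boto3 S3 client call. The client can be injected so that the logic can be tried out locally without AWS access.

```python
import json
from datetime import datetime, timezone


def build_object_key(orcid_id: str, fetched_at: datetime) -> str:
    """Hypothetical key scheme: one folder per fetch date, one file per author."""
    return f"raw/orcid/dt={fetched_at:%Y-%m-%d}/{orcid_id}.json"


def store_record(s3_client, bucket: str, orcid_id: str,
                 record: dict, fetched_at: datetime) -> str:
    """Write one publication record to S3 as JSON and return its object key."""
    key = build_object_key(orcid_id, fetched_at)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(record).encode("utf-8"),
        ContentType="application/json",
    )
    return key


def lambda_handler(event, context, s3_client=None):
    """Entry point invoked by the scheduler (assumed event layout, see lead-in)."""
    if s3_client is None:
        import boto3  # imported lazily; provided by the AWS Lambda Python runtime
        s3_client = boto3.client("s3")
    now = datetime.now(timezone.utc)
    keys = [
        store_record(s3_client, event["bucket"], item["orcid_id"], item["record"], now)
        for item in event["records"]
    ]
    return {"stored": keys}
```

Grouping objects by fetch date is one possible design choice: it would let a downstream catalog such as AWS Glue treat each date folder as a partition, but the project may organize its buckets differently.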
Next steps
In the coming semester, a new group of students will work on refining and automating the reporting in PubliHM. PubliHM will then enter the pilot phase and be tested by the first users.
Challenge giver: HM, FORWIN
Professor: Prof. Rainer Schmidt
Semester: Winter semester 2020/21
Supporting documents
- Student research project