Spring 2017 Project Work

1 Project for Spring 2017: Batch Rename PDF files.

We extract the following:
1. The subject classification. Example from last class: "Software Comprehension"
2. The year of publication.
3. The list of author names (First Middle Last).
4. The name of the conference or journal of publication.
This sounds like a wicked problem, but it is not.
Some of this information may be missing from the PDF.
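As a rough illustration of how one of these elements might be found, here is a heuristic for the year of publication. It assumes the text of the paper's first page has already been extracted from the PDF. The function name `guess_year` and the "most frequent plausible year" rule are our own assumptions, not a known-good method; it returns `None` when no year is present, since that information may be missing.

```python
import re
from collections import Counter
from datetime import date
from typing import Optional

# Four-digit numbers from 1900 to 2099 that stand alone as tokens.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")

def guess_year(first_page: str, latest: Optional[int] = None) -> Optional[int]:
    """Guess the publication year from the text of a paper's first page.

    Hypothetical heuristic: take the most frequent plausible year and break
    ties toward the later year. Returns None when no year is found.
    """
    if latest is None:
        latest = date.today().year  # ignore numbers that would be future years
    years = [int(y) for y in YEAR_RE.findall(first_page) if int(y) <= latest]
    if not years:
        return None  # the year may simply be missing from the PDF
    counts = Counter(years)
    return max(counts, key=lambda y: (counts[y], y))
```

Restricting the search to the first page matters: over a whole paper, years in the reference list would usually outnumber the publication year.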
Rename each file using the extracted elements.
1. How should the format of the resulting file name be controlled?
2. See an example: https://www.id3renamer.com/moreinfo
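One possible answer to the format-control question is a user-editable template. The sketch below uses Python's own `str.format` placeholder syntax (for example `"{year} - {authors} - {subject}"`); this is a choice made for illustration and is not the syntax of the id3renamer tool. The name `format_name` and the `"Unknown"` placeholder for missing fields are hypothetical.

```python
import re
from typing import Dict, Optional

# Characters that are illegal in Windows file names; '/' is also illegal on Unix.
_ILLEGAL = re.compile(r'[<>:"/\\|?*]')

class _WithDefault(dict):
    """dict that returns a placeholder for any field the template needs but we lack."""
    def __missing__(self, key: str) -> str:
        return "Unknown"

def format_name(template: str, fields: Dict[str, Optional[str]], ext: str = ".pdf") -> str:
    """Fill a template such as "{year} - {authors} - {subject}" from extracted fields."""
    present = _WithDefault({k: v for k, v in fields.items() if v})  # drop None and ""
    name = template.format_map(present)
    name = _ILLEGAL.sub("_", name)            # make the result a legal file name
    name = re.sub(r"\s+", " ", name).strip()  # collapse runs of whitespace
    return name + ext
```

For example, `format_name("{year} - {subject}", {"year": None, "subject": "I/O: a study"})` gives `"Unknown - I_O_ a study.pdf"`.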

Overview of PDF tools etc.
See http://labs.crossref.org/pdfextract/. Download it and begin experimenting. It is a binary distribution, not source code.
Get a GitHub account (free). All source code and documents are expected to be on GitHub.
https://github.com/CeON/CERMINE "is a Java library and a web service (cermine.ceon.pl) for extracting metadata and content from PDF files containing academic publications. CERMINE is written in Java at Centre for Open Science at Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw."
http://jats.nlm.nih.gov/archiving/tag-library/1.1/ Journal Archiving and Interchange Tag Library NISO JATS Version 1.1 (ANSI/NISO Z39.96-2015) December 2015
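If a tool hands back metadata as JATS XML, most of the fields we want sit under `<article-meta>` and `<journal-meta>`. The following is a minimal sketch using only Python's standard `xml.etree.ElementTree`; the function names `read_jats` and `_text` are hypothetical, and it assumes namespace-free JATS with a single relevant `pub-date` (real files may carry several, e.g. print and electronic dates, and this sketch simply takes the first).

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

def _text(node: ET.Element, path: str) -> Optional[str]:
    """Stripped text of the first element matching path, or None."""
    found = node.find(path)
    return found.text.strip() if found is not None and found.text else None

def read_jats(xml_text: str) -> Dict[str, object]:
    """Pull renaming-relevant fields out of a JATS article; {} if no article-meta."""
    root = ET.fromstring(xml_text)
    meta = root.find(".//article-meta")
    if meta is None:
        return {}
    authors: List[str] = []
    for name in meta.iterfind("contrib-group/contrib[@contrib-type='author']/name"):
        given = _text(name, "given-names") or ""
        surname = _text(name, "surname") or ""
        authors.append(f"{given} {surname}".strip())
    return {
        "title": _text(meta, "title-group/article-title"),
        "authors": authors,
        "year": _text(meta, "pub-date/year"),
        "journal": _text(root, ".//journal-title"),
        "publisher": _text(root, ".//publisher/publisher-name"),
        "volume": _text(meta, "volume"),
        "issue": _text(meta, "issue"),
        "pages": (_text(meta, "fpage"), _text(meta, "lpage")),
        "keywords": [k.text.strip() for k in meta.iterfind("kwd-group/kwd") if k.text],
    }
```

Keywords (`<kwd>`) are one plausible source for the subject classification, when a paper supplies them.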

Immediate goal: Develop the requirements and document them as expected on the Projects page. This is team work. Due date: by next Monday's class (to be confirmed).
Important subsystem #1: Develop a "virtual machine" description suitable for our project. The product of this effort is an API. For example:
```
fi = openPDF(pathName); y = extractYear(fi); s = extractSubject(fi); closePDF(fi);
```
1. Suggestion: Each of you should gather about 5 academic papers and study where in each paper the information we wish to extract appears, and how much that location varies.
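To make the subsystem #1 API concrete, here is one way the `openPDF` / `extractYear` / `extractSubject` / `closePDF` calls might look in Python (as `open_pdf`, `extract_year`, `extract_subject`, `close_pdf`). This is a sketch under stated assumptions: the PDF-to-text step is passed in as a hypothetical `extract_pages` function returning one string per page, because the real choice of PDF library or tool is still open, and the two extractors are simple first-page heuristics, not tested rules.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# Assumption: some PDF-to-text step (a library or external tool) returns one
# string per page. It is injected so this sketch needs no third-party package.
PageExtractor = Callable[[str], List[str]]

@dataclass
class PdfHandle:
    path: str
    pages: List[str]
    closed: bool = False

def open_pdf(path_name: str, extract_pages: PageExtractor) -> PdfHandle:
    return PdfHandle(path_name, extract_pages(path_name))

def _first_page(fi: PdfHandle) -> str:
    if fi.closed:
        raise ValueError(f"{fi.path} is closed")
    return fi.pages[0] if fi.pages else ""

def extract_year(fi: PdfHandle) -> Optional[int]:
    """First standalone 19xx/20xx number on page 1, or None."""
    m = re.search(r"\b(?:19|20)\d{2}\b", _first_page(fi))
    return int(m.group()) if m else None

def extract_subject(fi: PdfHandle) -> Optional[str]:
    """Text after a 'Keywords' or 'Index Terms' label on page 1, or None."""
    m = re.search(r"^\s*(?:Keywords|Index Terms)\s*[:\-\u2014]?\s*(.+)$",
                  _first_page(fi), re.IGNORECASE | re.MULTILINE)
    return m.group(1).strip() if m else None

def close_pdf(fi: PdfHandle) -> None:
    fi.pages = []  # release the extracted text
    fi.closed = True
```

Keeping text extraction behind a function parameter means the rest of the tool (renaming, GUI) can be developed and tested with fake page text before a PDF library is chosen.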
Important subsystem #2: Create a mock-up of a GUI that helps users operate the "Rename PDF Files" tool.

This list is incomplete.
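A GUI for subsystem #2 will probably want to show a preview of the proposed renames before anything changes on disk. Below is a hypothetical back-end sketch for that preview step (`plan_renames` is our name, not part of any existing tool). It keeps each file in its own directory and resolves name clashes within the batch by appending " (2)", " (3)", and so on; it does not check for files already on disk outside the batch.

```python
import os
from typing import Dict, List, Tuple

def plan_renames(proposals: Dict[str, str]) -> List[Tuple[str, str]]:
    """Turn {old_path: proposed_file_name} into a preview list of (old_path, new_path).

    Nothing is renamed. When two files in the batch would get the same name,
    later ones (in sorted order) get " (2)", " (3)", ... before the extension.
    """
    taken = set()
    plan = []
    for old in sorted(proposals):  # sorted so the preview order is stable
        directory = os.path.dirname(old)
        stem, ext = os.path.splitext(proposals[old])
        candidate = os.path.join(directory, stem + ext)
        n = 2
        while candidate in taken:
            candidate = os.path.join(directory, f"{stem} ({n}){ext}")
            n += 1
        taken.add(candidate)
        plan.append((old, candidate))
    return plan
```

Once the user confirms the preview, each pair could be applied with `os.rename(old, new)`.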

Topic: The subject/topic classification. Ex: "buffer overflow exploit", "software engineering", "President Obama's lectures"
Year: The year of publication. Ex: "2016"
Authors: Each author name has the form [First Middle* Last], where Middle* means zero or more middle names; the list of author names is comma-separated. Ex: Umme Ayda Mannan, Iftekhar Ahmed, Rana Abdullah M Almurshed, Danny Dig, Carlos Jensen. This example has 5 authors.
JournalConf: Name of the journal or conference. Ex: "Journal of the ACM", "USENIX Symposium on Networked Systems Design and Implementation"
Pages: The page numbers the paper occupies, as printed in the PDF. Ex: "32 - 54"
Kind: PhD-thesis / MS-thesis / Tech-Report / Manual / Paper-Regular / Slides / Book / Proceedings / Unknown.
Location: City and Country of Publication. Ex: "London, UK" "New York, NY, USA"
Volume: Usually a number, sometimes a name. Ex: "25"
Issue: Usually a number, sometimes a name. Ex: "3", "March"
Publisher/Organization: The publisher or issuing organization. Ex: "Springer-Verlag", "IEEE"
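The fields above suggest a per-file record. The following Python sketch is one possible layout, not a settled design: the names `Kind`, `AuthorName`, `parse_authors`, and `PaperMetadata` are hypothetical. `parse_authors` is deliberately naive; it assumes "First Middle* Last" order with single-word surnames, so names such as "van der Berg" or "Last, First" forms would need more work.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List, NamedTuple, Optional

class Kind(Enum):
    """The document kinds listed above; values match the spelling used there."""
    PHD_THESIS = "PhD-thesis"
    MS_THESIS = "MS-thesis"
    TECH_REPORT = "Tech-Report"
    MANUAL = "Manual"
    PAPER_REGULAR = "Paper-Regular"
    SLIDES = "Slides"
    BOOK = "Book"
    PROCEEDINGS = "Proceedings"
    UNKNOWN = "Unknown"

class AuthorName(NamedTuple):
    first: str
    middle: List[str]  # zero or more middle names ("Middle*")
    last: str

def parse_authors(text: str) -> List[AuthorName]:
    """Split a comma-separated author list into First Middle* Last parts."""
    authors = []
    for chunk in text.split(","):
        parts = chunk.split()
        if not parts:
            continue  # skip empty entries, e.g. from a trailing comma
        if len(parts) == 1:
            authors.append(AuthorName("", [], parts[0]))  # a single name only
        else:
            authors.append(AuthorName(parts[0], parts[1:-1], parts[-1]))
    return authors

@dataclass
class PaperMetadata:
    """One record per PDF; None or an empty list means 'not found'."""
    topic: Optional[str] = None
    year: Optional[int] = None
    authors: List[AuthorName] = field(default_factory=list)
    journal_conf: Optional[str] = None
    pages: Optional[str] = None      # kept as text, e.g. "32 - 54"
    kind: Kind = Kind.UNKNOWN
    location: Optional[str] = None
    volume: Optional[str] = None     # usually a number, sometimes a name
    issue: Optional[str] = None      # e.g. "3" or "March"
    publisher: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of fields that were not extracted."""
        return [name for name, value in asdict(self).items()
                if value is None or value == []]
```

Reporting missing fields explicitly gives the renamer (and the GUI) a simple way to flag files that need manual attention.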