The Tech

Website

A public-facing website where users can browse the catalog of newspapers.

Catalog

Not public-facing, with limited access even for internal users. This will also be a web application: it lets us upload a picture or scan of one or more newspapers, extracts the images and text, stores them in a database, and gives editor/admin users a way to tag each newspaper. For example, if we upload a picture of a newspaper whose content is largely about kidnapping and armed robbery, the editors should apply an insecurity tag or a related one.
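
To make the upload-extract-store-tag flow more concrete, here is a minimal sketch of what a catalog record and the tagging step could look like. The names `Newspaper`, `tag_newspaper`, and the example path are hypothetical and only illustrate the idea; the real schema has not been decided.

```python
from dataclasses import dataclass, field


@dataclass
class Newspaper:
    """One uploaded scan plus whatever the extractor pulled out of it (hypothetical model)."""
    scan_path: str                                          # where the uploaded picture/scan is stored
    extracted_text: str = ""                                # text returned by the Content Extractor
    image_paths: list[str] = field(default_factory=list)    # images cropped out of the scan
    tags: set[str] = field(default_factory=set)             # tags assigned by editor/admin users


def tag_newspaper(paper: Newspaper, tag: str) -> None:
    """Attach a normalized (trimmed, lowercase) tag; empty tags are rejected."""
    normalized = tag.strip().lower()
    if not normalized:
        raise ValueError("tag must not be empty")
    paper.tags.add(normalized)


# Example: an editor tags a paper that is largely about kidnapping and armed robbery.
paper = Newspaper(scan_path="scans/example.jpg")  # hypothetical path
tag_newspaper(paper, " Insecurity ")
```

Storing tags as a normalized set is one possible choice: it avoids near-duplicate tags such as "Insecurity" and "insecurity" on the same paper.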

Content Extractor

This is easily the most complex of the systems, as it requires high-quality images of the newspapers to be taken and an extraction system that is yet to be determined. Options for the extraction include Google Vision, Amazon Rekognition, Microsoft, or something custom-built.
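
Since the extraction backend is still undecided, one option is to hide it behind a small interface so the Catalog does not care which vendor ends up being used. The sketch below is only an illustration: `Extractor`, `ExtractionResult`, `PlaceholderExtractor`, and `process_scan` are hypothetical names, and no real vendor API is called.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ExtractionResult:
    """What any backend is expected to return (hypothetical shape)."""
    text: str
    # Bounding boxes (left, top, right, bottom) of images found in the scan.
    image_regions: list[tuple[int, int, int, int]] = field(default_factory=list)


class Extractor(Protocol):
    """Common interface a vendor-backed or custom-built extractor would implement."""
    def extract(self, scan: bytes) -> ExtractionResult: ...


class PlaceholderExtractor:
    """Stand-in backend that returns an empty result; it performs no real OCR."""
    def extract(self, scan: bytes) -> ExtractionResult:
        return ExtractionResult(text="")


def process_scan(scan: bytes, extractor: Extractor) -> ExtractionResult:
    """Run whichever backend was chosen on the uploaded scan."""
    if not scan:
        raise ValueError("scan is empty")
    return extractor.extract(scan)


result = process_scan(b"raw image bytes", PlaceholderExtractor())
```

With this shape, switching between a hosted service and a custom model would mean writing one new class rather than changing the Catalog.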

Hardware

Server costs (potential)

NOTE: The Content Extractor server(s) will not run 24/7. They might run for, say, 1 week in a month, since they are only needed when extracting content from new papers. The cost for a full month is $1491; the value quoted in the table is for 1 week.
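
As a rough way to sanity-check part-time usage, the monthly figure can be prorated by the number of days the servers actually run. This is a sketch under stated assumptions: linear billing and a 30-day month, neither of which is confirmed here, so the result may not match the table exactly. The function name `prorated_cost` is hypothetical.

```python
def prorated_cost(monthly_cost: float, days_used: int, days_in_month: int = 30) -> float:
    """Estimate the cost of running for part of a month.

    Assumes cost scales linearly with days used (an assumption, not a quoted rate).
    """
    if not 0 <= days_used <= days_in_month:
        raise ValueError("days_used must be between 0 and days_in_month")
    return round(monthly_cost * days_used / days_in_month, 2)


# Estimate for 1 week of Content Extractor usage, from the $1491 monthly figure.
weekly_estimate = prorated_cost(1491, 7)
```

Actual cloud billing is often hourly and may include storage that persists while the servers are off, so this should be treated as an estimate only.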