Advanced Applied Programming (Python)
CSCI-398: Advanced Applied Programming
A2: Image Search SaaS

Use key-value stores and Ajax web interfaces to search images in DBpedia, an extracted version of Wikipedia: have a quick look.

This assignment has two milestones:

A2M1: Use Python to create a command-line interface (CLI) to load and query the DBpedia/Wikipedia data.
A2M2: Use Flask, Bootstrap, and React to build a web app (backend and frontend) to query DBpedia and display the search results.

Contents:
1. Command-line interface (A2M1)
1.1. Getting started
1.2. Wikipedia data
1.3. Loader
1.4. Querier
1.5. Submission checklist
2. Web app (A2M2)
2.1. Backend
2.2. Frontend
2.3. Final testing
2.4. Submission checklist
2.5. Notes
3. Extra credit

1. Command-line interface (A2M1)

Create a command-line interface (CLI) to load and query DBpedia/Wikipedia. The CLI uses two key-value stores (KVS): images and terms. The terms KVS maps Wikipedia keywords to Wikipedia articles, and the images KVS maps the articles to images. The goal of this assignment is to create a map from keywords to images. For example, the keyword "cloud" might give us pictures of clouds.

1.1. Getting started

- Install Python (https://www.python.org/downloads/, or use brew on OS X).
- In Git Bash, set the PATH to Python: export PATH="$PATH:/c/Python34"
- Get the code from GitHub: https://goo.gl/TG8hEo
- Clone the a2 repo in the shell (using instructions similar to A0).
- Change to the udc-csci-398-a2 directory.
- Confirm Git Bash sees Python: python --version
- Install the dependencies in the shell: pip install -r requirements
- Download the DBpedia files:
  images_en.nt.bz2: http://downloads.dbpedia.org/2014/en/images_en.nt.bz2
  labels_en.nt.bz2: http://downloads.dbpedia.org/2014/en/labels_en.nt.bz2
- Decompress both files (e.g., in a terminal, run bunzip2 images_en.nt.bz2).
- Move the decompressed files to the data/ directory.

1.2. Wikipedia data

The images_en.nt file associates Wikipedia categories with images, whereas the labels_en.nt file associates Wikipedia categories with labels.
For instance, the category "Cloud" might be associated with an image of a cloud, as well as the label "cloud" (and perhaps other labels). In combination, we can use these files to search for images using search terms (i.e., text).

Both files consist of triples <A> <B> <C> that describe various aspects of Wikipedia categories, not just images and labels. You can think of these triples as an association between <A> and <C>, where <B> describes the type of association. The files contain several kinds of triples, but, for the purposes of this assignment, only two types are relevant: the ones in images_en.nt where B is http://xmlns.com/foaf/0.1/depiction (in this case, A is the category and C is an image URL), and the ones in labels_en.nt where B is http://www.w3.org/2000/01/rdf-schema#label (in this case, A is the category and C is the label). You can use the less command to have a look at the files, but you do not need to write code for reading these files directly; we have provided some code for you.

1.3. Loader

Your first task is to write a loader: loader.py. You are given modules to put together for the loader, all in the a2m1/ directory:

- kvs.py: implements the key-value storage systems: disk, mem, and cloud.
- kvs_test.py: runs the KVS test cases.
- parser.py: parses the data files.
- parser_test.py: runs the parser test cases.
- loader.py: loads the parsed data into the key-value stores.
- loader_test.py: runs the loader tests.
- test.py: runs all tests.

The loader parses the data files, creates the related key-value stores, and then loads the stores. It loads the image store with subject/object pairs representing Wikipedia categories and related image URLs, and loads the term store with labels and Wikipedia categories. We have provided code for parsing the data files and creating the key-value stores; your first implementation task is loading the stores. However, there are some restrictions on how you should do this:

loader.load_images(): You should first index the images from images_en.nt.
The parser only returns images where the relationship is http://xmlns.com/foaf/0.1/depiction, representing an image depicting a specific Wikipedia topic, as opposed to other related topics. All images should be stored in a key-value store called images (the kvs parameter), where the key is the Wikipedia category and the value is the image URL (the key-value pairs generated from the image_iterator): see sample code.

loader.load_terms(): You should next create an inverted index from labels_en.nt. Here the idea is to index, for each label, the Wikipedia category or categories that correspond to that label. All labels should be stored in a key-value store called terms (the kvs parameter), where the key is a word from the label and the value is the Wikipedia category. You should only add an entry to this store if the category exists in the images key-value store, i.e., if we have an image. If a label contains multiple words, you should create separate entries for each word: see sample code.

To be able to answer queries with approximate matches, you should (a) regularize the case of the words in the title, and (b) use a stemming algorithm to remove suffixes before storing the item. We are including the helper class Stemmer, which stems words.
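To make the load_terms() requirements concrete, here is a minimal sketch of the inverted-index logic. The label_iterator, the dict-like KVS interface, and the naive_stem stand-in are all assumptions for illustration; in your solution you must use the provided Stemmer class and kvs module instead.

```python
def naive_stem(word):
    # Toy stand-in for the provided Stemmer class: it only strips a few
    # common suffixes. The real (Porter-style) stemmer behaves differently,
    # e.g. it stems "national" to "nate".
    for suffix in ("es", "e", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def load_terms(terms_kvs, images_kvs, label_iterator):
    """Build the inverted index: stemmed word -> Wikipedia category.

    label_iterator yields (category, label) pairs; both KVS arguments are
    assumed to behave like dicts for this sketch. Note a plain dict keeps
    only the last category per word; the real KVS can hold several.
    """
    for category, label in label_iterator:
        if category not in images_kvs:  # only index categories with an image
            continue
        for word in label.lower().split():  # case regularization + word split
            terms_kvs[naive_stem(word)] = category

# Tiny demonstration with in-memory dicts standing in for the KVS:
images = {"http://dbpedia.org/resource/Cloud": "http://example.org/cloud.jpg"}
terms = {}
load_terms(terms, images,
           [("http://dbpedia.org/resource/Cloud", "Cumulus Clouds"),
            ("http://dbpedia.org/resource/NoImage", "Skipped Label")])
print(sorted(terms))  # ['cloud', 'cumulu']
```

Note how the second label is skipped entirely because its category has no image, exactly as the restriction above requires.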
You will need to learn how to use it by looking at the function definition for stem.

Example: The images KVS may contain the following entry:

key: http://dbpedia.org/resource/American_National_Standards_Institute
value: http://upload.wikimedia.org/wikipedia/commons/8/8f/ANSI_logo.GIF

The terms KVS may then contain the following entries:

key: american
value: http://dbpedia.org/resource/American_National_Standards_Institute

key: nate
value: http://dbpedia.org/resource/American_National_Standards_Institute

key: standard
value: http://dbpedia.org/resource/American_National_Standards_Institute

key: institut
value: http://dbpedia.org/resource/American_National_Standards_Institute

Note that the label ("American National Standards Institute") has been broken into separate words, and that case regularization and stemming have been applied.

1.4. Querier

Now you will write a querier module, querier.py, that reads keywords from the command line. You are given a querier module in the a2m1/ directory with a stub querier.query() function.

querier.query(): The querier module opens both the terms and images key-value stores. For each keyword, it should retrieve the matching Wikipedia categories from the terms store, then retrieve all URL matches for those categories from the images key-value store. You should apply the same transformations (stemming, case regularization, ...) as in the loader: see sample code.

The querier prints each keyword, then the list of matches, to the console. The provided main method already contains code to do this; please do not change the output format in any way, since this will break our grading tools.
1.5. Submission checklist

- You implemented the loader.load_images() and loader.load_terms() functions without changing the function definitions.
- You implemented the querier.query() function without changing the function definition.
- You ran the loader tests and passed all test cases: python loader_test.py
- You ran the querier tests and passed all test cases: python querier_test.py
- You print your full name and UDC username to the console when the loader and querier are invoked.
- Your code contains a reasonable amount of useful documentation (required for style points).
- You have completed all the fields in the README.md file.
- You have checked your final code into the git repository.
- You submitted a .zip file.
- Your .zip file includes your solution (all .py files).
- Your .zip file includes the README.md file, with all fields completed.
- Your .zip file is smaller than 100 kB, excluding the large data files (such as images_en.nt or labels_en.nt).
- You submitted your solution as a .zip archive to Blackboard before the A2M1 deadline on the course calendar. (If you choose to use jokers, each joker will extend this deadline by 24 hours.)

2. Web app (A2M2)

You will write a Flask backend and React frontend using the code in the a2m2/ subfolder from git.

2.1. Backend

You will write the backend app, backend.py, that implements a REST API. You are given Python code that uses Flask, a Python web framework. Flask implements our search REST API: /api/search. A basic backend app in Flask follows:

# backend.py
from flask import Flask

app = Flask(__name__)

@app.route("/api/hello")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run()

In addition to implementing our API, Flask also serves static files (backend.send_static) such as index.html. Run the development server in the shell:

$ python hello.py
* Running on http://localhost:5000/

Test your hello API endpoint in your web browser: go to http://localhost:5000/api/hello

Test data: We need data to test our backend.
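Unlike the hello example above, which returns plain text, the search endpoint you implement below must return JSON. As a sketch of just the response-shaping step, with the field names taken from the data model shown in section 2.2 (the make_search_response helper is purely illustrative, not part of the provided code):

```python
import json

def make_search_response(image_urls):
    # Response shape assumed from the frontend data model in section 2.2.
    # Inside Flask you would typically return flask.jsonify(body) instead
    # of serializing manually with json.dumps.
    body = {
        "searchInformation": {"totalResults": len(image_urls)},
        "items": [{"link": url} for url in image_urls],
    }
    return json.dumps(body)

response = make_search_response(
    ["http://en.wikipedia.org/wiki/Special:FilePath/AzharUsman.jpg"])
print(json.loads(response)["searchInformation"]["totalResults"])  # 1
```

The same shaping logic applies regardless of whether the matches come from Shelf objects or, later, from DynamoDB.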
Run the loader.py from A2M1 to generate test data on disk: python loader.py -d --filter="Azh". You are given backend code that assumes you created two Shelf objects (disk-based key-value stores) in the a2m1 directory:

images_kvs = Shelf('a2m1/' + IMAGES_KVS_NAME)
terms_kvs = Shelf('a2m1/' + TERMS_KVS_NAME)

Using the Shelf objects, you should implement the backend:

backend.search(): You should first return the search results in JSON format. Using the a2m1.query() function, write code to return a list of matches, format the results as a dictionary, convert the dictionary to JSON, and return a JSON response to the user: see sample code.

backend.name(): You should then create a new API endpoint: /api/name. This endpoint returns your full name and UDC username in JSON: {"name": "Shakir James (shakirjames)"}

To run the backend, run the Flask development server in the shell:

$ cd udc-csci-398-a2
$ python -m a2m2.backend

Note: When you make changes to backend.py, you must reload the development server: press CTRL+C to quit, and re-run the backend as above.

2.2. Frontend

Next, you will finish the frontend app, frontend/index.js. You are given JavaScript code that implements a single-page web app in React.
Our React app expects our backend search API to return the data model:

{
  "searchInformation": {
    "totalResults": 100
  },
  "items": [
    {link: "http://en.wikipedia.org/wiki/Special:FilePath/AzharUsman.jpg"},
    {link: "http://en.wikipedia.org/wiki/Special:FilePath/Holland_2004.jpg"}
  ]
};

Our designer gave us an HTML mock (see frontend/mock.html). Thinking in React, we decomposed the frontend app into components [1]. Our frontend app has five components:

- ImageSearchContainer (orange): the app container
- SearchBar (blue): accepts user input
- ThumbnailGrid (green): shows a grid of image thumbnails filtered based on user input
- ThumbnailRow (brown): displays a row of images
- ThumbnailImage (red): displays an image

The components form a hierarchy:

ImageSearchContainer
  SearchBar
  ThumbnailGrid
    ThumbnailRow
      ThumbnailImage

Our React app stores static data in props. Taking a top-down view of the app, the ImageSearchContainer component takes the data model as a prop, and its subcomponents render the props data: flowing data down the hierarchy.

The app stores dynamic data in state. The state data adds interactivity: dynamically changing the data model. The state in our app consists of the search text, because the images can be computed based on the search text. The searchText state lives in the ImageSearchContainer component, and its subcomponents use callback functions to alter the searchText: flowing data up the hierarchy via explicit callbacks.

We have written most of the frontend code. However, you should write a new React component to display your author name, replacing "Shakir James (shakirjames)".

Author: Your task is to use your /api/name API endpoint to display your full name and username in the frontend app. You should create an Author React component, get data from the /api/name endpoint, and render the component with the API data: see sample code.

2.3. Final testing

Create DynamoDB tables: First, we'll need to create tables in DynamoDB. Navigate to https://console.aws.amazon.com/dynamodb.
Click the Create Table button: enter images as the table name, enter kvs_key as the primary key (hash key), and keep String as the type. For the table settings, uncheck "use default settings" and increase both the read and write capacity units from 5 to 20. Click Create. Repeat this process for a table called terms.

Add your AWS credentials: To work with DynamoDB, you should add your AWS credentials to your boto3 configuration file. We already installed boto3 (from our requirements file). Now, set up your AWS credentials for boto3 in ~/.aws/credentials [2]:

[default]
aws_access_key_id = YOUR_KEY
aws_secret_access_key = YOUR_SECRET

Load data to DynamoDB: To be able to easily switch your code between Shelf and DynamoDB, the M1 CLI accepts a kvs option. Run the loader in the shell to upload data to DynamoDB: python loader.py -d --kvs=cloud --filter="Az"

The total Wikipedia dataset would result in about 1.5 GB of data, which would take a long time to load (and, worse, consume a lot of Amazon cycles, which will reduce your credits), so you should set the filter to Ar to index only topics that start with "Ar", which should result in a manageable database size. For testing, you may want to work with even smaller data sets, e.g., just the first 100 topics.

After some time, your data should be in DynamoDB; you can confirm this by opening the DynamoDB console, clicking on the table, and then clicking on the Items tab.

Modify backend.py: The backend code currently uses Shelf objects, which are disk-based key-value stores. You should change the backend to use DynamoDB: comment out the lines that instantiate the Shelf objects and uncomment the lines that use the DynamoDB objects.

Re-run your backend: Now, rerun backend.py and navigate back to the URL in the browser.
Your image search application should work with any search term that was indexed (if your filter was Ar, then any search term starting with those letters should produce image results).

2.4. Submission checklist

- You implemented the backend.name() API endpoint.
- You implemented the frontend Author component.
- Your code contains a reasonable amount of useful documentation (required for style points).
- You have completed all the fields in the README file.
- You are submitting a .zip file.
- You have checked your final code into the git repository.
- Your .zip file contains all the files needed to run your solution (including all .js files).
- Your .zip file contains the README file, with all fields completed.
- Your .zip file is smaller than 100 kB.
- You submitted your solution as a .zip archive to Blackboard before the M2 deadline on the first page of this assignment. (If you choose to use jokers, each joker will extend this deadline by 24 hours.)

2.5. Notes

Please keep in mind that Amazon charges for machine utilization, data transfer, and data storage. Enrolling in Amazon's AWS Educate program should give you sufficient credit to complete this assignment (as well as the remaining assignments). Nevertheless, you should carefully monitor your credit level to make sure that you do not run out, and you should release any resources when you are no longer using them. For instance, after completing this assignment, you should delete the data you uploaded to DynamoDB.

3. Extra credit

We will offer the following extra credit items in this assignment:

- M1: Submit an additional implementation of the querier in JavaScript. [+10%]
- M2: Don't include images that are dead links. [+10%]
- M2: Infinite scrolling: show all images in the app. However, don't load them all at once; just load eight. When the user scrolls down and reaches the bottom of the page, load and display eight more, similar to how the Facebook news feed works. [+10%]

These points will only be awarded if the main portions of the assignment work correctly.
Any extra credit solutions should be submitted with the relevant milestone, and they should be described in the README.

[1] "Thinking in React - Facebook Code." 2014. Retrieved 26 Sep. 2016 <https://facebook.github.io/react/docs/thinking-in-react.html>
[2] "GitHub - boto/boto3: AWS SDK for Python." 2013. Retrieved 28 Sep. 2016 <https://github.com/boto/boto3>