This has moved Electronic Arts to change tack with one of its bigger franchises. Given the success it has had with a number of freemium games on mobile devices, with The Simpsons: Tapped Out grossing big console-game numbers, the new Command & Conquer is a free-to-play title. You download the client and get several levels for free - how much is yet to be determined - then, if you like it, you can enhance the experience with paid-for content.
Download the code here. Unzip and untar it:

  $ tar xfz a6.tgz

Within the moogle directory, you will find a Makefile and a set of .ml files that make up the project sources. You will also find three directories of web pages that you can use for testing: simple-html, html, and wiki, as described above. Below is a brief description of the contents of each file:

- .merlin: configures merlin for you, as in previous assignments.
- Makefile: used to build the project -- type "make all" at the command line to build the project.
- order.ml: definitions for an order datatype used to compare values.
- myset.ml: an interface and simple implementation of a set abstract datatype.
- dict.ml: an interface and simple implementation of a dictionary abstract datatype.
- query.ml: a datatype for Moogle queries and a function for evaluating a query given a web index.
- util.ml: an interface and the implementation of crawler services needed to build the web index. This includes definitions of link and page datatypes, a function for fetching a page given a link, and the values of the command-line arguments (e.g., the initial link, the number of pages to search, and the server port).
- moogle.ml: the main code for the Moogle server.
- crawl.ml: includes a stub for the crawler that you will need to complete.
- graph.ml: definitions for a graph abstract data type, including a graph signature, a node signature, and a functor for building graphs from a node module.
- nodescore.ml: definitions for node score maps, which are part of the page-rank algorithm.
- pagerank.ml: code for the page-rank algorithm, including an indegree algorithm for computing page ranks.
- testing.ml: the crawler tests and a driver for the dictionary and set tests.
Compile Moogle via the command line by typing make all. To start Moogle up from a terminal or shell, type:

  $ ./moogle.byte 8080 42 simple-html/index.html

The first command-line argument (8080) is the port that your Moogle server listens on. Unless you know what you are doing, you should generally leave it as 8080, though you may need to try a different port (e.g., 8081, 8082, etc.) to find a free one. The second command-line argument (42) is the number of pages to index; Moogle will index at most that many pages. The last command-line argument (simple-html/index.html) indicates the page from which your crawler should start. Moogle will only index pages that are on your local file system (inside the simple-html, html, or wiki directories).

You should see that the server starts and then prints some debugging information, ending with the lines:

  Starting Moogle on port 8080.
  Press Ctrl-c to terminate Moogle.

Now try to connect to Moogle with your web browser -- Chrome seems rather reliable (we have experienced glitches with some versions of Firefox and Safari). Connect to the following URL: localhost:8080
Your first major task is to implement the web crawler and build the search index. In particular, you need to replace the dummy crawl function in the file crawl.ml with a function that builds a WordDict.dict (a dictionary from words to sets of links).

You will find the definitions in the CrawlerServices module (in the file util.ml) useful. For instance, you will notice that the initial URL provided on the command line can be found in the variable CrawlerServices.initial_link; this will be the only link in the crawler's frontier when crawl is initially called. You should use the function CrawlerServices.get_page to fetch a page given a link. A page contains the URL for the page, a list of links that occur on that page, and a list of words that occur on that page. You need to update your WordDict.dict so that it maps each word on the page to a set that includes the page's URL. Then you need to continue crawling the other links on the page recursively. Of course, you need to figure out how to avoid an infinite loop when one page links to another and vice versa, and for efficiency you should visit each page at most once. The variable CrawlerServices.num_pages_to_search (called n in the crawl function) contains the command-line argument specifying how many unique pages you should crawl, so you will want to stop crawling once you have seen that number of pages or you run out of links to process.

The module WordDict provides operations for building and manipulating dictionaries mapping words (strings) to sets of links. Note that it is defined in crawl.ml by calling a functor and passing it an argument where keys are defined to be strings and values are defined to be LinkSet.set. The interface for WordDict can be found in dict.ml. The module LinkSet is defined in pagerank.ml and, like WordDict, is built using a set functor, where the element type is specified to be a link.
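The crawl loop described above can be sketched as follows. This is a minimal standalone sketch, not the assignment's solution: the page type, the toy get_page "web", and the plain Hashtbl-based index are assumptions standing in for the real CrawlerServices, WordDict, and LinkSet modules, which you must use instead.

```ocaml
(* Toy page type mirroring the description above: url, outgoing links, words. *)
type page = { url : string; links : string list; words : string list }

(* A tiny in-memory "web" standing in for CrawlerServices.get_page. *)
let get_page url =
  match url with
  | "a.html" -> { url; links = ["b.html"]; words = ["ocaml"; "moogle"] }
  | "b.html" -> { url; links = ["a.html"]; words = ["crawler"] }
  | _ -> { url; links = []; words = [] }

(* Crawl at most n unique pages starting from the frontier, building a
   word -> url-list index.  A table of visited urls avoids the infinite
   loop when two pages link to each other. *)
let crawl n frontier =
  let visited = Hashtbl.create 16 in
  let index = Hashtbl.create 16 in
  let rec go n frontier =
    match frontier with
    | [] -> ()                                   (* ran out of links *)
    | link :: rest ->
      if n = 0 then ()                           (* crawled enough pages *)
      else if Hashtbl.mem visited link then go n rest
      else begin
        Hashtbl.add visited link ();
        let p = get_page link in
        (* map each word on the page to a collection containing its url *)
        List.iter
          (fun w ->
             let old = try Hashtbl.find index w with Not_found -> [] in
             Hashtbl.replace index w (p.url :: old))
          p.words;
        go (n - 1) (rest @ p.links)              (* continue with new links *)
      end
  in
  go n frontier;
  index

let index = crawl 42 ["a.html"]

let () =
  Printf.printf "pages indexed for \"ocaml\": %s\n"
    (String.concat ", " (Hashtbl.find index "ocaml"))
```

In the real assignment the recursion is the same shape, but updates go through WordDict.insert and LinkSet.insert rather than a Hashtbl, and the frontier starts from CrawlerServices.initial_link.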
The interface for LinkSet can be found in the myset.ml file.

Running the crawler in the top-level interpreter won't really be possible, so to test and debug your crawler you will want to compile via the command line using make and add thorough testing code. One starting point is debugging code that prints out the status of your program's execution: see the OCaml documentation for the Printf.printf functions for how to do this, and note that all of our abstract types (e.g., sets, dictionaries, etc.) provide operations for converting values to strings for easy printing.

We have provided you with three sample sets of web pages to test your crawler. Once you are confident your crawler works, run it on the small html directory:

  $ ./moogle.d.byte 8080 7 simple-html/index.html

simple-html contains 7 very small html files that you can inspect yourself, and you should compare them against the output of your crawler. If you attempt to run your crawler on the larger sets of pages, you may notice that it takes a very long time to build up your index: the dummy list implementations of dictionaries and sets do not scale very well.

Note: unless you do a tiny bit of extra work, your index will be case sensitive. This is fine, but it may be confusing when testing, so keep it in mind.
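As one concrete illustration of the Printf-based debugging suggested above, a helper that formats a progress line can be kept separate from the printing, which also makes it easy to unit-test. The helper name and message format here are illustrative, not part of the assignment code.

```ocaml
(* Build a one-line progress message for the crawler.  Separating the
   formatting (sprintf) from the printing makes the format testable. *)
let debug_line ~seen ~link =
  Printf.sprintf "[crawl] visited %d pages; fetching %s" seen link

let () =
  (* print_endline flushes the newline-terminated line immediately,
     so progress shows up even if the server is later killed with Ctrl-c *)
  print_endline (debug_line ~seen:3 ~link:"simple-html/index.html")
```

The same pattern works with the to-string operations that the set and dictionary modules provide: convert the structure to a string, then print it with a label.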
NOTE: both students in a pair will receive the same grade on the assignment. It follows that both students in a pair must use the same number of automatically waived late days, which means that a pair has available the minimum number of free late days remaining between the two of them. (A partner with more free late days available would retain the excess for future individual assignments or assignments paired with another student with free late days remaining.)
This paper describes a software program for cluster analysis that combines the strengths of these two seemingly different approaches and develops a framework for the parallel implementation of clustering techniques. For most model-based approaches to clustering, the following limitations are well recognized in the literature: 1) the number of clusters has to be specified; 2) the mixing densities have to be specified, and since estimating the parameters of the mixture models is often computationally very expensive, we are often forced to limit our choices to simple distributions such as Gaussian; 3) computational speed is inadequate, especially in high dimensions, and this, coupled with the complexity of the proposed model, often limits the use of model-based techniques either theoretically or computationally; 4) it is not straightforward to extend model-based clustering to uncover heterogeneity at multiple resolutions, similar to that offered by model-free linkage-based hierarchical clustering.
Influential work towards resolving the first three issues has been carried out in  - . Many previous approaches have focused on model selection of mixtures by choosing the number of components, merging existing components, or determining the covariance structure of the mixture density under consideration; see  - . These approaches work efficiently if the underlying distribution is chosen correctly, but none of them is designed to handle a completely arbitrary underlying distribution (see Figure 5 for one such example). That is, we think that the limitations due to issues (3) and (4) above often necessitate the use of model-free techniques.
which are chosen using the spectral degrees of freedom criterion introduced in  . Though we started with 10 different smoothing levels, the final clustering shows only 6 different levels, along with a decreasing number of hierarchical clusters.
Alternatively, the user can specify the hierarchical level or the number of desired clusters, and obtain the corresponding cluster membership (hard clustering) of the data. For example, the plot in Figure 7 can be obtained by either of the following two commands: