Gnutella I arguably represents the simplest and most generic type of unstructured Peer-to-Peer network. This type is characterised by full distribution of the network resources and is highly fault tolerant in the presence of network churn. Its main disadvantage, though, lies in its search process, which is based on flooding. Its neighbourhood formation mechanism also significantly increases network traffic, as it relies on the simple but inefficient process of frequently exchanging ping pong messages. Different methodologies have been tried in an attempt to improve Gnutella's overall efficiency, but very few have looked into doing this in the context of Web Services discovery.
As discussed earlier in the module, P2P networks can be used as a platform for deploying distributed Web Services discovery mechanisms as opposed to the standard centralised method of using UDDI. The Gnutella I (version 0.4) protocol is a candidate approach for achieving this although it does have a number of potential limitations.
The aim of this project is threefold: firstly, to provide a critical overview of Gnutella I in the context of Web Services discovery; secondly, to measure the performance of Gnutella I by developing a set of simplified simulations; and thirdly, to propose methods for potentially improving the performance measured in the previous step.
The project consists of 3 tasks you must perform. Two of these are theoretical and one is practical. Their specification is as follows:
Task 1 25% of project effort
Provide a brief overview of the Gnutella I protocol principles and functionality as described in the provided specification document. Identify and explain TWO advantages and TWO disadvantages in using Gnutella I for Web Services discovery.
Task 2 50% of project effort
Use the programming language of your choice to develop a simple Gnutella I simulator. Your simulator should incorporate only the Query and QueryHit messages (so you don't have to implement the Ping, Pong and Push Gnutella descriptors). Furthermore, you should assume that any node that has services matching an incoming query will NOT forward this query to its neighbours, even if the remaining TTL > 0.
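As an illustration only, the forwarding rule described above could be sketched in Python along the following lines. The names flood_query, adjacency and resources are hypothetical and not part of the project files. The sketch assumes the network is held as a dictionary mapping each node ID to a list of neighbour IDs, and the resources as a dictionary mapping each node ID to a set of service IDs. It further assumes that duplicate queries are dropped, that the originator does not search its own services, that a node decrements the TTL before forwarding, and that a QueryHit counts as one message per hop on its way back to the originator. Other counting conventions are equally defensible, provided you state them in your report.

```python
from collections import deque


def flood_query(adjacency, resources, origin, service, ttl):
    """Flood one query from `origin`; return (nodes with a hit, message count)."""
    seen = {origin}      # nodes that have already handled this query
    hits = set()         # nodes that answered with a QueryHit
    messages = 0         # Query and QueryHit transmissions combined
    queue = deque()

    # The originator sends the query to each of its neighbours.
    for neighbour in adjacency[origin]:
        queue.append((neighbour, origin, ttl, 1))
        messages += 1

    while queue:
        node, sender, ttl_left, hops = queue.popleft()
        if node in seen:
            continue     # duplicate query: dropped (assumption)
        seen.add(node)

        if service in resources.get(node, set()):
            hits.add(node)
            messages += hops   # QueryHit retraces the query path (assumption)
            continue           # a matching node does NOT forward the query

        ttl_left -= 1          # decrement before forwarding (assumption)
        if ttl_left > 0:
            for neighbour in adjacency[node]:
                if neighbour != sender:
                    queue.append((neighbour, node, ttl_left, hops + 1))
                    messages += 1

    return hits, messages
```

On a chain of nodes 1-2-3-4 where only node 4 offers the requested service, a query from node 1 with TTL=3 reaches node 4 under these assumptions, whereas the same query with TTL=2 stops at node 3.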
On the module's Blackboard site you will find: a) the text file Network.txt that encodes the actual network you should simulate and b) the text file Resources.txt that shows the distribution of Web Services on the network's nodes. If you study the resource file carefully you will notice that there are 10 different types of Web Services, each one represented by a number from 1 to 10. The services with IDs 1 to 6 inclusive are fairly evenly spread in the network. Services with IDs 7 and 8 are very popular, and therefore the vast majority of the network nodes have a Web Service of that type. Services with IDs 9 and 10 are rare and only a few nodes have services of this type. Your simulator should use the data from these files in order to set up the project's required Gnutella topology. You should then use your simulator to carry out the following experiment:
Generate and process sequentially 10000 random queries each having a TTL=3. A query can be represented as a pair (originator node ID, target Web Services name). A randomly generated query has BOTH of its fields generated randomly. For each query your simulator should keep the following information:
1. Whether the query found at least one relevant service (query success)
2. Total number of relevant services that the query found divided by the total number of relevant services in the whole network (recall)
3. Number of messages generated (network traffic)
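The per-query bookkeeping listed above might be sketched as follows. This is a minimal illustration rather than a required design: the function names and the dictionary layout of a record are hypothetical, resources is assumed to map each node ID to the set of service IDs it offers, and recall is assumed to be 0 when no node in the network offers the requested service.

```python
import random


def random_query(node_ids, service_ids, rng):
    """Pick the originator node and the target service independently at random."""
    return rng.choice(node_ids), rng.choice(service_ids)


def total_relevant(resources, service):
    """Count the nodes in the whole network that offer `service`."""
    return sum(1 for offered in resources.values() if service in offered)


def query_record(service, hits, messages, relevant_in_network):
    """Build the three per-query measurements for one processed query."""
    # Recall is defined as 0.0 when the service exists nowhere (assumption).
    recall = len(hits) / relevant_in_network if relevant_in_network else 0.0
    return {
        "service": service,
        "success": 1 if hits else 0,   # at least one relevant service found
        "recall": recall,
        "messages": messages,          # network traffic for this query
    }
```

Passing a seeded generator such as random.Random(42) as rng makes a run repeatable, which can help when comparing the TTL=3 and TTL=4 experiments on the same query sequence.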
At the end of your experiment your simulator should save the above values into a suitably formatted text file which you can then import to Microsoft Excel in order to draw the following graphs/bar charts:
1. Average query success for each service type
2. Average recall (number of relevant services found divided by total number of relevant services) for each type of service
3. Average network traffic for each type of service
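One possible way of producing the per-service averages behind these three charts is sketched below. It assumes each query result is stored as a dictionary with the hypothetical keys service, success, recall and messages, and it writes a tab-separated summary, which is only one of several text formats that Excel can import.

```python
import csv
from collections import defaultdict


def average_by_service(records):
    """Return {service: (avg_success, avg_recall, avg_messages)}."""
    groups = defaultdict(list)
    for record in records:
        groups[record["service"]].append(record)

    summary = {}
    for service, group in sorted(groups.items()):
        count = len(group)
        summary[service] = (
            sum(r["success"] for r in group) / count,
            sum(r["recall"] for r in group) / count,
            sum(r["messages"] for r in group) / count,
        )
    return summary


def write_summary(summary, out):
    """Write one tab-separated row per service type to an open text file."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["service", "avg_success", "avg_recall", "avg_messages"])
    for service, (success, recall, messages) in summary.items():
        writer.writerow([service, success, recall, messages])
```

Calling write_summary(average_by_service(records), f) with f opened via open("results.txt", "w", newline="") would produce one file per experiment; the file name here is only an example.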
Use your knowledge of the Gnutella protocol specification and the underlying project network you set up to explain and justify the above graphs.
Finally, repeat the above experiment, graphs and analysis using TTL=4. How and why have the results changed compared to those of TTL=3?
Task 3 25% of project effort
The Gnutella simulator you built and used for task 2 is quite simplistic as it does not include the implementation of the ping pong message exchange that takes place in a real Gnutella network. As mentioned in the introduction, the ping pong process actually introduces very significant data traffic on top of the query-related traffic.
Your task is to come up with and describe a revised mechanism that can achieve the objectives of the ping pong message exchange but with less network traffic (fewer messages). You should describe your suggested method in some detail and provide logical arguments as to why it could perform better than the ping pong exchange. However, you are NOT expected to carry out any experiments or simulations to prove your claims.
You will submit a written project report in the form of a research paper. The aim of this deliverable is to assess: a) the level to which you managed to meet the objectives of your project, b) the methodology you followed and the strategies you used, and c) the conclusions you drew from your work. The report MUST follow the IEEE formatting guidelines available in the project's Blackboard folder. All reports must be submitted through Blackboard by the due date. The report MUST include an appendix with the source code you developed for your simulator. The structure and efficiency of your simulator code will NOT be assessed.
Length: 2000 words
5. Report marking criteria
The project report you will produce carries 70% of the module's marks. The aim of your report is to provide a scientific and professional exposition of the work you carried out in the three tasks set out above. The report should be approximately 2000 words in length excluding any references. It is to be formatted strictly according to the guidelines described in the text file Formatting.doc, available from the project's Blackboard folder.
The grade for task 1 will show how well you managed to encapsulate the Gnutella I protocol principles and functionality in your overview. A second criterion will be whether you have correctly identified and justified two advantages and two disadvantages in using Gnutella I for Web Services discovery.
For task 2, the key criteria are as follows: Did you develop and describe in the report a Gnutella simulator that works as expected? Did you produce all the required graphs and justify the results?
As you would expect, most of the marks will be awarded on the merit of the correctness and justification of the results. However, neither of these is possible without building a working simulator in the first place.
For task 3, did you suggest a reasonable mechanism that can improve the ping pong process in Gnutella? Did you describe this mechanism clearly and in sufficient detail? Also, did you argue as to why you think it would improve the original Gnutella mechanism?
Finally, with respect to the report structure and presentation, did you follow the formatting guidelines precisely? Is your report well structured with an introduction, middle and end (conclusions)? Have you properly referenced all the material you used?
WARNING: You are free to use any books or electronic resources for your report. However, the report should be written in your own words. Copying sentences, paragraphs or sections word for word, or diagrams and pictures, from documents which belong to other people (including your classmates) without giving due credit is called plagiarism. Every document deemed to contain copied material will incur severe penalties in marking and can be given a total mark of 0. To avoid this you must reference all your sources and write the ideas you found in other documents in your own words. If in doubt as to what constitutes plagiarism, please come and see me immediately.