Fall 2008 Announcements
Read the FAQs and announcements before you call or email me. I will not respond to questions whose answers have already been posted.
Please do not send your email to Prof. Kenneth Sohn of the ECE Dept. I cannot respond to email that I have not received. This has happened many times in the past, so please pay attention when sending email.
When you send me email, always include "CS650 last-name, first-name, your subject" in the subject line. Use your NJIT email address for course-related communication. If I cannot verify or identify the sender, I will not reply. No anonymous email addresses, please. Anonymous email from yahoo, gmail, hotmail, etc. will very likely be quarantined by the NJIT spam filter. I get hundreds of spam messages every day, and it is not realistic for me to go through all the quarantined email to find yours. Please help me to help you by adhering to the email policy above.
- Wed, 12/10/2008, 5:00 pm: The final exam is comprehensive, as indicated in the FAQs.
- Wed, 12/10/2008, 4:50 pm: Procedure for turning in the project report: come to my office, GITC4209, 5-6 pm, Monday, 12/15/2008:
- If your project status is the same as last Thursday night, there is no need to demo. Just turn in your report and CD.
- If your project didn't work properly last Thursday night but you now have it working, be prepared to demonstrate with three physical/virtual machines. Turn in your updated report and CD.
- If you finished the extra credit problem, be prepared to demonstrate with three physical/virtual machines. The extra credit problem (10% of the course) must meet the following requirements, as repeatedly emphasized in class throughout the semester:
- insert a field called myrank, which is a random integer between 0 and 99,
- sort the hits on myrank and the date the document was created or last modified,
- display the date and myrank next to each hit using JSP,
- must have at least 1000 pages crawled per machine. No demonstration, no extra credit.
- Whatever the status of your project may be, you have to demonstrate the distributed/parallel search capability on at least three machines, preferably physical machines.
- If you can't make the date and time, email me for another time.
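The extra credit requirements listed above (a random myrank field, sorting on myrank and date, and showing both next to each hit) can be illustrated with a minimal Python sketch. This is not the Nutch/JSP code the project requires. The hit dictionaries, the `title` and `modified` keys, and all function names are hypothetical, and the sort direction (higher myrank first, then most recent date) is an assumption, since the announcement does not specify one. The range 0 to 99 is assumed to be inclusive.

```python
import random
from datetime import date


def assign_myrank(rng=random):
    """Return a random integer from 0 to 99 (assumed inclusive)."""
    return rng.randint(0, 99)


def sort_hits(hits):
    """Sort hits on myrank, then on the created/last-modified date.

    Assumption: descending order for both keys (highest myrank first,
    most recent date first among equal myrank values).
    """
    return sorted(hits, key=lambda h: (h["myrank"], h["modified"]), reverse=True)


def format_hit(hit):
    """Put myrank and date at the beginning of the displayed hit."""
    return f'[myrank={hit["myrank"]}] [{hit["modified"].isoformat()}] {hit["title"]}'


# Hypothetical hits; in the project these would come from the search index.
hits = [
    {"title": "Page A", "myrank": 42, "modified": date(2008, 11, 3)},
    {"title": "Page B", "myrank": 87, "modified": date(2008, 10, 1)},
    {"title": "Page C", "myrank": 42, "modified": date(2008, 12, 1)},
]

for hit in sort_hits(hits):
    print(format_hit(hit))
```

With the sample data, Page B (myrank 87) is listed first, followed by Page C and Page A, which share myrank 42 and are ordered by date.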
- Mon, 12/1/2008, 6:00 pm: Procedure for the project:
Hardcopy project report must include the following in the given order (keep it short):
- Team member names, list of the files you modified. Do not include those that are not yours.
- Step-by-step instructions that show that anyone can reproduce exactly what you did.
- List of sites you crawled and their nature.
- Show the number of pages you crawled and indexed on each backend.
- Screen shots of Luke on the index crawled for both backend machines. Must show the nature and some statistics of the crawled data.
- Screen shots of the sample search results displayed on the frontend machine.
Softcopy project report: a CD containing the report described above, crawl data, and tarred, gzipped nutch directories of the three machines.
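One way to produce a tarred, gzipped copy of a directory is sketched below using Python's standard `tarfile` module. The function name `make_tar_gz` and the example directory names are hypothetical; any tool that produces a .tar.gz archive should serve equally well.

```python
import tarfile
from pathlib import Path


def make_tar_gz(source_dir, archive_path):
    """Pack source_dir into a gzip-compressed tar archive at archive_path."""
    source = Path(source_dir)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname keeps only the directory's own name inside the archive,
        # not the full path leading to it.
        tar.add(str(source), arcname=source.name)
    return str(archive_path)


# Example call (hypothetical paths, one archive per machine):
# make_tar_gz("/home/user/nutch-backend1", "nutch-backend1.tar.gz")
```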
Demonstration: the order will be randomly determined.
- Set up a cluster of 3 machines (or VMs). I will bring a router and three cables. If possible, please bring your own router and cables to set up independently. This will expedite the entire process.
- Run Luke to demonstrate the nature of the pages crawled. Show the number of pages you crawled and indexed on each backend.
- Perform search on the frontend that will trigger the backend servers.
- Display the search results and explain where the pages are from.
- Run Luke on each backend machine to demonstrate that the pages are indeed from the designated backend server.
Extra credit (10%): The same as above, with search results displayed with rank and date at the beginning of each hit.
- Thur, 11/13/2008, 1:55 pm: Test 2 statistics: 0-39:5, 40-49:3, 50-59:3, 60-69:8, 70-79:3, 80-89:2, 90-100:0. Avg 57.1.
- Thur, 11/13/2008, 1:50 pm: Test 1 solutions
- Fri, 10/10/2008, 5:00 pm: Test 1 statistics: 0-29:3, 30-39:2, 40-49:4, 50-59:3, 60-69:6, 70-79:3, 80-89:2, 90-100:3. Avg 57.7.
- Thur, 8/21/2008, 4:30 pm: The class web page was posted.
The contents of CS650 Web site will change from time to time to improve the quality of this course.