Published in: Vol. 1 Issue 1
Date of Publication: June 2012

A Novel Approach for Web Crawler to Classify the Web Documents

L. Rajesh, V. Shanthi, E. Manigandan

Page(s): 5-7
ISSN: 2278-2397
DOI: 10.20894/IJWT.104.001.001.002
Publisher: Integrated Intelligent Research (IIR)

A web crawler is a program that downloads documents from the internet. It visits many sites to collect information that can be analyzed and mined in a central location. A focused crawler is designed to gather documents on a specific topic. Before indexing a document URL, the focused crawler should ensure that the document under review belongs to that topic. To determine how relevant a web page's content is to the context-specific topic, and to avoid replicating information, the authors of this paper suggest applying the Cosine Similarity measure after removing the stop words from the web page contents. To implement this strategy, we followed a procedure that identifies the web pages containing content relevant to the specific topic. First, we create a context-specific dictionary consisting of terms related to the focused topic. Then we consider two web page documents, A and B. We remove all the stop words from document A and document B and count the number of words in each document. Next, we construct a matrix for each web page that records the frequency of each word appearing in the web document. Then we calculate the Cosine Similarity measure between the two matrices constructed from the two web documents. This approach considers not only the frequency of a particular word in the web document but also whether the word belongs to the context-specific dictionary created at the initial stage. We conclude that this paper provides an efficient mechanism for the focused crawler to index the web pages that are more relevant to the topic.
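
The procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the sample dictionary, the sample documents, and the function names (tokenize, term_frequencies, cosine_similarity) are all assumptions made for illustration. The abstract also does not say exactly how dictionary membership is combined with word frequency; this sketch assumes that only dictionary terms are counted.

```python
import math
import re
from collections import Counter

# Assumed, illustrative stop-word list; the abstract does not specify one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and",
              "in", "on", "for", "it", "that", "from"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def term_frequencies(text, dictionary):
    """Count how often each dictionary term occurs in the text.

    Assumption: words outside the context-specific dictionary are ignored.
    """
    return Counter(w for w in tokenize(text) if w in dictionary)


def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two term-frequency vectors."""
    shared_terms = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[t] * freq_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a document with no dictionary terms matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical context-specific dictionary and two sample documents.
dictionary = {"crawler", "web", "pages", "index", "topic"}
doc_a = "The crawler downloads web pages from the internet"
doc_b = "A focused crawler indexes web pages on a topic"

similarity = cosine_similarity(
    term_frequencies(doc_a, dictionary),
    term_frequencies(doc_b, dictionary),
)
print(round(similarity, 3))  # prints 0.866
```

A focused crawler could compare this score against a threshold to decide whether to index a page. The abstract does not give a threshold value, so any cutoff would have to be chosen by the implementer.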