Semantics processing for search engines

Document Type

Conference Proceeding

Publication Date



This paper presents a study of extracting related content from the links on web sites, using semantic keyword input and content analysis. The Java-based implementation semantically converts English keywords into candidate website names, as search engines do. Our work does not stop there: we use the URLs returned by a search engine to fetch content from those websites and match it against the keywords through semantic analysis such as Latent Semantic Indexing (LSI). Our differentiator is that we extract relevant content from all the sub-links of a site, in addition to the sites discovered by the search engine itself. Our research includes three parts: 1) Let the user input a list of keywords and convert them into a list of URLs through search engines. 2) Match the keyword information against the content of those URLs, extract the sub-links from that content, and save the retrieved content in a new list, which is then filtered through LSI analysis. 3) Create an interface that outputs the content list to users. Related research is also discussed in the paper, e.g. the PageRank algorithm and the Hyperlink-Induced Topic Search (HITS) algorithm. Link extraction has been implemented; the LSI part is still in progress.
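
The sub-link extraction in part 2 might look roughly like the following Java sketch. It is not the authors' implementation. The class name `SubLinkExtractor` and the methods `extractLinks` and `keywordHits` are hypothetical, the regular expression is a simplification of real HTML parsing, and the plain keyword count is only a stand-in for the LSI filtering step, which is not shown.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the sub-links of a page and scores text by keyword hits.
class SubLinkExtractor {

    // Matches href="..." or href='...' inside an <a> tag.
    // Assumption: a regex is good enough for a sketch; a real crawler would use an HTML parser.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the distinct http/https links found in html, resolved against baseUrl,
    // in order of first appearance.
    static List<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            if (href.isEmpty() || href.startsWith("#")) {
                continue; // skip empty targets and in-page anchors
            }
            try {
                URI resolved = base.resolve(href);
                String scheme = resolved.getScheme();
                if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
                    links.add(resolved.toString());
                }
            } catch (IllegalArgumentException e) {
                // skip hrefs that are not valid URIs
            }
        }
        return new ArrayList<>(links);
    }

    // Counts how many keywords occur in text, ignoring case.
    // Stand-in for the LSI-based relevance filter described in the abstract.
    static int keywordHits(String text, List<String> keywords) {
        String lower = text.toLowerCase(Locale.ROOT);
        int hits = 0;
        for (String keyword : keywords) {
            if (lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/docs/lsi.html\">LSI</a> <a href='page2.html'>next</a>";
        for (String link : extractLinks(html, "http://example.com/a/index.html")) {
            System.out.println(link);
        }
    }
}
```

In a full pipeline, each extracted link would be fetched in turn and its content passed to the relevance filter before being added to the result list shown to the user.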

Publication Title

ACM International Conference Proceeding Series

First Page Number


Last Page Number



