External factors of page ranking.
Why external links to a site are taken into account
All the factors described so far are under the control of the page author. As a result, a search engine cannot distinguish a genuinely high-quality document from a page created specifically for a given search phrase, or even from a page generated by a robot that carries no useful information at all. Therefore, one of the key ranking factors is the analysis of external links to each page being evaluated. This is the only factor that is not controlled by the author of the site.
It is logical to assume that the more external links point to a site, the more interesting that site is for visitors. If the owners of other sites link to the resource being evaluated, they evidently consider it to be of sufficient quality. Following this criterion, the search engine can also decide how much weight to give a particular document.
Thus, there are two main factors by which the pages in the search engine’s database are sorted in the search results. These are relevance (that is, the extent to which the page is related to the subject of the query; these are the factors described in the previous section) and the number and quality of external links. The latter factor is also called link citation, link popularity or citation index.
The importance of links (citation index)
It’s easy to see that simply counting the number of external links doesn’t give us enough information to evaluate a site. Obviously, a link from www.microsoft.com should mean much more than a link from a home page such as www.hostingcompany.com/~myhomepage.html. That is why the popularity of websites cannot be compared by the number of external links alone: the importance of each link must be taken into account as well.
To assess the number and quality of external links to a site, search engines introduce the concept of a citation index.
The citation index, or CI, is a general term for numerical indicators that evaluate the popularity of a resource, that is, some absolute measure of the importance of a page. Each search engine uses its own algorithms to calculate its own citation index; as a rule, these values are not published anywhere.
In addition to the ordinary citation index, which is an absolute value (that is, a specific number), there is the term weighted citation index, which is a relative value: it shows the popularity of a page relative to the popularity of other pages on the Internet. The term “weighted citation index” (VIC) is commonly used in relation to the Yandex search engine.
When ranking search results, great importance is attached to the text of external links to the site.
Link text (also called anchor text or reference text) is the text between the <a> and </a> tags, that is, the text you can click in the browser to go to a new page.
If the link text contains the right keywords, the search engine perceives it as an additional and very important recommendation, a confirmation that the site really does contain valuable information on the topic of the search query.
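As a rough illustration of what "link text" means in practice, the sketch below pulls the text between <a> and </a> out of an HTML fragment using Python's standard html.parser module. The class name AnchorTextParser, the helper extract_link_texts and the sample markup are made up for this example; real search engines do not publish how they process link text.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collects (href, link text) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished (href, text) pairs
        self._href = None     # href of the <a> tag currently open, if any
        self._inside = False  # True while we are between <a> and </a>
        self._parts = []      # text chunks seen inside the open <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside = True
            self._href = dict(attrs).get("href")
            self._parts = []

    def handle_data(self, data):
        if self._inside:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            # Join the chunks and collapse runs of whitespace.
            text = " ".join("".join(self._parts).split())
            self.links.append((self._href, text))
            self._inside = False


def extract_link_texts(html):
    """Return a list of (href, link text) pairs found in `html`."""
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample markup; nested tags inside the link are flattened.
sample = '<p>See <a href="http://example.com/cars">used <b>cars</b> for sale</a>.</p>'
links = extract_link_texts(sample)
```

Here the link text is "used cars for sale", which is the phrase a search engine would associate with the target page.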
The relevance of referring pages
In addition to the link text, the general content of the referring page is also taken into account.
Example. Suppose we are promoting a site that sells cars. In this case, a link from a car repair site will mean much more than a similar link from a gardening site. The first link comes from a thematically similar resource, so the search engine will value it more highly.
Google PageRank: theoretical foundations
Google was the first company to patent a system for taking external links into account. The algorithm is called PageRank. In this chapter, we will talk about this algorithm and how it can affect the ranking of search results.
PageRank is calculated for each web page separately and is determined by the PageRank (citation value) of the pages that link to it. It is a kind of circular definition.
The main task is to find a criterion that expresses the importance of a page. In the case of PageRank, the criterion chosen was the theoretical traffic to the page.
Consider a model of a user travelling through the network by following links. It is assumed that the user starts browsing from some random page and then navigates to other resources through links. At each step there is a possibility that the visitor will leave the site and start viewing documents from a random page again (in the PageRank algorithm, the probability of this is taken to be 0.15 at each step). Accordingly, with a probability of 0.85 the user continues the journey by clicking one of the links available on the current page (all links are treated as equal). Continuing the journey indefinitely, the user will visit popular pages many times and little-known pages less often.
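The user model described above can be simulated directly. The following is a minimal sketch, not Google's implementation: the function simulate_surfer, the four-page toy_web graph and the decision to jump to a random page when a page has no links are all assumptions made for the example.

```python
import random


def simulate_surfer(links, steps=200_000, restart=0.15, seed=0):
    """Estimate how often a random surfer visits each page.

    `links` maps each page to the list of pages it links to.
    """
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)  # start from a random page
    for _ in range(steps):
        visits[current] += 1
        outgoing = links[current]
        # With probability `restart` the surfer starts again from a random
        # page. Assumption of this sketch: a page with no links also
        # forces a random jump.
        if not outgoing or rng.random() < restart:
            current = rng.choice(pages)
        else:
            current = rng.choice(outgoing)  # all links are equal
    return {page: count / steps for page, count in visits.items()}


# Hypothetical toy network: page D has no inbound links at all.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
freq = simulate_surfer(toy_web)
```

In this toy network page C, which three pages link to, collects the largest share of visits, while page D, which nobody links to, is reached only through random restarts.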
Thus, the PageRank of a web page is defined as the probability that the user is on this web page; the sum of the probabilities over all web pages in the network is equal to one, since the user must be on some page.
Since it is not always convenient to work with probabilities, PageRank can, after a series of transformations, be expressed as specific numbers (as, for example, we used to see in the Google Toolbar, where each page had a PageRank from 0 to 10).
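This probabilistic definition is commonly written as the equation below. The notation is introduced here for illustration and does not come from the text: d = 0.85 is the probability of following a link, N is the total number of pages, B(p) is the set of pages that link to p, and L(q) is the number of outbound links on page q. The sketch assumes that every page has at least one outbound link.

```latex
PR(p) = \frac{1 - d}{N} + d \sum_{q \in B(p)} \frac{PR(q)}{L(q)},
\qquad d = 0.85,
\qquad \sum_{p} PR(p) = 1
```

The first term is the chance of arriving at p through a random restart; the sum is the chance of arriving by clicking a link on one of the pages that refer to p.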
According to the model described above, we obtain:
– each page in the network (even if there are no external links to it) initially has a nonzero PageRank, though a very small one;
– each page that has outbound links passes part of its PageRank to the pages it links to. The PageRank passed through each link is inversely proportional to the number of links on the page: the more links, the less PageRank is passed through each;
– PageRank is not passed on in full: at each step some damping takes place (the same 15% probability that the user starts viewing from a new, randomly chosen page).
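The three properties listed above can be seen in a small iterative calculation. This is a sketch under stated assumptions rather than the algorithm Google actually runs: the function pagerank, the toy_web graph and the even redistribution of rank from pages without links are choices made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iteratively compute PageRank for a small link graph.

    `links` maps each page to the list of pages it links to; every
    linked page must also appear as a key.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page first receives the small "random jump" share, so
        # even a page with no inbound links has a nonzero PageRank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            outgoing = links[page]
            if outgoing:
                # Only `damping` (85%) of the rank is passed on, split
                # equally between the outbound links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Assumption: a page without links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical toy network: page D has no inbound links at all.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
```

In this example page D ends up with exactly the minimum value 0.15 / 4 = 0.0375, and page A, which has two outbound links, passes only half as much through each of them as it would through a single link.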
Let’s now consider how PageRank can influence the ranking of search results (we say “can” because pure PageRank has long ceased to be used in the Google algorithm the way it once was; more on this below). The influence of PageRank is very simple: after the search engine has found a number of relevant documents (using text criteria), it can sort them according to PageRank, since it is logical to assume that the document with the greater number of high-quality external links contains the most valuable information.
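The two-stage idea (select by text, then sort by PageRank) might look like the sketch below. Everything in it is hypothetical: the function rank_results, the example URLs and the PageRank values are invented, and the "contains every query word" filter is only a stand-in for real text criteria.

```python
def rank_results(documents, query, pagerank):
    """Keep documents containing every query word, then sort by PageRank."""
    words = query.lower().split()
    # Stage 1: a crude text criterion, every query word must appear.
    relevant = [
        url for url, text in documents.items()
        if all(word in text.lower().split() for word in words)
    ]
    # Stage 2: order the relevant documents by PageRank, highest first.
    return sorted(relevant, key=lambda url: pagerank.get(url, 0.0), reverse=True)


# Hypothetical documents and made-up PageRank values.
documents = {
    "site-a.example/cars": "used cars for sale",
    "site-b.example/garden": "garden tools for sale",
    "site-c.example/auto": "cheap used cars",
}
pagerank_scores = {
    "site-a.example/cars": 0.02,
    "site-b.example/garden": 0.30,
    "site-c.example/auto": 0.11,
}
results = rank_results(documents, "used cars", pagerank_scores)
```

The gardening page has the highest PageRank here but is filtered out at the first stage because it is not relevant to the query; PageRank only decides the order among the documents that pass the text criteria.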
Thus, the PageRank algorithm “pushes up” the documents that are already the most popular even without the search engine.