The Webimax blog went through a redesign recently, and one of the sections we were happy to add was an Ask the Experts section, where people can submit questions about online marketing and get an answer from one of the many professionals at Webimax.
The following question about redirecting URLs wasn’t about the questioner’s own site, but rather about a practice they’ve seen on the Web, and we thought the answer was worth sharing with a larger audience. If you have any questions about online marketing, please feel free to let us know.
“I know of a company that is redirecting hundreds of vanity URLs (permanent redirects) to different pages within one single, large website. They are doing this for marketing purposes. They are not duplicating any content. Does this practice, directing several domains/URLs to other pages within one site, affect SEO? From what I’ve found, it doesn’t do anything to help SEO, but I want to ensure that it is not hurting.”
To answer a question like this, it’s almost necessary to step into the mind of a search engineer since it’s more about the reaction of search engines to a specific practice than it is about how someone optimizing a site might work to market that site.
The purpose of a permanent redirect is usually to tell visitors and search engines that a page has moved to a new address, and to deliver those visitors to that address. In most cases, the fact that it is a permanent redirect signals to search engines that they should transfer PageRank and hypertext relevance (the relevance that anchor text might pass along) from any links that point to the old address over to the new address, but they don’t do that automatically.
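If you want to see that signal for yourself, a quick way is to request the old URL without following redirects and look at the status code and Location header. Here is a minimal sketch using Python’s standard library; the URL is a placeholder, not one from the question:

```python
import http.client
from urllib.parse import urlsplit

def check_redirect(url):
    """Fetch a single URL without following redirects and report what comes back."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=10)
    conn.request("HEAD", parts.path or "/")
    resp = conn.getresponse()
    # A permanent redirect shows up as a 301 plus a Location header pointing at the new address
    return resp.status, resp.getheader("Location")

# Placeholder URL for illustration
print(check_redirect("http://www.example.com/old-page"))
```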
Sometimes permanent redirects are used to anticipate some searches as well, such as Disney registering and redirecting mickeymouse.com to an internal Disney page on their website.
Google’s servers are constantly reviewing the structure of the link graph of the Web, looking for anomalies in it, such as places where a lot of low-value pages (low PageRank) cluster together and all tend to point to one other page, which in turn points to another page in what appears to be an attempt to manipulate PageRank. They refer to that as a “dense subgraph” in the link graph, and it’s one way for them to uncover things like doorway pages and private blog networks.
That type of interlinking can raise a flag or alert, prompting more research into those pages, and into the links between them, to look for other webspam as well.
The type of activity that you describe, where lots of permanent redirects might be used to point to one site, in what might be an attempt to pass along anchor text from the links being used, could also potentially send a flag or alert to the search engines for an automated or manual review as well. It’s possible that such activity could have some benefit to the site in question, at least initially.
But it’s also likely that it is unusual enough to lead search engines to ignore any value those redirects might pass along, or to lead to results such as a warning from Google (about unnatural links, in Google’s Webmaster Tools) and/or a penalty.
We do know that when Google crawls the Web and collects URLs that it finds on pages, it notes things like whether or not those URLs go through redirects, and where those redirects lead. There are at least a couple of patents from Google that describe such practices. Those patents also state that Google usually sets 301 redirects aside initially and follows other links during a crawl. The permanent redirects are likely analyzed in more detail later, before they might be placed into a queue for crawling.
There are also some less direct costs associated with such a practice. Including a lot of redirects on a site (temporary or permanent) can make the pages of that site slower, since handling a redirect requires an extra round trip and extra processing work on the part of a server. That’s why Google’s PageSpeed tool recommends that site owners minimize the redirects found within their sites.
Google’s Matt Cutts also acknowledged in an interview about two years ago (in a follow-up email) that the use of internal redirects on a site can diminish the amount of PageRank distributed through that site – redirects just don’t pass along as much PageRank. That can mean the amount of PageRank distributed throughout the site as a whole is less than it could be, and that pages end up not ranking quite as high as they might.
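To get a rough sense of that cost on your own pages, you can walk a redirect chain by hand and count the hops and the time they take. The sketch below builds on the earlier one; again, the URL is just a placeholder:

```python
import http.client
import time
from urllib.parse import urljoin, urlsplit

def follow_chain(url, max_hops=10):
    """Follow Location headers one hop at a time, recording each status and the total time."""
    hops = []
    start = time.time()
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
        conn = conn_cls(parts.netloc, timeout=10)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        hops.append((resp.status, url))
        location = resp.getheader("Location")
        if resp.status in (301, 302, 303, 307, 308) and location:
            url = urljoin(url, location)  # each hop is another round trip the visitor pays for
        else:
            break
    return hops, time.time() - start

# Placeholder URL for illustration
chain, elapsed = follow_chain("http://example.com/some-redirected-page")
for status, url in chain:
    print(status, url)
print(f"{len(chain) - 1} redirect hop(s), {elapsed:.2f}s total")
```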
So, having some domains redirect to another site, like mickeymouse.com redirecting to http://disney.go.com/mickey/#/home is probably fine. Having hundreds of permanent redirects pointing to the same site can potentially stand out, and may lead to closer scrutiny from the search engines.
For instance, if the domains being redirected never existed as stand-alone sites (which could have been acquired), and appear to have been redirected primarily to manipulate anchor text and PageRank, those types of signals may result in penalties from the search engines.
Thanks for asking.
- Site Architecture
- Technical/Server Issues
- HTML Use/Analysis
- Content Review
- Negative Practices
- Webmaster Tools
- Social Media
The following is a high-level checklist of issues that should be explored on a site. We will be drilling deeper into each part of this checklist in the weeks to come, providing examples, tips, and suggestions for speeding up these checks, and expanding upon the checklist itself as we go along. We will also be shuffling these issues around to prioritize them, indicating which things should be checked first, and so on.
We will also try to include sources and documents about these different issues as well.
If you have any suggestions, questions, or advice to add, please either let me know directly @chriscountey, or use the comments.
Canonical URLs (Best Page Addresses)
- Access to pages on domain (www vs. non-www) – a quick check is sketched after this list
- Home Page linking consistency
- Capitalization/Lower Case (capitals in the domain name are OK; in folders and file names they are a potential problem)
- Print Versions (print CSS rather than crawlable duplicate PDFs/docs)
- Canonical Link Elements – do they match up right?
- Rel Prev/Next link elements for paginated pages?
- Internal Redirects (internal 301 redirects avoided)
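As a quick illustration of the www vs. non-www check above, the following sketch (Python standard library, with example.com standing in for a real domain) requests both host variants and reports which one redirects to the other:

```python
import http.client

def head(host, path="/"):
    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request("HEAD", path)
    resp = conn.getresponse()
    return resp.status, resp.getheader("Location")

# example.com is a placeholder for a real domain
for host in ("example.com", "www.example.com"):
    status, location = head(host)
    print(f"{host}: {status} -> {location}")

# Ideally one variant answers 200 and the other 301s to it;
# if both answer 200, the same pages are reachable at two addresses.
```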
Robots.txt
- Correctly formatted
- Includes all the rules it should (including disallows for cart pages, email referral pages, login pages)
- Includes link to XML sitemap or XML Sitemap Index
Meta robots noindex/nofollow
- Used Appropriately
- Used on pages that a deep crawler might try to index (like form and search results pages)
Category/Site Structure (URLs and Information Architecture)
- Unique and User Friendly
- Use of appropriate category and sub-category link structures
- Customer oriented rather than feature oriented
- Provides tasks/Options for different personas
Choosing File Names
- Uses hyphens as word separators
- Avoids Keyword Stuffing
- If file names are to be changed, links on the site updated, and 301s set up for external visitors
Custom Error Page
- Sends the proper 404 status code
- No soft 404s (a quick check is sketched after this section)
- Helpful to visitor (navigation, directories, search)
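One quick way to test for soft 404s is to request a path that can’t possibly exist and see what status code comes back. A minimal sketch, with example.com as a placeholder host:

```python
import http.client
import uuid

def soft_404_check(host):
    """Request a path that cannot exist; a 200 response suggests soft 404s."""
    bogus_path = "/" + uuid.uuid4().hex  # gibberish that shouldn't resolve to a real page
    conn = http.client.HTTPConnection(host, timeout=10)
    conn.request("GET", bogus_path)
    status = conn.getresponse().status
    if status == 200:
        return "Possible soft 404: missing pages return 200"
    if status in (301, 302):
        return f"Missing pages redirect ({status}) - check where they land"
    return f"Missing pages return {status}"

print(soft_404_check("example.com"))  # placeholder host
```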
HTML Sitemap
- Organized into user-friendly and user-oriented categories
- Provides links to most important pages
- Avoids using too many links
- Doesn’t include 404s or links that redirect internally
XML Sitemap
- Properly formatted (proper XML encoding)
- Uses only canonical URLs
- No 404s and no internally redirected pages (a quick validation script is sketched after this list)
- Submitted to Google Webmaster Tools and Bing Webmaster Tools
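The sitemap checks above lend themselves to a small script: fetch the sitemap, then confirm each listed URL answers 200 directly, with no redirects or 404s. A rough sketch, assuming a standard sitemap.xml at a placeholder URL:

```python
import http.client
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit
from urllib.request import urlopen

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def check_sitemap(sitemap_url):
    """Fetch an XML sitemap and flag any listed URL that doesn't answer 200 directly."""
    tree = ET.parse(urlopen(sitemap_url))
    for loc in tree.findall(".//sm:loc", NS):
        url = loc.text.strip()
        parts = urlsplit(url)
        conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
        conn = conn_cls(parts.netloc, timeout=10)
        conn.request("HEAD", parts.path or "/")
        status = conn.getresponse().status
        if status != 200:
            print(status, url)  # 3xx and 4xx entries don't belong in the sitemap

check_sitemap("http://www.example.com/sitemap.xml")  # placeholder URL
```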
Server Status: Messages 200, 300, 400, 500
Secure Server | HTTPS Protocol
- No error messages
- No https bleed-over to pages that aren’t supposed to be https
- No certificate authority errors
Search Friendly Links
- All pages to be indexed are reachable by text-based links (plain “href” and “src” attributes)
Broken and Redirected Links
- Broken links identified, and removed or replaced
- All 301 redirected links replaced with direct links
- Checked for broken links and redirects and replaced where appropriate
- Pages linked to checked for repurposed content
Duplicate Content
- Internally (see canonical section above)
- Mirrors identified and disallowed/noindexed as appropriate
- Substantially duplicated content on self-owned other sites removed/changed/blocked
- Substantially duplicated content on other sites removed (friendly email, AUP letter to host, DMCA)
- If Ajax is necessary, is Google’s hashbang approach used?
- Avoid session IDs in URLs
- Avoid excessive multiple data parameters in URLs
- Avoid excessive processor calls
- Avoid calls to multiple servers as much as possible
- Avoid keyword insertion pages (pages where the content is substantially the same except for keywords that are inserted into the pages).
- Keep boilerplate (disclaimers, copyright notices, other text that appears on most pages) that exists on templates light.
- Label page segments semantically well (the div class for those could be things such as header, footer, sidebar, advertisement, or whichever is most appropriate.)
Page Load Times
- Images compressed for right dimensions and for file sizes?
- GZIP or Deflate used?
- Base 64 encoding for images avoided?
- Long browser caching dates?
- CDN in use where appropriate?
- Other Page Speed considerations (a quick header check is sketched after this list)
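A quick way to spot-check the compression and caching items above is to look at the response headers a page sends back. A minimal sketch, with a placeholder URL:

```python
from urllib.request import Request, urlopen

def check_headers(url):
    """Report compression and caching headers for a page (a rough page-speed check)."""
    req = Request(url, headers={"Accept-Encoding": "gzip, deflate"})
    resp = urlopen(req, timeout=10)
    print("Content-Encoding:", resp.headers.get("Content-Encoding", "none"))
    print("Cache-Control:   ", resp.headers.get("Cache-Control", "none"))
    print("Expires:         ", resp.headers.get("Expires", "none"))

check_headers("http://www.example.com/")  # placeholder URL
```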
Cookies
- Navigation of indexable pages possible without accepting them?
Deprecated HTML/HTML Validation
- If invalid, are errors the type that will harm SEO?
Cascading Style Sheets (CSS)
- If invalid, are errors the type that will harm SEO?
Page Titles
- Relevant to the content of the page and keyword-rich.
- Meaningful and able to stand on its own as a description of the page it titles.
- Persuasive and Engaging to those who see it out of context
- As unique as possible compared to other titles on the site
- If the name of the site appears in the title, it should be at the end of the title, and not at the beginning, unless it is the home page.
- No more than ten words or roughly 60-70 characters in length.
- Unique if possible compared to titles from other sites.
Meta Description Elements
- Descriptive of the content of the page
- Includes the main keyword phrase the page is optimized for
- Engaging and persuasive to viewers who see it out of context (search snippets or social shares)
- Around 25 words or 150 characters in length
- Well written sentences, using good punctuation
- One sentence preferable, but two alright if keywords are in the longer sentence
- Preferable to have keywords as close to the start as appropriate (a bulk length check for titles and descriptions is sketched after this list)
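Title and meta description lengths are easy to audit in bulk. The sketch below pulls both from a page with Python’s built-in HTML parser, so you can flag titles much over 70 characters and descriptions much over 150; the URL is a placeholder:

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class TitleMetaParser(HTMLParser):
    """Collect the title element and the meta description from a page."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.description = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def audit(url):
    parser = TitleMetaParser()
    parser.feed(urlopen(url).read().decode("utf-8", errors="replace"))
    title = parser.title.strip()
    print(f"title       ({len(title)} chars): {title}")
    print(f"description ({len(parser.description)} chars): {parser.description}")

audit("http://www.example.com/")  # placeholder URL
```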
Heading Elements
- Top level heading should describe the content of the page
- Lower level headings should effectively describe the content they head
- One top level heading preferable per page
- Headings should be used like headings in an outline, in proper order
- Main and subheadings can, and should contain targeted keywords if possible and appropriate.
- A heading element should not be used for the page logo
- Headings for lists and sections in page navigation should use CSS to style them rather than heading elements.
Bold and Italic Text
- For bold text, use the “strong” HTML element.
- For italicized text, use the “em” HTML element
- Use Strong and Em to highlight the use of keywords and related words
- When bolding or italicizing other text on a page, use CSS to style how it looks
- Don’t over use bold or italics – emphasizing too much means emphasizing nothing.
Images
- Use alt text for images on a page that are meaningful
- Use captions for images on a page that are meaningful
- A caption for an image should be contained within the same HTML element as the image (like a div)
- Select meaningful images that are related to the keywords being optimized for
- Use the chosen optimized keywords in the alt text and captions where appropriate
- Use file names that reflect those keywords where appropriate.
- Use hyphens to separate words in image file names.
- Use alt="" for images that aren’t meaningful, like decorations or bullet points
- Use alt text for logos that are descriptive of the business or organization
- Larger images with better resolution might be ranked a little better than smaller and lower resolution images.
- Alt text should not be a list of keywords, but can contain a keyword phrase.
- Alt text shouldn’t be more than 10 words or so.
- Avoid keyword stuffing alt text, captions, and image file names.
Anchor Text
- Keywords should be used in anchor text
- If the keywords for a page being pointed to aren’t used, related terms should be
- Anchor text used in navigation should be descriptive of what is on the page linked to
- Anchor text should not use generic terms such as “click here.”
- Anchor text shouldn’t be longer than 10 words or so if possible
- Anchor text shouldn’t be stuffed with multiple keywords
Meta Data optimization
- Search engines do not use Dublin core meta tags
- Search engines do not use the revisit meta tag
- A robots index, follow tag is unnecessary and redundant
- A NOODP meta tag will keep Google and Bing from using Open Directory Project titles instead of title element titles, if the site is even listed in DMOZ
Amount of Text
- Having some minimum amount of text on a page (200 words?) gives search spiders something to index.
- Possible quality signal
- Important to credibility
Keyword Use in Copy
- Are keywords chosen for a page being used in page titles, meta descriptions, headings, and content
Keyword Prominence/Visual Segmentation
- How well does the HTML code of a page show how it’s broken down into different blocks (heading, main content, sidebars, footers, etc.)?
- Are keywords used in the different sections, and especially in the main content area of pages?
Use of Related Words/Phrases
- Some words tend to co-occur on pages ranked highly for a certain query (or categories of results for queries), and it can help in the rankings for a page to use some of those phrases.
Negative Practices
- Is there a loss in traffic that corresponds to one of the Panda or Penguin updates?
- Is there text on pages in the same font color as the background?
- Is there text on pages hidden through an offset div?
- Is there a large amount of text on pages in small iframes or CSS scrolling overflows?
- Is there text in a font color close enough to the page background color that it might be mistaken for hidden text?
- Does the site use cloaking to show search engines one thing and visitors something else?
- Are meta refreshes used instead of redirects, and if so might they be used in a way which might deceive search engines?
Outward Links/Link Exchanges
- Is the site using link directory pages that promise being listed in exchange for a link?
Keyword Research, Selection and Implementation
- Are relevant, competitive, appropriate and popular keywords being used on the pages of the site?
- Are those keywords being used effectively on those pages?
Keyword Focusing | Mid- to Long-Tail Key Phrases
- Do the main pages of the site focus upon more competitive keyword phrases?
- Do deeper pages with less PageRank focus upon long-tail phrases?
Google Webmaster Tools/Errors Analysis*
- Has the site been verified with GWT?
- Has a choice of “www” setting been made? (Doesn’t have to be if domain access issues are addressed)
- Has a targeted country/location been selected? (Doesn’t have to be)
- Have any errors listed been checked upon?
Social Media Audit | Status
- Does the site integrate appropriate social sharing buttons?
- Do the pages of the site provide links to social profiles for the site?
On-Site Social Engagement
- Does the site provide ways to give feedback to the site owners?
- Does the site provide a way to leave comments?
- Is there user generated content on the site, such as reviews and ratings, and does it use rich snippets if so?
- Are there public user/member profile pages, and if so how rich are they in terms of features?
- Is there a forum on the site, and if so, some guidelines for its use?
Have analytics been set up for the site?
- Code on every page
Want more? Check out this awesome technical SEO checklist: http://www.seomoz.org/blog/how-to-do-a-site-audit
Maps are not my friend. Or at least they haven’t been in the past, whether lost on the side of a road with a big unfolded paper map that I can’t find myself upon, or driving directions from Google or Yahoo or Mapquest that get me 99% of the way to my destination only to lose me in the last mile. Maps don’t seem to be Apple’s friend lately either, with a lot of negatives hurled their way in recent weeks over their new mapping program that launched with the new iPhone 5.
I picked up a new phone about a month ago, and my favorite part by far is the navigation feature, which helps me with that last mile or two. I get close to my destination, pull over and pull out my phone, tell it where I’m going, and it gives me turn-by-turn directions to my destination. As I arrive, it gives me a street view image of where I’m at.
A patent application published by Google this past week is about maps, and about advertising upon maps. This advertising seems geared towards sites that use the Google Maps API to pull in map information to display for one reason or another. The ads would appear directly upon the maps, near or at the location of the advertiser. Bidding on maps might be based upon both the display region where the ads would be shown, and the zoom level of the map. The publishers of the maps would decide whether or not they wanted to include advertisements on their maps as well.
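To make the bidding idea a little more concrete, here is a toy sketch of the kind of auction the filing describes, where advertisers bid on a map region and zoom level and the highest eligible bid wins the slot. The advertisers, regions, and bids are invented for illustration; this is not how Google’s system works, just a way to picture it:

```python
# Toy auction: advertisers bid on (map region, zoom level) pairs, and the
# highest bid matching the requested map space wins the advertising slot.
bids = [
    {"advertiser": "Theatre A", "region": "downtown", "zoom": 15, "bid": 2.50},
    {"advertiser": "Theatre B", "region": "downtown", "zoom": 15, "bid": 1.75},
    {"advertiser": "Theatre C", "region": "suburbs",  "zoom": 12, "bid": 3.00},
]

def select_ad(requested_region, requested_zoom):
    eligible = [b for b in bids
                if b["region"] == requested_region and b["zoom"] == requested_zoom]
    return max(eligible, key=lambda b: b["bid"], default=None)

print(select_ad("downtown", 15))  # Theatre A wins the downtown slot at zoom 15
```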
Imagine that you own a chain of movie theatres and use Google Maps on your site to let people see the locations of your theatres. You might decide that having ads on your maps could actually help drive traffic to your theatres, and possibly earn you a few dollars as well.
The patent appears to be on a fast track, having been filed just this past June and published only a few months later.
The patent is:
Online Map Advertising
Invented by Brandon Badger, James E. Payne, Mike Perrow
Assigned to Google
US Patent Application 20120239509
Published September 20, 2012
Filed: June 1, 2012
Systems and methods for selecting advertisements for presentation in a map space are disclosed. Map requests are received, map spaces identified, advertisement bids are received for advertisement space within the map spaces, and advertisements are selected for presentation in the map space based on the advertisement bids. The advertisement bids can be selected through an auction.
Pages I enjoyed this past week:
Since the introductory topic of this post involved Google Maps, I’m going to stick to that theme here and point out a couple of Maps related posts that I found interesting.
The first of those questions the value of one of the signals that has been known to be important to the rankings of businesses in Google Maps. Mike Blumenthal asks the question, Will Citations Stop Being Effective for Local Optimization in the Future? Mike’s answer is a thoughtful response worth spending some time with.
If you like the more technical side of SEO and local search, Mike also has a post that points out a number of the different robots that Google uses in building maps, in his post, MapMaker Bots and What They Do.
You’ve probably heard an increasing mantra in SEO circles about building great content for your pages. AJ Kohn gets a bit philosophical and practical on this topic in his post Stop Creating Great Content and Produce Memorable Content Instead. Ask yourself how you put a stamp on something that you’ve written so that people not only find the information you provide to be useful and helpful and engaging, but also associate it with you, and with your brand, so that it has a lasting impact and effect upon them.
Finally, a hat tip to David Dalka for sending me an email about this post from the search engineers at Yandex titled, A Model of the Fresh Internet, in which they describe a new approach for identifying and crawling fresh new content on the Web. If you’re into technical SEO, you may see the implications of their approach and how something similar might be used by other search engines as well.
Retro Post of the Week
So, I wanted to continue the Maps theme for this post with this selection, and dug back into the archives of my site to pull out Authority Documents for Google’s Local Search.
How does Google know which page to associate with a particular listing in Google Maps?
The patent filing Authoritative document identification explores some of the different signals that the search engine might use to associate a particular site with a particular location. Google Maps was one of the first Knowledge Bases that Google built, and has been working upon for years.
If Apple’s Maps have come under a considerable amount of scrutiny, in part it’s because Google has a considerable head start. Rumor has it this morning that Apple may try to make up for that by hiring some former Google Maps employees.
So how do you feel about the possibility of advertisements shown directly upon Google Maps?
This summer, Google announced that they were coming out with a program called Google Now, which seems to be Google’s answer to Siri. As a digital assistant, it anticipates your informational and situational needs almost before you do.
The original Siri patent, Intelligent Automated Assistant, is filled with details on different options it might include in the future, but focuses primarily upon what Apple calls an active ontology that can understand what types of related information people might want to find out more about when focusing upon different topics.
For instance, within the domain of “restaurants,” Siri might anticipate questions about which restaurants are nearby, it might pull up reviews for restaurants, or help to book a reservation, or show a menu. Google Now’s take on the concept of intelligent automated assistant is a little different.
Where Google Now differs is that it attempts to learn from and understand human behavior. A head-to-head comparison of the newest version of Siri versus Android’s Jelly Bean Voice Recognition program at PC Magazine keeps on bringing up Google Now as a feature that distinguishes the two programs, in a positive manner.
Google was granted a patent this week that describes the predictive algorithm behind Google Now that learns from its owners’ behaviors. It can determine where you live and where you work, and can offer alternative routes to or from work if there’s road congestion on the route you usually take.
It can learn about your Monday night bowling league and that you like watching certain TV shows, and add both to your calendar for you. It can learn what your favorite sports teams might be, and that you like looking at the scores from games in the morning with breakfast. It remembers that you like stopping at a certain coffee house for breakfast most Tuesdays, and that you usually drop your clothes off at the dry cleaners on your Friday night drive from work.
The patent is:
Providing digital content based on expected user behavior
Invented by Sumit Agarwal, Dipchand Nishar, and Andrew E. Rubin
Assigned to Google
US Patent 8,271,413
Granted September 18, 2012
Filed: November 25, 2008
In a computing system, information regarding a plurality of events that use a computing device is obtained, and a time-dependent increase in activity for each of at least some of the events is identified. An observed interest by a user in an event is correlated with an identified increase in activity for the event. Information about the activity at a time related to the event is provided for review by the user.
Among the inventors is Andy Rubin, the co-founder and former CEO of Android Inc., and the Senior Vice President of Mobile and Digital Content at Google. Dipchand “Deep” Nishar was the Director of Wireless Products at Google, where he helped start Google’s mobile offerings, and now works as a Senior Vice President, Products & User Experience at LinkedIn. Sumit Agarwal was Head of Mobile Product Management at Google for a little more than a year, and his LinkedIn profile tells us that he and his team launched “20+ features in various Google mobile products.”
The patent describes a number of different types of activities and user behaviors that it might see from its owner, and learn from. Some external signals might be used to predict future user actions, including user requests and communications made while using a computing device. Some user behaviors might be learned via sensors and GPS.
Many of these activities might be used to provide digital content. Siri will tell you the score of the Washington Nationals baseball game when you ask for it. Google Now will notice that you look up the score every morning after a game, and will start showing you the score before you ask for it.
The patent provides a very detailed description of the kinds of things it might learn, and how it might provide content in response to what it’s learned from user behavior signals. It includes a wide range of examples as well. For instance, it might potentially receive data from a payment processing service provider to learn where you’ve stopped to purchase coffee or where you’ve stopped to buy gasoline, and generate a timeline based upon such purchases and the places you’ve visited.
Take a turn on your trip to work in the direction of that coffee house, and it might provide suggestions on the route to the coffee house based upon traffic conditions, or provide other information.
It could also present a coupon for that particular coffee house while you’re on your way, or possibly even from another one that is along the same route, before you arrive.
This system might notice that you like to attend baseball games at the local stadium every so often, but that you only go to games when the local team is playing a particular opponent, by checking the team’s schedule. This might tell Google Now that you’re more of a fan of the opposing team than the local team.
It might present you with a coupon for a restaurant near the stadium about 2 hours before the next game against that opponent if you’ve been consistently going to games involving that team.
I’ve provided a really short and high level overview, but the patent is much more detailed, and is worth spending some time with to understand the difference between the helpful Siri, and the predictive Google Now.
Google has also published some very related pending patent applications, such as Providing Results to Parameterless Search Queries.
A parameterless search query might be as simple as someone shaking their phone a number of times (shake once, or shake twice), pressing a button for a certain amount of time, or even providing a command such as “search now.”
In response to that parameterless query, the mobile computing device might take cues from the context around it to provide an answer.
These cues could include information associated with the device and with the user, such as the time of day, upcoming and recent calendar appointments, direction and rate of speed that the device is traveling, a current geographic location, and even recent device activities such as an email being sent to someone about a meeting scheduled in half an hour.
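Here is a toy illustration of that idea: a handler that takes whatever context signals the device has and decides what to show in response to a parameterless query. The signals and rules below are invented for illustration and aren’t taken from the patent filing itself:

```python
import datetime

def answer_parameterless_query(context):
    """Pick something to show based only on context, since the query itself carries no terms."""
    now = context.get("time", datetime.datetime.now())
    if context.get("next_appointment_minutes", 999) <= 30:
        return "Directions and traffic for your upcoming appointment"
    if context.get("moving_speed_mph", 0) > 20:
        return "Estimated arrival time for the place you appear to be heading to"
    if now.hour < 9:
        return "Weather, commute traffic, and last night's scores"
    return "Recent searches and unread messages"

# Example: the device knows a meeting starts in 25 minutes
print(answer_parameterless_query({"next_appointment_minutes": 25}))
```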
Another related patent application is Activating Applications Based on Accelerometer Data.
Under this patent, we learn that certain accelerometer profiles associated with different types of movements at different points of a day might indicate a preference to see certain types of digital content.
Someone who likes to go for jogs in the morning might like their phone to play music, or they may like to see news on it during a commute, manage email communications at an office, and view calendar information on the walk from a parking garage to the office. Different profiles might automatically call up applications you typically like to use.
Google is incorporating user behavior, context information, sensor information, and more to anticipate the needs of users, and to predict the kinds of information and applications that might be appropriate for the people using those devices. That seems to be pretty useful in a personal assistant.
On Monday night, I had the chance to give a presentation for the Agile SEO Meetup at the Webimax headquarters in Mt. Laurel, New Jersey. There was a nice turnout, including many Maxers* who started their day early, and stayed late or returned for the presentation. It takes a team to make a meetup work, and I wanted to express my thanks to the Maxers who set up the presentation equipment and the live-streamed webinar, who set up tables and chairs and signs, who made sure we had something to drink, and who helped promote the event. I’m not going to mention names, because I’m sure I’d leave someone out, but you know who you are.
The topic is one dear to my heart, since many of the issues I often see on websites involve how the pages of a site might be crawled by search engines. The title for my presentation was, “Everything you wanted to know about crawling, but didn’t know where to ask (Including Importance Metrics and Link Merging)”. OK, I got carried away with the title, but it seemed to fit what I wanted to talk about.
Here are the slides from the presentation:
I wanted to share some behind the scenes thoughts about the presentation with this post.
I mention the robots mailing list early on, and one of the things that amazes me is the role of the then very young Martijn Koster in spearheading something like the Robots Exclusion Standard and the robots.txt file that we all know and love. If you look at my slide that shows a Usenet message from him, you’ll see he contacted an interesting group of people to work on some way of lessening the impact of crawling programs on Web servers. They include, among others, Jonathan Fletcher, who invented JumpStation, one of the earliest modern search engines. Another name on that list is Guido van Rossum, who invented the programming language Python, and who presently works for Google.
I didn’t include a link to the robots.txt pages in the presentation, nor to the specification that Google follows with robots.txt. Bing also includes a lot of information about their implementation of robots.txt for Web pages. Incidentally, here’s Bing’s robots.txt file, Google’s robots.txt file, and Yahoo’s robots.txt file if you’re curious about how they are doing it. Why is Yahoo’s so much simpler and shorter?
OK, I’m not sure how many of you knew that Google was granted a patent on a politeness protocol for robots visiting web pages so that they wouldn’t overwhelm them (like a distributed denial of service attack), but it surprised me when I first came across it. Incidentally, the picture I included of sliced wires was one my webhost included on a page apologizing for server downtime when the connection to their data center had an incident with a backhoe. They’ve since added an alternative set of lines to reach the world in case such a catastrophe happens again.
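If you’re curious how a particular robots.txt file is interpreted for a given user-agent, Python ships with a parser that follows the standard. A quick sketch, using Bing’s file (linked above) as the example; swap in any robots.txt URL you like:

```python
import urllib.robotparser

# Load and parse a live robots.txt file
rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.bing.com/robots.txt")
rp.read()

# Ask whether a generic crawler ("*") may fetch specific URLs under those rules
print(rp.can_fetch("*", "http://www.bing.com/search?q=test"))
print(rp.can_fetch("*", "http://www.bing.com/"))
```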
I have a slide showing Lawrence Page and a paper that he co-authored about web crawling and importance metrics (pdf) associated with it. At one point in time, there was a page on the Stanford.edu website that listed about 10 or so papers describing technology that Google was based upon, and this was one of the papers included. That page no longer seems to exist, and I don’t remember the URL. I should have saved a screenshot of it when it was around, but it’s too late to do that. If you’re interested in some of the approaches that Google follows when crawling the Web, it’s a good starting point. You can see from it that Google would rather index a million home pages than a million pages on one site.
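The core idea of crawling by importance metrics is easy to sketch: keep a frontier of discovered URLs ordered by some importance score, and always fetch the most important one next. The scores and URLs below are made up, and real crawlers are far more elaborate, but the ordering behavior is the point:

```python
import heapq

class CrawlFrontier:
    """A frontier of URLs ordered by an importance score (backlink counts, PageRank estimates, etc.)."""
    def __init__(self):
        self._heap = []
        self._seen = set()

    def add(self, url, importance):
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so push the negative score to pop the best first
            heapq.heappush(self._heap, (-importance, url))

    def next_url(self):
        if self._heap:
            return heapq.heappop(self._heap)[1]
        return None

frontier = CrawlFrontier()
frontier.add("http://example.com/", 0.9)           # a home page
frontier.add("http://example.com/deep/page", 0.1)  # a deep page on the same site
frontier.add("http://another-example.org/", 0.8)   # another site's home page
print(frontier.next_url())  # the highest-importance URL comes out first
```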
The patent I pointed to on Anchor tag indexing in a web crawler system adds some additional information about how Google crawls pages. That one was originally filed in 2003, so things have likely changed considerably.
My “subliminal advertising” slide didn’t elicit any laughter, but I did see a couple of smiles.
Google’s webmaster tools documentation does stress that a web site owner should build pages that would work well in an early browser like Lynx, but there are a lot of hints that Google has the capacity to view pages with much more sophisticated browsers. The question, though, is whether they do that when it’s potentially very expensive from a computational standpoint.
The link merging slide contains information from a Microsoft patent that describes how they might be performing a Web Site Structure Analysis. What really interested me was how they might merge links they find on pages, such as links at the top of a page in a main navigation, and links at the bottoms of pages in a footer navigation. What kinds of implications might this have for pages that are linked to multiple times on the same page?
The remainder of the presentation points to some features introduced by the search engines that webmasters can use to try to help the search engines better understand the structures of their sites, such as canonical link elements, hreflang elements, “prev” and “next” link elements, and XML sitemaps. They can be helpful if used correctly.
My next-to-last slide includes a link to a Yahoo patent filing that tells us that they would consider looking at links in social media signals to find pages discussing hot topics, and provide answers to very recency sensitive queries.
Not included in the slides was a mention I made of a very recent paper that describes how a focused crawl of web pages that also looks at text around links might help identify some of the sentiment about those links. The paper is Sentimental Spidering: Leveraging Opinion Information in Focused Crawlers. One of the authors of that paper has since moved on to Google, and brought some of his expertise on that topic with him.
Chris Countey also gave a presentation on Monday about some of the trusted sources that he looks to in keeping up with SEO, and he’s going to be posting about his presentation sometime soon, so keep an eye out for that.
* Added 2012-09-13 at 1:13 pm (eastern) – Maxers is my name for the team at Webimax, and one that I coined while writing this post. Any resemblance to the term mozzers to stand for the people at SEOmoz is purely intentional.
At the start of every football season, the first few games are often a surprise in terms of how well or how poorly some teams play. One team yesterday, picked by many to be a potential Super Bowl participant, barely eked out a victory over a rival that isn’t expected to fare quite as well. Other teams won by wide margins over teams expected to be much more competitive.
A football team goes through a lot from the end of one season to the beginning of the next, from drafting young players, to signing or losing free agents, and sometimes even through coaching changes, adoption of new strategies and approaches, personnel changes in front offices, and more. There are always teams that emerge out of nowhere to win more than expected, and other teams that don’t live up to pre-season hype.
There are a lot of eyeballs on those teams and their players, from the press to the front office, fans to fantasy team owners, professional scouts to amateurs who prognosticate in forums. There are many ways that someone can judge the talent on a team, and its chances of winning, including draft day evaluations, unofficial scouting reports, and reporters’ head-to-head evaluations.
This stage of the football season reminds me of how we often evaluate websites and how well they might rank in search results. Rankings are often based upon a combination of an information retrieval (IR) score, reflecting how relevant a page might be for a particular query, and importance scores such as PageRank. There are other factors that come into play as well. For instance, if a query includes the name of an entity, and a search engine has associated a particular website with that entity, that site might rank well for the query even if it isn’t the page or site with the highest combination of information retrieval score and importance score. Because of the association, it might even be listed multiple times at the top of a set of search results.
There are other methods of ranking and re-ranking search results that may cause other pages to rank above where we might think they should based upon IR score and Importance score. For example, Google will sometimes include localized organic results in rankings for pages as a way to make those results more relevant for people living in particular areas. So, on a search for “hospital” for instance, one or more of the top ten results you see might be for a local hospital, even though it doesn’t have the highest IR and Importance scores.
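As a toy illustration of how those pieces might combine, here is a sketch that blends a relevance score and an importance score, and gives a localized result a boost for local searchers. The pages, scores, and weights are all invented; real ranking systems are vastly more complex:

```python
# Toy re-ranking: each page gets an IR (relevance) score and an importance score,
# and a localized result gets a boost when the searcher is local.
pages = [
    {"url": "bighospitalnetwork.example", "ir": 0.90, "importance": 0.95, "local": False},
    {"url": "countyhospital.example",     "ir": 0.70, "importance": 0.40, "local": True},
    {"url": "healthportal.example",       "ir": 0.80, "importance": 0.85, "local": False},
]

def score(page, searcher_is_local=True, local_boost=0.5):
    base = 0.6 * page["ir"] + 0.4 * page["importance"]
    return base + (local_boost if page["local"] and searcher_is_local else 0.0)

for page in sorted(pages, key=score, reverse=True):
    print(round(score(page), 3), page["url"])
```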
What can be even more challenging is that, regardless of those different re-ranking approaches, we don’t know how much of a difference in scores there might be between the result showing up at the top for a query and the second result, or the third, and so on. We don’t know if the top result is a potential Super Bowl contender, or just a little better than the result immediately below it. And we don’t always know whether some special re-ranking factor is in play that might raise it to the top.
When we work on a site to improve its quality, and to make it more relevant for a particular query, we can’t be certain how much of an improvement we might see by changing titles to make them more descriptive and more engaging. We don’t necessarily know the impact of adding more quality content to a particular page that might be relevant to a specific term. We need to make those changes, and trust that by improving the quality of a page, we create the possibility that our efforts will result in more traffic, a higher ranking, and a better experience for visitors to the page. Often we need to be patient and wait to see what kind of results such changes bring.
A football team taking steps to improve from one season to the next works on its approach to drafting players, attempts to make smart choices in signing free agents, tries to hire great coaches, runs a smart training camp, and puts together a playbook that takes advantage of the strengths of its players and the weaknesses of opposing teams.
Likewise, when someone does SEO on a site, they try to meet the objectives of the site owner, understand the audiences for the site, and make it easier for the two to mutually benefit from being able to find one another. Smart SEO builds a strong foundation for that kind of engagement: making a site easy for search engines to crawl and index, using on its pages the words that searchers interested in what the site owner offers would use, and providing a good user experience once those visitors arrive. The first step in any SEO campaign is preparing a site to be able to compete.
Pages I enjoyed this past week:
One of the really eye-opening articles that I saw last week echoes my feelings about what Google is building with Google Maps. The Atlantic’s article, How Google Builds Its Maps—and What It Means for the Future of Everything, provides a look at Google Maps as both a challenge for Google and a tool with greater implications than just making it easier for us to find businesses or get driving directions. Perhaps the most telling aspect of what Google is trying to do with Google Maps comes in this statistic from the article:
In keeping with Google’s more-data-is-better-data mantra, the maps team, largely driven by Street View, is publishing more imagery data every two weeks than Google possessed total in 2006.
While posting to Google Plus last week, I found a presentation that I really enjoyed, and I wanted to make sure that everyone saw an image that I really liked from it. I copied the image, and posted it as an image post rather than a link post, in Google Plus. This way, the post had a large image showing instead of a thumbnail. I included a link to the presentation in the body of the Google Plus post. Funny, but the Google Adsense blog suggested doing the same thing last week, with their post Social Fridays: Use images to deliver a richer experience.
That presentation is Crowdsourcing for Search Evaluation and Social-Algorithmic Search. It’s a long one, but definitely worth spending some time with as it explores ways to crowdsource determining the relevance of search results. Given my intro above about evaluating talent and relevance, it’s interesting to see this presentation by Matthew Lease of the University of Texas at Austin, and Omar Alonso from Bing, which explores alternative ways of evaluating search results. Below is the image from it that I liked so much:
Linking to your own site from a page or blog post can be a very relevant exercise, or it can look like you’re trying too hard to gain relevant anchor text to your own pages. James Mathewson, from IBM, explains approaches that he uses to do that in Three Types of Relevant Internal Links to Boost SEO.
Retro Post of the Week
This week’s look back at pages and sites that I’ve found incredibly useful and helpful in the past features the Stanford Credibility Guidelines. While I’ve mentioned above how both football teams and search engines evaluate the talent they have or the relevance of pages in search results, it’s also important to keep in mind how visitors to pages evaluate the businesses and offerings that they see.
The Stanford Persuasive Technology Lab put together the guidelines on how people might judge the credibility of a site, almost a decade ago. I’ve been following these guidelines as much as possible since then, and they include the kinds of things that can really make a difference. You may want to add them to your playbook.