Microdata – Structured HTML

I’m mainly interested in this so that I can make sure my blog posts are properly indexed, and provide additional data for the photographs I publish on this site.

The Basics

Microdata is intended to be applied to the HTML tags that we already use: div, span, a, and so on. We have to mark out Things in our HTML. “Thing” is the terminology used to represent any item, e.g. a photo, a blog post, a location or a local business. The full list of possible Things can be seen on schema.org.

Our initial HTML

We’ll think about this by using a simple blog posting, and marking it up as we go along:

<article>
 <header>
  <h1>A Blog Posting</h1>
 </header>

 <div>
  <p>I was thinking the other day about.....</p>
 </div>
 <footer>
  <p>Posted by John Smith on 4/11/13</p>
  <p>Tagged blog, example, test, post</p>
 </footer>
</article>

What Kind of Information Does Our Page Contain?

To mark out our Thing, the itemscope attribute should be added to the HTML tag that encloses the Thing. In this case our Thing is a blog post, enclosed by the HTML <article> tag.

itemscope should be accompanied by an itemtype attribute. itemtype references a page on schema.org that tells the search engine what type of item we’re going to provide data for. In this case, we’re talking about a BlogPosting.

<article itemscope itemtype="http://schema.org/BlogPosting">
 ...
</article>

So now the search engines know they’re looking at a blog post. But what does all the text on this web page mean?

Marking Up Our Data

This is where the itemprop attribute comes in. Different items have different possible itemprops, so you need to look up the correct item at schema.org in order to know which itemprops you can use.

In this case we can mark out the title of the article and the main content text of the article using the headline and articleBody itemprops:

<article itemscope itemtype="http://schema.org/BlogPosting">
 <header>
  <h1 itemprop="headline">A Blog Posting</h1>
 </header>

 <div itemprop="articleBody">
  <p>I was thinking the other day about.....</p>
 </div>
...
</article>

Now for all that footer information. We can use the author, datePublished and keywords itemprops.

The datePublished itemprop is useful for giving the search engine a standardised version of the date, whilst a locally formatted version can be visually presented to the user. For instance, does 4/11/13 represent the 4th November 2013, as it would in the UK, or 11th April 2013, as it would in the USA? The time tag with the datetime attribute makes this explicit to any search engine.

<article itemscope itemtype="http://schema.org/BlogPosting">
...
 <footer>
  <p>Posted by 
   <span itemprop="author">John Smith</span> on 
   <time itemprop="datePublished" datetime="2013-11-04T13:40:50+00:00">4/11/13</time>
  </p>
  <p>Tagged 
   <span itemprop="keywords">blog, example, test, post</span>
  </p>
 </footer>
</article>

Embedding Items Within Items

A little more on the author itemprop. The schema for BlogPosting suggests this should be of the type Person or Organization. It is acceptable just to use plain text instead, but we could provide a little more information about this author if we wanted. There are likely to be quite a few John Smiths writing blogs around the world, and we can use this opportunity to specify exactly which one we mean! We can embed a schema.org/Person item to fill out the author information a little more.

<article itemscope itemtype="http://schema.org/BlogPosting">
...
 <footer>
  <p>Posted by 
   <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name"><a href="http://example.com/JohnSmith" rel="author">John Smith</a></span>
   </span>
    on
   <time itemprop="datePublished" datetime="2013-11-04T13:40:50+00:00">4/11/13</time> </p>
...
 </footer>
</article>

The addition of a link to the author’s webpage will allow the search engine to link posts from many websites back to this single individual. The link to the author’s page can be identified by the rel="author" HTML attribute, although this is really an HTML5 standard rather than microdata. The author’s page might contain links to his or her Google+ or Facebook profile pages so that the search engine knows the author has a verified identity and isn’t just a fictional name invented to post spam. Public information about the author, such as their profile picture, might also then be included in search results. It’s probably going to be increasingly important that search engines can verify the author of blog posts in order to provide high-quality search results.
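
Putting the fragments from the sections above together, the whole post might be marked up roughly as follows. This is only a sketch assembled from the snippets in this article, and the example.com URL is a placeholder, as before.

<article itemscope itemtype="http://schema.org/BlogPosting">
 <header>
  <h1 itemprop="headline">A Blog Posting</h1>
 </header>

 <div itemprop="articleBody">
  <p>I was thinking the other day about.....</p>
 </div>
 <footer>
  <p>Posted by 
   <!-- the author is an embedded Person item with its own itemscope -->
   <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name"><a href="http://example.com/JohnSmith" rel="author">John Smith</a></span>
   </span>
    on
   <time itemprop="datePublished" datetime="2013-11-04T13:40:50+00:00">4/11/13</time>
  </p>
  <p>Tagged 
   <span itemprop="keywords">blog, example, test, post</span>
  </p>
 </footer>
</article>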

Non-Visible Data

I’ve seen quite a few examples that use microdata in the HTML meta tag, for example to record the photographer and licence terms of a photograph without displaying this data to the reader of the web page. Generally, non-visible data will not be indexed by the search engines, so it seems to me that this is a waste of time (see this article). But I’d love to hear in the comments from anyone who has had a different experience with this.
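
For reference, the meta tag pattern in question looks something like the sketch below. The ImageObject type and the contentUrl, author and license properties are taken from schema.org; the file name, alt text and licence URL are made-up placeholders for illustration only.

<div itemscope itemtype="http://schema.org/ImageObject">
 <!-- hypothetical image file; only the image itself is visible to the reader -->
 <img itemprop="contentUrl" src="photo.jpg" alt="An example photograph">
 <!-- the meta tags carry data that is never displayed on the page -->
 <meta itemprop="author" content="John Smith">
 <meta itemprop="license" content="http://example.com/licence">
</div>

The visible alternative would be to put the same itemprops on ordinary span or a tags in a caption, in the same way as the footer of the blog post above.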
