Workaround for Bug in LinkedInBot
LinkedInBot doesn't support content served as Content-Type: application/xhtml+xml. This article shows how to work around this LinkedIn bug in your Apache configuration.
- Author: Christian Hujer, Software Crafter and CEO / CTO of Nelkinda Software Craft Private Limited
- First published by Nelkinda Software Craft Private Limited
- Last modified by Christian Hujer
1 The Specification
The HTML 5.2 specification clearly mentions two syntaxes for transmitting HTML resources [HTML5.2 Syntax].
- One is the HTML syntax, transmitted with the text/html MIME type.
- The other is the XHTML syntax, transmitted with the application/xhtml+xml MIME type.
The XHTML syntax has a couple of benefits for me. For example, I can use XSLT, XInclude, and other XML technologies. With those, I have set up a document preparation system comparable to LaTeX. The consequence is that my website is one of those rare websites served as Content-Type: application/xhtml+xml instead of Content-Type: text/html. That shouldn't be a problem, though, as what I do is 100% compliant with the HTML 5.2 specification [HTML5.2 Syntax].
2 The Bug in LinkedInBot
LinkedInBot has a bug related to the Content-Type. It seems to be unable to process content that is served as Content-Type: application/xhtml+xml. When a post on LinkedIn includes a URL that points to a resource served as Content-Type: application/xhtml+xml, LinkedIn is unable to generate a preview. Instead, the user sees the error message: ⊖ Cannot display preview. You can post as is, or try another link.
I have reported this bug to LinkedIn a couple of times over the past years. But the company behind LinkedIn, Microsoft, is not exactly known to give a shit about standards, correctness, and interoperability. So, if they ain't gonna fix LinkedInBot, I have to work around the bug in my web server instead.
The requests from LinkedInBot show up in the web server's log file. At least LinkedInBot is easy to identify there via its unique User-Agent string LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com).
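If you want to watch these requests in isolation, one option is to tag them by User-Agent and log them separately. The following is a minimal sketch, assuming mod_setenvif and mod_log_config are loaded; the variable name linkedinbot and the log path are hypothetical placeholders. Note that LogFormat and CustomLog belong in the server or virtual host configuration, not in .htaccess.

```apache
# Hypothetical names: the "linkedinbot" variable and the log path are placeholders.
# Tag requests whose User-Agent header contains "LinkedInBot" (mod_setenvif).
SetEnvIf User-Agent "LinkedInBot" linkedinbot

# The usual "combined" log format; its last field is the User-Agent header.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# Write only the tagged requests to a separate log file (mod_log_config).
CustomLog "logs/linkedinbot.log" combined env=linkedinbot
```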
3 The Workaround for Apache httpd
The web server that I use is Apache. The following addition to the .htaccess file provides a workaround for the issue in LinkedInBot.
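A minimal sketch of such a rule could look like the following. It assumes that mod_rewrite is enabled and that the XHTML documents end in .xhtml; both are assumptions for illustration, so adjust the pattern to however your site names its documents.

```apache
# Sketch only: assumes mod_rewrite is available and that the XHTML documents
# end in .xhtml (a hypothetical naming scheme; adjust the pattern to your site).
<IfModule mod_rewrite.c>
    RewriteEngine On
    # Apply the rule only if the User-Agent contains "LinkedInBot".
    RewriteCond %{HTTP_USER_AGENT} LinkedInBot
    # Leave the URL unchanged ("-") and only override the MIME type.
    RewriteRule \.xhtml$ - [T=text/html]
</IfModule>
```

Restricting the pattern to the XHTML documents keeps images, stylesheets, and other resources at their normal MIME types even when LinkedInBot requests them.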
With this workaround, the Apache web server will serve content to LinkedInBot as Content-Type: text/html instead of Content-Type: application/xhtml+xml. That way, LinkedInBot can process the content. Note that not everything might work, though. LinkedInBot will use an HTML parser, not an XML parser, so you will have the best success if you do not rely on any features that are specific to an XML parser. So, do not use entities, XInclude, XSLT, or anything like that.
It's important that the rule matches LinkedInBot and not just LinkedIn. Otherwise it would also match LinkedInApp and break the display of XHTML in the LinkedIn app on Apple's iPhone.