Machine Learning and Artificial Intelligence for Content Publishers

Over the last few years, machine learning (often abbreviated as “ML”) and artificial intelligence (“AI”) have quickly evolved from science fiction fantasies to real solutions for challenging problems. Though they’re best known as the tech behind self-driving cars and facial recognition, AI and ML represent a powerful toolset that stands ready to tackle a whole host of opportunities for content authors and publishers—from big-name publishing brands to those just getting started.

Even more importantly, AI and ML are a chance for forward-thinking publishers of all sizes to get ahead of their competition. The technologies are still relatively new, and not widely adopted. They empower significant improvements in the editorial process, and in user engagement: adding a new avenue for growth, whether your content is your primary product or a secondary supporter for marketing and storytelling endeavors.

To best understand the opportunities that these technologies unlock, let’s begin by learning more about what they are, and how they work.

What are ML and AI?

In our science fiction fantasies, artificial intelligence is taken to the full breadth of the term: a computer-generated version of the human brain, capable of understanding, analyzing, learning, and acting, just like we are. Though we’re not there yet, the modern state of artificial intelligence has largely grown from the realization that computers are excellent at two fundamental tasks in which the human brain also excels: recognizing patterns and making educated guesses based on those patterns.

Computers, however, are capable of working with a far broader set of data than our brains are. We’re able to compare three or four variables across a few dozen examples to recognize a trend. Computers can tackle thousands of times that.

The art of modern artificial intelligence is in recognizing how problems can be decomposed into observations that computers are well-tuned to make, and building the approach, often called a “model”, that lets them do so most effectively. We can even take this a step further, by building models that are able to reflect on the effectiveness of their guesses and adjust their future predictions based on the accuracy of their prior ones. We call these models “self-learning.”
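
To make the “self-learning” idea a little more concrete, here is a minimal sketch in Python. It is not how any particular product works; it simply assumes a model that makes one numeric guess at a time, compares it with what actually happened, and nudges its next guess by a fraction of the error. The function name, the learning rate, and the page-view numbers are all hypothetical.

```python
def online_estimates(observations, learning_rate=0.5, initial_guess=0.0):
    """Guess each value before seeing it, then adjust by a share of the error."""
    guess = initial_guess
    guesses = []
    for actual in observations:
        guesses.append(guess)            # the prediction made before seeing the result
        error = actual - guess           # how far off the prediction was
        guess += learning_rate * error   # move the next prediction toward reality
    return guesses, guess


# Hypothetical daily page views for a single article.
guesses, next_guess = online_estimates([100, 100, 100, 100])
print(guesses)     # each guess lands closer to 100 than the one before it
print(next_guess)
```

Real models track many variables at once, but the feedback loop (guess, measure the miss, adjust) has the same shape.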

Artificial intelligence has been appearing across a number of industries over the last several years. Perhaps the most commonplace example is in real estate price prediction, popularized by Zillow, Redfin, and other sites. Fundamentally, real estate price comparison is driven by analyzing the similarity of properties across a number of facets and extrapolating a given property’s value based on how it stacks up. Historically, this was often done with manual analysis. This works well when we’re comparing across a small handful of variables, like home size or the number of bedrooms.

As we begin to consider many more variables, like lot size, features, location, time since last renovation, home age, and more, this becomes significantly more complicated. Zillow and Redfin realized that this type of data is tailor-made towards the strengths of computers, and have built models that are able to analyze this data and continuously adjust their conclusions as the market continues to evolve.
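
As a rough illustration of “comparing similar properties,” the sketch below estimates a price by averaging the sale prices of the most similar homes (a nearest-neighbors approach). This is an assumption chosen for simplicity, not a description of how Zillow or Redfin actually build their models, and the homes, features, and prices are made up.

```python
import math

# Hypothetical sold homes: (square feet, bedrooms, home age in years, sale price).
SOLD_HOMES = [
    (1400, 3, 30, 310_000),
    (1600, 3, 12, 365_000),
    (2100, 4, 8, 480_000),
    (900, 2, 45, 205_000),
    (1750, 3, 20, 372_000),
]


def feature_ranges(homes):
    """Return (min, max) for each feature so every feature can be scaled to 0..1."""
    columns = zip(*[home[:3] for home in homes])
    return [(min(col), max(col)) for col in columns]


def scale(features, ranges):
    """Scale raw feature values so square footage doesn't drown out bedrooms."""
    return [
        (value - low) / (high - low) if high > low else 0.0
        for value, (low, high) in zip(features, ranges)
    ]


def estimate_price(features, homes=SOLD_HOMES, k=3):
    """Average the sale prices of the k most similar sold homes."""
    ranges = feature_ranges(homes)
    target = scale(features, ranges)
    by_similarity = sorted(
        homes, key=lambda home: math.dist(target, scale(home[:3], ranges))
    )
    nearest = by_similarity[:k]
    return sum(home[3] for home in nearest) / len(nearest)


# A 1,650 sq ft, 3-bedroom, 15-year-old home.
estimate = estimate_price((1650, 3, 15))
print(round(estimate))
```

With three variables this is easy to check by hand; the appeal of a computer is that the same comparison keeps working with dozens of variables and millions of homes.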

Machine learning is the technology that empowers artificial intelligence. Though I’ll spare you the details, you can think of it as the myriad approaches, built atop the field of statistical analysis, that enable us to develop models for artificially intelligent conclusions.

Though the most obvious use-cases for ML and AI are numerically-based, it’s not a requirement: there’s been a lot of research done over the last few years to develop efficient ways to turn images, speech, text, and other types of data into forms that ML and AI are able to effectively work with.

Applications

Now that we know what ML and AI are, let’s explore a few examples of the cool things that they can do for content publishers today. Though you’ll ultimately need to qualify the benefits of each within the context of your specific business, the vast majority of publishers I’ve encountered during my work at 10up are always striving to increase user engagement, content production productivity, and content efficacy: all things that ML and AI are particularly good at.

Content Insights

Most content publishing websites are divided into a handful of unique top-level categories, often with further subdivisions underneath. For sophisticated publishers, this schema is typically defined in an Information Architecture exercise during the website’s initial construction, leveraging insights from the publisher’s editorial team about the content that they intend to write.

Over time, even the best-planned content architectures fall victim to natural shifts in the publishing environment and ambiguity in the true classification of content that sits “in between” categories. Using artificial intelligence solutions developed by IBM Watson, along with ready-made solutions like 10up’s ClassifAI plugin for WordPress, we’re able to holistically re-examine content to better understand what it’s truly about. Beyond providing insights when considering re-classifying content, this approach can be extended for automatically migrating content into a new content architecture.

There’s also an ancillary benefit: these technologies are very similar to how search engines read and understand your content. In reviewing the outcomes, you may surface new trends and patterns that are otherwise not directly evident.

Data-Driven Editorial

We can take the notion of classifying content one step farther, and aim to develop insights into why content performs the way that it does. What content does your readership most connect with? What topics are more likely to go viral? Are particular authors, particular publishing times, or specific channels more likely to lead to content that’s loved by your readership?

Though some analytics platforms, like Parse.ly, are beginning to think about the insights and decisions that can be drawn from this data, my mind imagines an editorial planning tool that illuminates the entire editorial process in this light. Some publishers, like Buzzfeed, have leveraged similar approaches for years with great success—and we’re now at a point where the technology is available for everyone to do it, too.

Highly-Engaging Recommendations

High-quality content recommendations are the single most effective tool for retaining readers and converting one-time users into dedicated audience members. Historically, content recommendations have been driven by a combination of content categorization and popularity, building off the notion that a reader is most likely to be interested in content about similar topics that other readers have also enjoyed.

While this conclusion isn’t incorrect, it only scratches the surface of what’s possible. E-commerce companies like Amazon have built massive businesses on multi-faceted recommendations, understanding that no two users are alike, nor are most users’ interests one-dimensional. Leveraging thoughtfully-crafted machine learning models, it’s now possible to build content recommendation engines that utilize readers’ historical behaviors, and an in-depth understanding of the nature of content, to anticipate the content that a specific reader is most likely to engage with. When these recommendations are positioned well, particularly on mobile devices, they’re a sure-fire way to transform your content into a black hole that sucks readers in.
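
To give a feel for what “using a reader’s history plus an understanding of the content” can look like, here is a small Python sketch. It assumes each article has already been described by topic weights (for example, by a classification tool), builds a profile from what one reader has read, and ranks unread articles by how closely they match that profile. The article slugs, topics, and weights are hypothetical, and a production engine would weigh many more signals than this.

```python
import math

# Hypothetical articles, each described by topic weights.
ARTICLES = {
    "budget-travel-tips": {"travel": 0.9, "finance": 0.4},
    "index-funds-101": {"finance": 1.0},
    "weekend-in-lisbon": {"travel": 1.0, "food": 0.3},
    "sourdough-basics": {"food": 1.0},
    "street-food-guide": {"food": 0.8, "travel": 0.5},
}


def cosine(a, b):
    """Cosine similarity between two sparse topic vectors (dicts)."""
    dot = sum(weight * b.get(topic, 0.0) for topic, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reader_profile(history):
    """Sum the topic weights of everything this reader has already read."""
    profile = {}
    for slug in history:
        for topic, weight in ARTICLES[slug].items():
            profile[topic] = profile.get(topic, 0.0) + weight
    return profile


def recommend(history, limit=2):
    """Rank unread articles by similarity to the reader's own profile."""
    profile = reader_profile(history)
    unread = [slug for slug in ARTICLES if slug not in history]
    ranked = sorted(
        unread, key=lambda slug: cosine(profile, ARTICLES[slug]), reverse=True
    )
    return ranked[:limit]


# A reader who has read one travel piece and one food piece.
picks = recommend(["weekend-in-lisbon", "sourdough-basics"])
print(picks)
```

Because the profile is built per reader, two people landing on the same article can be shown different next reads, which is the difference from a category-plus-popularity widget.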

We can even apply this same technology to other facets of our content platforms, like search—delivering recommendations that are specifically tailored to each unique user, backed by the knowledge acquired by analyzing the user-base’s behavior as a whole.

Computer-Generated Content

It’s also possible to leverage AI and ML to tackle the very heart of the content publishing process: generating content. Though it’s unlikely to ever replace thoughtful, nuanced journalism, AI has already been successfully utilized to capture rote content production, like post-round performance summaries from the PGA Tour. Over time, I expect this trend to continue to expand, with revenue generated by routine content used to fund insightful original reporting, much like Buzzfeed has begun to do.

Getting Started with AI and ML

The world of machine learning and artificial intelligence holds a great deal of promise, but its still-forming horizons can make it challenging to grasp the full breadth of its potential and differentiate time-wasting features from those that hold real promise.

Over the next few years, I expect a number of companies to emerge that provide productized solutions to more commonplace problems. These “features” that don’t require a fundamental reimagination of how your content business works are a great place to begin building familiarity and trust in the promise of AI.

When you’re ready to take a step farther, your best bet is finding an experienced technology strategist to help you unpack the core challenges specific to your content and develop AI-backed solutions that lean in to those challenges. Machine learning and artificial intelligence represent a complete reinvention of the challenges we can solve and the solutions we can develop, and there’s no harm in getting help along your journey.

Over the next few weeks, I’ll be exploring examples of artificially intelligent solutions for several other common challenges faced by content publishers. Be sure to sign up below to be notified as they’re available!

Quality is a Commodity


The key to selling is differentiation. You need to excel at something over your competitors. And, to be successful, that something needs to be valuable to your customers. It’s one thing to excel through a collection of odds-and-ends features; it’s an entirely different thing to excel through that piece that makes your customer’s $100,000 business become a $1,000,000 business.

Over the years, I’ve had the fortune of meeting many talented entrepreneurs in various stages of their journeys: from initial ideation to firing on all cylinders. I try to ask each one a seemingly straightforward question: “What makes (whatever you’re selling) better than everyone else out there?”

Often, that answer is some variant of “We do the same thing, but we’re higher quality.” While that might be enough to get a business off the ground, it’s not enough to grow it in the long term. Over time, quality becomes a commodity: as your industry matures, your industry’s “hard problems” will become easier problems, and the ability to execute against them well will be commonplace.


The web hosting industry is a great example. Ten years ago, your options for hosting a website were very limited: inexpensive shared hosts, with poor quality and frequent downtime; or expensive dedicated servers (and, later, VPSes), with higher price points and skillset requirements.

Recognizing this gap in the market, a number of managed hosts, focusing on solving the hosting challenges of particular market verticals and doing it well, began to pop up. My day job at 10up puts me in regular contact with one specific niche of this managed hosting market: content publishers that use WordPress.

WPEngine was an early entrant in that space, and was defined by the quality of its execution: cost-effective, easy to use, and it worked. Over time, other managed hosts entered the market, and they caught up. This market has become mature over the last 5 years, marked by at least a half dozen entrants with similar price points and quality.

And, unsurprisingly, it’s no longer possible to enter the WordPress managed hosting market and differentiate by quality — and those early players that relied solely on quality to differentiate have failed to thrive. Differentiation now exists in feature-sets and focus: developing functionality for more specific niches (enterprise publishers, developers, higher education, etc.).


There’s good news here: you can use this knowledge to improve your startup’s execution from day one. There’s a more efficient pathway than starting with an emphasis on quality and hoping you figure the rest out later.

From the start, keep your product vision as focused as can be — products that try to be everything to everyone become nothing for no one. Regularly test and refine your hypothesis in the marketplace — and, of course, execute well (quality might be a commodity, but people will still notice when it’s lacking).

This early focus has a number of tangible benefits. It’ll help you narrow down your feature roadmap, better understand your customer (the more specific your target persona, the better), target more effectively via digital marketing, and give you a framework for regular testing and iteration in the marketplace with a reasonably testable hypothesis (for example, it’s much easier to test iterations of a timeline tool for design project managers than it is to test iterations of a timeline tool for everyone). While your competitors are looking for product/market fit and investing time in features that won’t ultimately matter, you’ll have the tools you need to focus your effort on entrenching yourself in your market.

Quality can’t help you stand out in the long term — but realizing that early can help you get a step ahead when it counts the most.

WordPress’ Gutenberg Won’t Change the World

The Internet has changed in the last few years, and content creation tools are starting to catch up.

WordPress was released upon the world over 15 years ago; two years later, it received the first version of its “visual editor”, powered by TinyMCE. This editing experience, effectively a simplified version of a word processor in the web browser, quickly became the de facto standard for writing content across the web. It worked well enough, empowering us to italicize, bold, underline, and embed our way around the web.

As the Internet evolved from a digital collection of word processed pages into significantly more complicated platforms, these visual editors have begun to show their age. True WYSIWYG (“what you see is what you get”) content creation experiences began to gain strength, and have evolved into the heated battle between Wix and Squarespace that any New York City subway rider is well familiar with. These tools enable the non-technical to create visually impressive — and, often, highly functional — websites with a degree of control and ease unmatched by traditional content editors like TinyMCE.

Since the rise of these platforms, the WordPress community has recognized the need for WordPress to compete with these kinds of tools. Technically simple, content-centric news sites are becoming less and less frequent, with “traditional” publishers like The New York Times beginning to regularly create design-curated and visually stunning experiences and infographics that far exceed the realms of bold-italic-and-font-size. While there have been a number of admirable independent efforts to create a more flexible content creation experience in WordPress, none have taken off significantly. And although Automattic, the parent company of WordPress.com and the child of WordPress co-founder Matt Mullenweg, has made interesting waves over the years with its acquisitions of scroll kit (a tool for Snowfall-like web experiences) and Cloudup (ostensibly acquired for their JavaScript development skills), nothing significant had emerged from the WordPress core project for several years.

In 2016, WordPress’ Gutenberg editor was announced. It is not a WYSIWYG content editor. It is not a Wix or Squarespace competitor. On its own, it’s a nice box with a very small gift inside.

It’s true that Gutenberg is a fundamental, well-executed, and long-overdue rethinking of the way content is written in WordPress. At its core, Gutenberg is born from the recognition that modern content is often best thought of as components: a construction of numerous discrete pieces assembled into one whole. As we consider the many different channels that content is now distributed across, including websites, mobile apps, and, increasingly, new channels like voice control (through platforms like Amazon’s Alexa), separating the layout of the content from the content itself (and abstracting that layout into something more easily utilized than the combination of HTML and shortcodes that have lived on in WordPress for over a decade) is an essential step forward. From an authoring perspective, it’s also a major improvement in user experience: hiding HTML and shortcodes entirely from the vast majority of WordPress users will prevent newsroom accidents and make content significantly easier to manage.

What Gutenberg isn’t, however, is an attempt to tackle the elephant in the room in the content management system community: the marked change in how people are utilizing the web.

When WordPress was first released in 2003, and when it received the TinyMCE editor in 2005, content creation on the web was a playground for the technical. Over time, it expanded as a playground for content authors, with the expanded rise of the weblog (led by WordPress, along with platforms like Tumblr and Blogger). Since then, it’s become a tool for nearly everyone. There’s now an expectation that every business — small to large — has a website; an expectation that any person in a creative field (art, photography) has a website. If you don’t have a website, you may as well not exist.

WordPress’ Gutenberg editor does not improve the WordPress experience for those audiences. If they want a relatively simple, visually pleasing, and easy to manage website, they’ll go to Squarespace, Wix, or a similar tool. And, as the web continues to grow as a tool to empower anyone (not just content authors and writers!) to reach their audience, WordPress may well just fall behind.

There is a silver lining, however: in addition to being a fundamental rethinking of the way that content authors interact with WordPress, Gutenberg is also a substantial improvement for developers. WordPress’ TinyMCE editor was notoriously difficult to modify or improve as a developer; Gutenberg is engineered from scratch using now-common technical tools (React.js) and is designed to be significantly more accessible. For Gutenberg to help WordPress reach expanded audiences and really change the face of the web, it’s on the WordPress community to identify how to use it as a building block for doing so.

On its own, WordPress’ switch to Gutenberg is much like the shift from manual to automatic transmissions in cars: a simplification and improvement for a large portion of its audience, but not a catalyst for a major change in the audience itself.

For Gutenberg to become WordPress’ “self-driving car” moment, with a significant expansion in audience as a result, the work is in the hands of the thousands of developers worldwide who build upon WordPress for their personal and business needs. And, with a strong business and enterprise community (10up has thrived serving high-end clients with solutions built primarily with WordPress), there’s significant upside to be found in doing so.

While a fuller feature set (potentially including front-end editing) is in the long-term plans for Gutenberg, those initiatives are currently in the conceptualization stages, and may take years to materialize; time WordPress may not have. Right now, WordPress is playing catch-up, while the rest of the web carries on ahead. If WordPress is going to take that “giant leap”, it’s likely to be at the hands of a subset of the community — an agency, a startup, or even a solo entrepreneur. Who’s going to lead the charge?

The Curious Conundrum of the new WordPress.com Business Plan


Automattic updated its WordPress.com Business plan offering today to support custom plugins and themes, further obfuscating the difference between WordPress.com and WordPress.org for the casual user and cannibalizing both the upper- and lower-end of the managed WordPress hosting space in a single bound. Though the WordPress managed hosting space is already quite saturated, with names like Dreamhost and Bluehost at the bottom end, Pressable and WPEngine in the middle, Page.ly and Pantheon as you continue, and WordPress.com VIP at the top end, the WordPress.com Business Plan’s pricing structure serves to undermine many of these offerings. At less than $25 per month, it’s quite affordable, too.


Some Background: The WordPress.com Wall

WordPress.com has long been the easiest way to create a new WordPress site, particularly for non-technical users. A site with a `.wordpress.com` web address has always been free, and an offering very much in line with the needs of the typical personal blogger or small business (no ads, custom domain, modest customization via CSS) has always been available for less than $9 per month.

Trouble has always arisen when a user wants to move beyond a “simple publishing website”. Previously, WordPress.com supported only the default functionality in WordPress, and features added by an Automattic offering called Jetpack, which were mostly administrative or related to social sharing. If you wanted to do anything more involved (from integrating with another comment system, like Disqus; to running a storefront; to designing a custom theme or using one outside of WordPress.com’s theme repository), you’d need to move to one of the many managed WordPress hosts available. While that process isn’t difficult for a technically-oriented user or a developer, it’s often prohibitive for less-technical users. For self-serve business owners or personal “power users” on a budget, that’s a big deal.

That’s changing today. WordPress.com Business now supports uploading themes outside of WordPress.com’s theme repository, and uploading custom plugin code. While the offering doesn’t seem to support a handful of features that are common amongst WordPress managed hosts (a staging environment, SFTP access), I’ve been told that additional features will be rolled out over the coming weeks.


Dot Com vs. Dot Org

Clearly articulating the difference between WordPress.org (the WordPress software product) and WordPress.com (a service that provides hosted WordPress, or WordPress in a software-as-a-service model) has always been difficult, particularly to those who aren’t familiar with the concept of open-source software. For many years, the most effective explanation has originated in the differences in functionality. WordPress.org provided the software, WordPress.com enabled you to use the software as it was out of the box, and other hosting providers enabled you to use the WordPress software’s expansive ecosystem of plugins and themes.

With the introduction of the new WordPress.com Business Plan, it’s now wholly possible for a user to receive the entire “WordPress experience”, including its sometimes-notorious plugin- and theme-repository, without ever being aware of the difference. While some might argue that this is a good thing for the broader WordPress ecosystem, it risks obscuring, or even totally hiding, those who choose to contribute to the WordPress ecosystem outside of the WordPress theme and plugin repositories, or companies within the WordPress space that are ideologically different from Automattic. This, effectively, enables Automattic to launch a product competitive to any major premium WordPress plugin (Easy Digital Downloads, for example), market it using the “WordPress” name, and prioritize it over an independent offering to users who may not know that alternatives exist. That’s bad.


Cannibalizing The Head and the Feet

With the exception of high-end enterprise clients, for whom uptime guarantees and security restrictions dominate platform decisions, selecting a managed WordPress host has long been a calculus of determining the least expensive host that can accommodate the number of page views one expects their website to receive. For editorially-focused sites, this can be a risky endeavor: one popular post on a channel like Facebook or reddit, while otherwise a sign of success, can force one to perform a costly hosting plan update at a moment’s notice.

WordPress.com’s Business Plan is unique in what it lacks: hard-and-fast restrictions on the number of page views a single site can receive. This means that, in theory, a popular-yet-technically-simple website with a large hosting bill elsewhere (due to traffic levels) could migrate to the WordPress.com Business Plan, pay $25 per month, and potentially save _thousands_ of dollars. WordPress.com’s economy of scale (it serves hundreds of millions of impressions per day) nearly guarantees that no other managed host can afford to expose themselves to a similar structure at that price point.

While those already on mid-market hosting providers are unlikely to find an immediate need to switch, this has interesting implications in two spaces: the top-end, premium portion of the market, and the lower-end “free” consumer portion.

While $25 per month is not a small investment, it’s not unreasonable to ask of someone looking to generate revenue from their website, either through advertising or through sales of a service related to the website. The additional flexibility and “growth-proofing” of the WordPress.com Business Plan means that small website owners with large aspirations will be tempted to begin their digital journey on WordPress.com, upgrading to the Business Plan when needed, or simply beginning — and ending — there.

Of even greater interest is the top-end of the market: a premium space occupied partially by high-volume publishers and corporations with enterprise-geared service requirements. At present, these folks are often forced to select an enterprise-focused managed hosting plan that can come close to — or exceed — $1,000 per month. While WordPress.com doesn’t currently publish an uptime guarantee (part of a larger document known as an SLA), I expect that, upon private discussion, a non-trivial number of users currently utilizing premium-level WordPress hosts, including WordPress.com’s own WordPress.com VIP, will find the service history of WordPress.com (which is quite good) comforting enough to deploy to the new WordPress.com Business Plan.


Fairness in the Marketplace

Long-standing criticism within the WordPress community stems from WordPress.com’s use of the WordPress trademark. Automattic is quite aggressive in its defense of the WordPress trademark, which is actually owned by an “independent” nonprofit, The WordPress Foundation, and perpetually licensed, royalty free, for use on WordPress.com. In truth, both the WordPress Foundation and Automattic (creator of WordPress.com) are controlled by Matt Mullenweg, a co-founder of WordPress and the project’s “benevolent dictator”. Some have argued that this relationship enables Automattic (and thus Matt) to unfairly profit off of the popularity of the WordPress software and name.

While this topic has come up occasionally over the years, it often simmers down quickly as Automattic/WordPress.com previously occupied only the very low end of the market (through its free- and low-cost offerings on WordPress.com) and very high end of the market (through WordPress.com VIP). Now that Automattic has clearly taken aim at the broader portion of the market, some will find new issue in this arrangement.

This gets particularly complicated when considering Automattic’s (and WordPress’) relationship with existing third-party hosts like Bluehost and GoDaddy. WordPress.org currently lists a number of preferred hosting partners, and Automattic has worked with a variety of WordPress hosts, including Bluehost and GoDaddy, historically. Automattic had previously complicated this arrangement with its majority-acquisition of Pressable, a mid-market WordPress managed host, though it’s now become clear that Automattic very much sees itself as competing across the broader WordPress market.


A Good Thing for WordPress?

Within the WordPress community, there’s long been a notion that “more users on WordPress” is universally good. Until now, that’s been difficult to argue: an expansive ecosystem has developed over the last decade, and many now make their living off of WordPress.

Despite that, WordPress.com’s Business Plan now feels like it’s oriented towards cannibalizing users from elsewhere within that ecosystem (from sites that may have “grown up” and moved to another hosting provider to those that now may not know that the broader ecosystem even exists), which is objectively a step backwards for the WordPress community. While Automattic appears to have launched a solid product into the market at a competitive price-point, it owes it to the community (particularly in light of the Dot Com vs. Dot Org debate) to ensure that it continues its commitment to properly educating users on the breadth of the WordPress community outside of the Automattic ecosystem.