We talk a lot about data.

We know it’s important, but not everyone knows the ins and outs of getting the most out of your data and data collection process(es).

So, the Spin Sucks team asked me if I could outline what Mike Connell calls “The Dummies Guide to Data.”

I’ll be a little more charitable, and call it the data lifecycle.

The Data Lifecycle for Communicators: Part 1

This is the first in a series of three posts (and videos!) discussing how you can make better use of data, how to think about managing data, how to think about processing it, and the cycle you should go through with your data to make it more useful.

Our first installment will focus on “Preparation,” which covers the “Define to Augment” stages of the data lifecycle.

The second, entitled “Exploration,” will address the “Explore to Predict” stages.

And the third, “Production,” covers “Prescribe to Observe.”

An Introduction to Data

Data is the foundation of great business, of analytics, of developing good insights that move the needle for your business. And it’s going to become increasingly important as more of the world turns to machine learning and artificial intelligence.

Brief aside… Artificial intelligence and machine learning are all about taking data and turning it into software.

Traditionally, when you fire up, say, your word processor, you have an app that’s software, and then you use the software to generate data, like your Word documents and things.

Artificial intelligence is the opposite, where you take data and give it to machines that build algorithms. And that, in turn, becomes the software.

Those machines build models from your data. So if your data isn’t great, you’re going to get terrible results, terrible models, and things will just not work out well for you.

So, having that bedrock of solid data is so important to make data usable today, and set data as the foundation for your future.

Now let’s talk about how you get there.

Data Processing Lifecycle

The data processing lifecycle outlines how you take data and turn it into something useful.

It is based, unsurprisingly, on the scientific method.

And data science is nothing more than using the scientific method with your data.

Data Lifecycle: Define

The first thing you need to do with your data before you do anything else? Define what problem you’re trying to solve.

  • What are the goals?
  • The reasons you’re doing this investigation?
  • What question do you want to answer?
  • What’s the business impact you need to make?

If you don’t have those questions answered, if you don’t define your goals, the data is not going to help, it’s just going to take up time and resources you could be using for other things.

So that’s the first thing, and it is probably the most important thing. Everything else is sort of execution.

But defining what you’re going to do with the data is the strategy part. You’ve got to get that right.

Data Lifecycle: Ingest

The next step is to ingest your data. That essentially means bringing it in and gathering it.

As communications professionals, you have a lot of data sources. You have social media monitoring tools and regular media monitoring tools.

You’ve got web analytics, CRM, and marketing automation data.

And you have traditional coverage, maybe even closed caption data.

So you’ve got to bring all this data into an ecosystem somehow.

Typically, you’re going to be working with data scientists, people who can deal with both structured and unstructured databases.

Structured data is anything that fits neatly into rows and columns, like an Excel spreadsheet.

Unstructured data is everything else on your computer, like Word, iTunes, and all the stuff that doesn’t fit in nice, neat tables.

You’ll want to differentiate that. Ingest it. Bring it all into one spot, because the next thing you’re going to do with that data is analyze it.
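As a rough illustration of "bring it all into one spot," here is a minimal Python sketch. It assumes each tool can export a CSV. The sample exports, the column names, and the `ingest` function are all hypothetical stand-ins, not the output of any particular monitoring product.

```python
import csv
import io

# Hypothetical exports; in practice these would be files downloaded from each tool.
SOCIAL_CSV = """date,mentions
2019-01-01,12
2019-01-02,15
"""

WEB_CSV = """date,sessions
2019-01-01,340
2019-01-02,410
"""


def ingest(sources):
    """Read each named CSV export and return one combined list of rows.

    Every row is tagged with the source it came from, so you can still
    tell the data sets apart once they live in one place.
    """
    combined = []
    for name, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            record = dict(row)
            record["source"] = name
            combined.append(record)
    return combined


records = ingest({"social": SOCIAL_CSV, "web": WEB_CSV})
```

Tagging each row with its source is one simple design choice among many. The point is only that everything ends up in a single collection you can inspect in the next step.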

Data Lifecycle: Analysis

This is not analysis in terms of, “Okay, what answers does it give us?”

This is determining the condition of the data.

Is it in good condition? Is it in bad condition? Are there issues with it? What’s going on within the data?

And there are really three things that can go wrong with your data.

There can be data that is wrong. So you get data out of your social media monitoring tool, and some of it is flat out wrong.

You can have missing data: data that is just not there, or gaps in your data. Maybe somebody removed the tags from your website by accident. Now you have missing data.

And the third thing is you can get data that is structured incorrectly, making it difficult to analyze.

I was looking at some podcasting stats recently from a podcasting application. And they gave data in this crazy table. Tables on top of tables. You have to undo the mess they’ve made.

This is why analysis is so important: you don’t know what mess(es) you’ll have to fix until you look.
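To make the idea of checking the condition of your data concrete, here is a small Python sketch that counts missing and wrong values in one column. The `audit` function, the sample rows, and the rule that a count must be a non-negative number are assumptions for illustration. It does not try to detect the third problem, badly structured data, which usually takes a human look.

```python
def audit(rows, field):
    """Count rows where `field` is missing, wrong, or usable."""
    report = {"missing": 0, "wrong": 0, "ok": 0}
    for row in rows:
        value = row.get(field)
        # Missing: the field is absent or blank.
        if value is None or str(value).strip() == "":
            report["missing"] += 1
            continue
        # Wrong: the value is not a number at all.
        try:
            number = float(value)
        except ValueError:
            report["wrong"] += 1
            continue
        # Wrong: a count can never be negative (an assumed business rule).
        if number < 0:
            report["wrong"] += 1
        else:
            report["ok"] += 1
    return report


sample = [
    {"date": "2019-01-01", "mentions": "12"},
    {"date": "2019-01-02", "mentions": ""},
    {"date": "2019-01-03", "mentions": "-4"},
    {"date": "2019-01-04", "mentions": "n/a"},
    {"date": "2019-01-05"},
]
report = audit(sample, "mentions")
```

A report like this tells you how much repair and cleaning is ahead before you commit time to either.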

Data Lifecycle: Repair, Clean, and Prepare

The next three steps in the lifecycle are:

  • Repair
  • Clean
  • Prepare

Of course, repairing is dealing with those issues of missing data.

Cleaning is fixing the data that’s wrong.

Preparing the data is structuring it in such a way to make it work for you.

Healthcare data, for example, is such a mess. Particularly, electronic health and medical records.

I was looking at an EMR system recently, and it had 70 different tables of data.

In order for many of us to really do data analysis, preparing the data means trying to get it out of these 70 different tables, and bringing it back into one big table so we can run analysis on it.

Repair, clean, and prepare are important steps in data processing.

For the average communications professional, it goes back to a lot of those services: You have your social media, your media monitoring, some marketing data, and some web analytics data.

You’ll need to bring all that together, then repair, clean, and prepare it so you can do something good with it.
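The three steps can be sketched in Python as three small functions. Everything here is hypothetical: the function names, the sample data, and especially the choice to fill missing values with "0", which is only one possible repair and may be the wrong one for your data.

```python
def repair(rows, field, fill):
    """Repair: replace missing or blank values with an agreed fill value."""
    repaired = []
    for row in rows:
        value = row.get(field)
        if value is None or str(value).strip() == "":
            value = fill
        repaired.append({**row, field: value})
    return repaired


def clean(rows, field):
    """Clean: drop rows whose value is not a non-negative number."""
    cleaned = []
    for row in rows:
        try:
            number = float(row[field])
        except (TypeError, ValueError):
            continue
        if number >= 0:
            cleaned.append({**row, field: number})
    return cleaned


def prepare(tables, key):
    """Prepare: merge several tables into one, with one row per key value."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


social = [
    {"date": "2019-01-01", "mentions": "12"},
    {"date": "2019-01-02", "mentions": ""},
    {"date": "2019-01-03", "mentions": "oops"},
]
web = [
    {"date": "2019-01-01", "sessions": "340"},
    {"date": "2019-01-02", "sessions": "410"},
]

# Assumption: a blank mention count means zero mentions that day.
social_ready = clean(repair(social, "mentions", fill="0"), "mentions")
web_ready = clean(web, "sessions")
one_table = prepare([social_ready, web_ready], key="date")
```

The result is one table keyed by date, which is the same idea as collapsing many source tables into one big table you can run analysis on.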

Data Lifecycle: Augment

The next step in the cycle is called augment.

Okay, you’ve got all this data, but is the data you have enough? Or, are there other data sources you need to bring in?

For example, say you’re doing social media analytics. You’ve got your brand’s data or your client’s data for everything they did on Twitter. Your client may ask you: “Hey, what about our competitors?” So now you need to augment the data you’ve gathered with outside data sources. You need to bring in additional stuff.

Another really good example: You have media monitoring for you and your peers. But you see a big dip in the data that you want to explain. What happened?

You may need to augment that with data from Google News.

For example, what was happening in the world at the time that could have taken away mindshare and eyeballs from your client, or from your program?

So augmentation is a really important step to bring in third-party data sources you need.
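Here is a minimal Python sketch of augmentation: attaching outside context to your own data by date. The `augment` function, the field names, and the sample headline are invented for illustration. How you actually obtain the third-party data depends on the source.

```python
def augment(rows, extra, key):
    """Attach matching fields from `extra` to each row, matched on `key`.

    Rows with no match are kept unchanged. If both sides share a field,
    the value from your own data wins.
    """
    lookup = {item[key]: item for item in extra}
    augmented = []
    for row in rows:
        match = lookup.get(row[key], {})
        augmented.append({**match, **row})
    return augmented


# Your own media monitoring data, with a dip on the second day.
coverage = [
    {"date": "2019-01-01", "brand_mentions": 40},
    {"date": "2019-01-02", "brand_mentions": 6},
]
# Hypothetical third-party context gathered for the same period.
news_events = [
    {"date": "2019-01-02", "headline": "Major unrelated news story"},
]

explained = augment(coverage, news_events, key="date")
```

With the two sources side by side, the dip on the second day sits next to a possible explanation for it.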

Data Analysis Is a Lot of Work

So that’s the first third of the data lifecycle.

Most data scientists spend the bulk of their time—60 to 80 percent—on preparing data.

There are a number of software companies trying to make it easier—usually for very high prices—but in the end, there’s always going to be a little bit of elbow grease involved.

Again, just to set expectations, preparing the data alone takes a lot of time.

You can automate some of it, and you absolutely should automate some of it. But some is still going to require elbow grease in the end.

I would strongly encourage you—especially if you work in an agency—to look at automation solutions, custom code, and working with a developer.

The last thing you want to do is take an account coordinator and turn them into a copy-and-paste person. That is a waste of a billable resource.

Your bill rates for even the lowest paid person on staff are going to be $50, $75, $100, $125 an hour.

Do you really want to spend $100 an hour of client money on copying and pasting data? Probably not.

So look for solutions to reduce the amount of time you spend bringing all your data together.

Up Next: Exploration

Phew. We’ve covered the “Preparation” stage of the data lifecycle.

Our next installment will discuss the exploratory stage, where we figure out what we have.

Note: We have also packaged this “Data Lifecycle for Communicators” into a series of webinars.

The Data Lifecycle for Communicators: Preparation

I look forward to addressing any questions you may have. Please feel free to ask below in the comments, or join the Spin Sucks Community and connect with me there (@Christopher S. Penn).

If you’d like assistance with your company’s data and analytics, visit Trust Insights and let us know how we can help you.

Photo by Markus Spiske on Unsplash

Photo by rawpixel on Unsplash

Christopher Penn

Christopher S. Penn is an authority on analytics, digital marketing, and marketing technology. A recognized thought leader, best-selling author, and keynote speaker, he has shaped four key fields in the marketing industry: Google Analytics adoption, data-driven marketing, modern email marketing, and artificial intelligence/machine learning in marketing. As Chief Innovator of Trust Insights, he is responsible for the creation of products and services, creation and maintenance of all code and intellectual property, technology and marketing strategy, brand awareness, and research & development.
