Digital Marketing in R: How to Create Word Clouds

I recently recorded my very first (much too lengthy) YouTube video. The video walks through taking a list of keywords and creating a word cloud in R.

While I do not find word clouds to be particularly useful, the exercise touches on a number of terrific data science techniques that are worth knowing, like removing stop words and stemming.
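
For anyone who wants the flavor without watching the video, here is a minimal sketch of that kind of workflow (not necessarily line-for-line what I do in the video), assuming the tm, SnowballC and wordcloud packages are installed and that your keywords sit in a simple character vector. The keywords below are placeholders:

library(tm)
library(SnowballC)
library(wordcloud)

keywords <- c("best running shoes", "buy running shoes online", "running shoe reviews")  # placeholder keyword list

corpus <- VCorpus(VectorSource(keywords))
corpus <- tm_map(corpus, content_transformer(tolower))      # normalize case
corpus <- tm_map(corpus, removePunctuation)                 # strip punctuation
corpus <- tm_map(corpus, removeWords, stopwords("english")) # remove stop words
corpus <- tm_map(corpus, stemDocument)                      # stem remaining terms

tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
wordcloud(names(freq), freq, min.freq = 1)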

Unedited Thoughts on Topical Keyword Research and Intent-Based SEO

I have been thinking a lot lately about topical keyword research and how this plays a role in SEO, content hierarchy and the data science approaches we use to accomplish these ideas. Let me back up…

Topical Keyword Research and Intent-based Search
Whether you’re an SEO or just someone who has watched Google search results evolve, it’s clear that SERPs have become much more “semantic” over the years. But what does that really mean?

In short, computers use natural language processing (NLP) to better understand how human language works. That might mean understanding synonyms, crafting results based on which device you are using, or any number of things. The bottom line is that Google has moved away from heavily keyword-based results (returning pages that contain the exact phrase you typed, or something very similar) toward a more semantic, intent-based approach. The results may still contain the keyword you searched, but Google is more concerned with showing you what you intended to see; if a related page is the better result and does not contain your exact keywords, that’s okay.

How Does this Impact SEO?
In a big way. This is not new, but we need to approach keyword research and craft content around intent rather than specific 1:1 keywords. In other words, we should take a list of keywords, cluster them into intent groups, and then build content around those intent groups instead of individual keywords. This is ultimately what the user wants: not a bunch of slight variations of content that are more or less the same. And in theory, Google will rank this content well if it meets user intent and Google can connect it to the query.

How is this Related to Data Science?
For one, clustering is a big topic in data science and can be executed in R. There are no doubt SEO tools that will do this for you, but if you want more control you might consider supervised or unsupervised clustering in R.
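
As a hedged sketch of what an unsupervised approach might look like (the keyword list and the number of clusters below are made up purely for illustration), you could weight a document-term matrix with TF-IDF and run base R’s kmeans on it:

library(tm)

keywords <- c("cheap flights to denver", "denver flight deals",
              "how to pack a carry on", "carry on packing tips")   # illustrative keywords only

corpus <- VCorpus(VectorSource(keywords))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, stopwords("english"))

dtm <- DocumentTermMatrix(corpus, control = list(weighting = weightTfIdf))

set.seed(42)
groups <- kmeans(as.matrix(dtm), centers = 2)   # assume 2 intent groups for this toy list
split(keywords, groups$cluster)                 # view keywords by intent group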

Final thoughts
I can see a bigger picture here as well. As we craft our content based on intent and clustering, we can almost take a testing approach to content and site information architecture. Basically, one could build out their intent groups and, working from that list, merge content that belongs to the same cluster or fill in any gaps. Over time, analytics should show how users move through the funnel and whether any steps are needed to give users an easy path (a path to whatever your goal happens to be).

But I think there is a paid media tie-in here as well. Too rarely do we look at paid media performance from an SEO perspective and document which keywords drive conversions versus which are more informational. We should be using that information to inform how we build out information architecture as well. It should be an additional layer that helps us break up similar content throughout the user journey and confirm which keywords belong to which bucket.

How Even Were Whistles in the 2017 NBA Playoffs?

TEAM                      PERSONAL FOULS/GAME   VARIANCE
AVERAGE                   21                     0
Washington Wizards        23                     2
Indiana Pacers            23                     2
Oklahoma City Thunder     22                     1
Memphis Grizzlies         22                     1
Golden State Warriors     22                     1
Portland Trail Blazers    22                     1
Atlanta Hawks             21                     0
Milwaukee Bucks           21                     0
Utah Jazz                 21                     0
Houston Rockets           21                     0
Toronto Raptors           21                     0
Boston Celtics            21                     0
Los Angeles Clippers      20                    -1
Cleveland Cavaliers       19                    -2
San Antonio Spurs         19                    -2
Chicago Bulls             18                    -3

A Word on Digital Marketing

I have spent over a decade working in the digital marketing space. It’s an area I know well, and it has a lot of crossover with data science. In fact, almost everything I learn in data science ends up being applied to one of my two main interest areas: digital marketing or sports.

With that said, I’ll be posting digital marketing ideas and experiences from time to time. These posts may not always tie back to data or analytics, but I’ll try my best to connect the two when possible.

Data Science Course Recommendation: Udemy Data Science A-Z

I want to give some props to a course I recently took online at Udemy.com. The course is called Data Science A-Z and is taught by Kirill Eremenko.

First, I just want to stress that I am not being paid for this endorsement in any way. Just want to share my review with you all.

The price was right at a mere $10. Not sure if that was a short-term promotional price or how long it will last, but it’s well worth it — even as a refresher.

There are three sections: data visualization with Tableau, Statistics/Modeling, and Data Preparation. The sections are not dependent on each other and can be taken in any order, which adds a nice element of flexibility to the whole thing.

As you probably know, there are countless courses out there, but what I appreciate about this one is that it is easy to digest if you have any sort of background in these areas, and it explains not only how to approach these disciplines but why you are doing them at all.

During the course, I was also introduced to a great free statistical program called Gretl, which you can download from the Gretl website. If you have used SPSS or SAS, you’ll pick it up in no time at all.

Find out more here: https://www.udemy.com/datascience/

I also really like DataCamp, but there is a monthly fee associated with membership. I believe it’s somewhere between $20 and $30 per month.

Thanks for reading.

3 Data/Analytics Podcast Recommendations

Here is a brief list of podcasts I would recommend that pertain to either digital marketing or data science. Enjoy!

The Digital Analytics Power Hour

Hosted by Tim Wilson and Michael Helbling, this podcast covers a range of digital analytics topics, from R to what the digital marketing analyst of the future will look like from a skills perspective.

The Data Skeptic

I just started listening to this one and I love it. Many of the episodes are very short (about 15 minutes), so it’s very digestible. There’s a wide range of very relevant topics, from a refresher on p-values and t-tests to neuroscience. I really like how the episodes only last as long as they need to and how they break down seemingly complex topics into something everyone can grasp.

FiveThirtyEight

This one is less about understanding data/analytics and more about the findings the team over at 538 has made. If you’re reading this, you are most likely already familiar with the 538 blog, where topics generally focus on politics and sports.

Any good recommendations out there I missed? Let me know in the comments. Thanks!

Using R with data sets from data.world

Recently I found out about a wonderful website, data.world, which is kind of like a social/collaboration site for data sets. I highly recommend checking it out. If nothing else, it has numerous data sets for you to learn and build from.

I found a data set that contains NCAA March Madness results dating back to 1985. One of the things that I really like about data.world is its built-in features. For one, you can explore data sets right within the website and run SQL queries to return views of the data that are of interest to you.

If you are not familiar with SQL, it is worth exploring, but I won’t go into it here. Instead, I’ll show you the simple queries I made to return appearances made in the tournament by Creighton and Nebraska:

SELECT * FROM `Big_Dance_CSV` where Big_Dance_CSV.Team="Creighton" or Big_Dance_CSV.`Team(2)`="Creighton"
SELECT * FROM `Big_Dance_CSV` where Big_Dance_CSV.Team="Nebraska" or Big_Dance_CSV.`Team(2)`="Nebraska"

For these queries to really make sense, you need to be familiar with the columns that exist in the data set. This particular data set has columns for the home and away teams (Team and Team(2)), so I asked for any results where one of the teams was Creighton or Nebraska.

Another feature that I absolutely love about data.world is how easy it is to take the data and place it into RStudio. By selecting Export > Copy R Code, you get the R code necessary to create a data frame in R from the SQL query you created. So simple. Here is what it gave me for my Creighton query:

df <- read.csv("https://query.data.world/s/dnhmq1rfdbdw18tg7jkfl0dmt",header=T);

That created this data frame in R for me to work with:

     Year  Round  Region  Seed  Score  Team            Team.2.              Score.2.  Seed.2.
1    2001  1      3       7     69     Iowa            Creighton            56        10
2    2002  1      4       5     82     Florida         Creighton            83        12
3    2002  2      4       4     72     Illinois        Creighton            60        12
4    2003  1      2       6     73     Creighton       Central Michigan     79        11
5    2005  1      2       7     63     West Virginia   Creighton            61        10
6    2007  1      4       7     77     Nevada          Creighton            71        10
7    2012  1      4       8     58     Creighton       Alabama              57        9
8    2012  2      4       1     87     North Carolina  Creighton            73        8
9    2013  1      1       7     67     Creighton       Cincinnati           63        10
10   2013  2      1       2     66     Duke            Creighton            50        7
11   2014  1      3       3     76     Creighton       Louisiana Lafayette  66        14
12   2014  2      3       3     55     Creighton       Baylor               85        6
13   1989  1      1       3     85     Missouri        Creighton            69        14
14   1991  1      1       6     56     New Mexico St   Creighton            64        11
15   1991  2      1       3     81     Seton Hall      Creighton            69        11
16   1999  1      3       7     58     Louisville      Creighton            62        10
17   1999  2      3       2     75     Maryland        Creighton            62        10
18   2000  1      2       7     72     Auburn          Creighton            69        10

From there, I created this pretty simple bar graph with ggplot that displays when the Jays appeared in the tournament and what round they made it to. All in all it took me well under an hour.
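
I have not reproduced the exact plotting code here, but a rough sketch of that kind of bar graph, assuming the df data frame created above (column names as shown in the table) and the dplyr and ggplot2 packages, might look like this:

library(ggplot2)
library(dplyr)

df %>%
  group_by(Year) %>%
  summarise(Round = max(Round)) %>%           # furthest round reached each year
  ggplot(aes(x = factor(Year), y = Round)) +
  geom_col(fill = "steelblue") +
  labs(title = "Creighton NCAA Tournament Appearances",
       x = "Year", y = "Round reached")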

And for the Huskers as well:

Hope this example shows how easy it is to take data.world data and create something in R. You could, of course, pull the entire data set into R to do deeper analysis, build models, and so on, but this is a good start.

The Pros and Cons of Learning R for Digital Marketers

For me, it is worth the time I have spent (and will continue spending) to learn the R programming language. I work in the digital marketing space and while I do not believe it is necessary for everyone to learn R, I would recommend giving it a go if you are already interested.

Here are some of the pros and cons as I see them:

PROS

  • It’s difficult to deal with very large data sets in Excel; R gives you a language and environment for analyzing large data sets in a relatively fast and powerful way
  • There’s so much you can do, from connecting to APIs to statistical analysis, forecasting, word clouds, modeling, clustering and creating interesting visuals (I could go on)
  • It’s not THAT difficult to learn, and there is a tremendous community of people just like you and me contributing daily, so we can more or less copy and paste their work and apply it to our data

CONS

  • It does take some time and dedication to learn what you need to know in R
  • Excel is pretty great. It works well for most things like reporting and data analysis. It’s only when we are talking about extremely large data sets or analysis beyond Excel’s capabilities that R is really needed.
  • Even if you figure out how to use R, you should still practice responsibility when it comes to forecasting, regression, etc. In other words, you really should learn the nuances of those disciplines as well in order to make sure your analysis is accurate.

Anything I am missing? Please feel free to leave your thoughts below and continue the conversation!

Some Basic Thoughts on Data Visualization

Data visualization is the art and science of clearly communicating data in a way that is easily digestible to the end user. It goes without saying that there is so much data available now. But effectively making sense of that data is critically important. That’s when data really becomes useful information.

Excel can do some things. You are all likely familiar with its built-in line graphs, pie charts and other visualizations. But it is limited in the amount of data you can work with and in how much you can customize the visuals. For many scenarios it’s just fine.

But there are times when one might have more data to work with, or perhaps no good way to get the data into Excel in the first place. There are tools out there like Tableau that can connect to APIs (or you can import the data) and have a rich library of visualizations and ways to customize them. Tableau in particular has a free “Public” version available; however, the output will be published on Tableau’s website for anyone to see. If you are a business or agency that wants to keep your data private, a paid version is available as well.

Recently Google announced its own take on Tableau, called Data Studio. There is a free version here as well, and I believe you can choose to keep your data private. However, at least as of June 2016, it only connects to other Google platforms. If you want to pull other data into the interface, it is possible, but you first need to get it into Google Sheets or BigQuery, which is more or less Google’s cloud SQL data warehouse. BigQuery is its own topic for another day.

I have only scratched the surface of what is out there. For R users, you can download and work with libraries like ggplot2; QlikView is another Tableau-esque tool; and on and on. Check out this website for a few more examples, which are cleverly broken out into tools for developers and non-developers: http://thenextweb.com/dd/2015/04/21/the-14-best-data-visualization-tools/#gref.

How to Pull API Data Into Excel

Pulling API data into Excel is quite a bit easier than I expected. The most difficult part is understanding how to build the URL you will use to request the data.
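
To make that concrete: most REST APIs of this kind expect your key as a query parameter, so the request URL looks something like https://api.example.com/v1/stats.json?api_key=YOUR_KEY. That pattern is purely illustrative (a made-up host and path, not Sportradar’s actual endpoint); the exact URL structure is spelled out in Sportradar’s developer documentation.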

In my case, I have been working with the Sportradar API to analyze Husker football data. The first step for me was to get an API key that allows me to request data from Sportradar. Once I had that, it was simply a matter of taking these steps:

  1. Open a new Excel workbook
  2. Click on the Data tab in Excel
  3. Click From the Web
  4. Enter the API URL

It’s as easy as that. Unless you have run into an error, you should have your data tables listed in Excel.

If you are interested in using the Sportradar API, check out http://developer.sportradar.us/