How to Collect Twitter Data Using R and the Twitter Search API

Luckily, the hard work is done for us. There is a terrific R package called twitterR that allows you to easily connect to the Twitter Search API. You just need to know a few arguments to properly ask for the data you need.

First, let’s explore what type of data and limitations exist in the Twitter Search API so we know what we have to work with.

Official documentation:

“Returns a collection of relevant Tweets matching a specified query.

Please note that Twitter’s search service and, by extension, the Search API is not meant to be an exhaustive source of Tweets. Not all Tweets will be indexed or made available via the search interface.

To learn how to use Twitter Search effectively, please see the Standard search operators page for a list of available filter operators. Also, see the Working with Timelines page to learn best practices for navigating results by since_id and max_id.”

The first step, of course, is to activate the packages you need for this project. If you don’t have these packages installed already, you’ll need to do that too. I have all of these installed, so I’ve commented out that part here.

# install.packages("twitterR")

We’re getting close, but before we can request data from the Twitter API, we have to provide some credentials to make sure we aren’t doing anything nefarious. To accomplish this, you need four things:

  • consumer_key
  • consumer_secret
  • access_token
  • access_secret

No worries, all of these can be easily found here (you’ll need an active Twitter account): Once you’re logged, you need to create an “application” which is essentially just saying you want to work on a project. Go ahead and fill in the details and you should receive the four criteria above.

Now we’ll save each of these strings in this manner (note that you’ll need to replace your string where i have ‘abc123’):

onsumer_key <- 'acb123'
consumer_secret <- 'acb123'
access_token <- 'acb123'
access_secret <- 'acb123'

Now, let’s get authorized and begin requesting Twitter data.

setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

Below is a simple request for Tweets that you can modify to your liking. In this example, I’m going to save my request as “nebtweets” and I’ll call for the information with “searchTwitter” which is part of the twitterR package we installed and activated. I’ve arbitrarily set the number of results I want back to 200, starting in February 2018. It’s important to note here that the Twitter Search API does NOT give you full access to Twitters’ data. It’s only an index of recent Tweets. So you may get back warnings if you try asking for something that is not available.

nebtweets <- searchTwitter("nebrasketball", n=200, lang="en", since = '2018-02-01')

Now we have the Tweets saved, but they're not in a nice, neat data frame. This can easily be solved using "twListToDF" which is also part of the TwitterR package.

nebtweetsDF = twListToDF(nebtweets)

Now you're ready to analyze. Enjoy.