Collecting College Football Data through Sportradar API using R

To kick off a personal college football rating project in R, I knew I needed team data and game-by-game data for the 2018 college football season for all 130 FBS teams. I was able to obtain this data through the Sportradar API.

Sportradar was gracious enough to give me 30 days of access to the API, although access usually requires a fee, especially if you are monetizing your project. I won't go through all of the steps of obtaining access here, but once you have it, this post shows how to call the API and transform the data into a workable data frame for analysis.

Here are my API calls using the httr and jsonlite packages:

## ASSUMING THESE ARE ALREADY INSTALLED
library(httr)
library(jsonlite)
options(stringsAsFactors = FALSE)

## STORE YOUR SPORTRADAR API INFORMATION (REPLACE APIKEYHERE WITH YOUR KEY)
sruser <- "YOURUSERNAME"
srid <- "YOURUSERID"
srsecret <- "YOURUSERSECRET"
srtoken <- "YOURTOKEN"
srappname <- "spacialsand"
srurl <- "https://api.sportradar.us"
srpath <- "/ncaafb-t1/2018/REG/schedule.json?api_key=APIKEYHERE"
srteams <- "/ncaafb-t1/teams/FBS/2018/REG/standings.json?api_key=APIKEYHERE"
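If you'd rather not paste your key into every path string, here is a minimal sketch that keeps it in one place. The helper name `sr_build_url()` and the environment variable `SPORTRADAR_API_KEY` are my own hypothetical choices, not part of the Sportradar API; the base URL and path are the same ones used above.

```r
## HYPOTHETICAL HELPER: KEEP THE API KEY IN ONE PLACE AND APPEND IT AS A
## QUERY PARAMETER, INSTEAD OF HARD-CODING IT INTO EVERY PATH STRING
sr_build_url <- function(base, path, api_key) {
  paste0(base, path, "?api_key=", utils::URLencode(api_key, reserved = TRUE))
}

## THE ENVIRONMENT VARIABLE NAME IS AN ASSUMPTION (SET IT IN ~/.Renviron);
## FALLS BACK TO THE PLACEHOLDER USED ABOVE IF IT IS NOT SET
srkey <- Sys.getenv("SPORTRADAR_API_KEY", unset = "APIKEYHERE")

schedule_url <- sr_build_url("https://api.sportradar.us",
                             "/ncaafb-t1/2018/REG/schedule.json",
                             srkey)
```

The resulting URL can be passed straight to `GET(schedule_url)`. Alternatively, httr's `GET()` accepts `query = list(api_key = srkey)` alongside `path`, which appends the key for you.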

Collecting Team Data

Once your API access information is stored (above), you can start making API calls from R with httr's GET(), like this:

## API CALL FOR TEAM DATA
srteams.raw.result <- GET(url = srurl, path = srteams)
stop_for_status(srteams.raw.result)  # STOP HERE IF THE KEY OR PATH IS WRONG
srteams.raw.content <- rawToChar(srteams.raw.result$content)
srteams.content <- fromJSON(srteams.raw.content)

## PULL TEAM DATA BY CONFERENCE OUT OF LISTS
cfb_team1 <- srteams.content$division$conferences$teams[[1]]
cfb_team2 <- srteams.content$division$conferences$teams[[2]]
cfb_team3 <- srteams.content$division$conferences$teams[[3]]
cfb_team4 <- srteams.content$division$conferences$teams[[4]]
cfb_team5 <- srteams.content$division$conferences$teams[[5]]
cfb_team6 <- srteams.content$division$conferences$teams[[6]]
cfb_team7 <- srteams.content$division$conferences$teams[[7]]
cfb_team8 <- srteams.content$division$conferences$teams[[8]]
cfb_team9 <- srteams.content$division$conferences$teams[[9]]
cfb_team10 <- srteams.content$division$conferences$teams[[10]]
cfb_team11 <- srteams.content$division$conferences$teams[[11]]

## SOME CONFERENCES HAVE NO SUBDIVISIONS, BUT RBIND() NEEDS EQUAL COLUMNS
cfb_team3$subdivision <- NA
cfb_team6$subdivision <- NA

A quick note on what is happening in the code above: when you first retrieve data from the Sportradar API, it comes back as raw bytes that are not easy to work with. So we convert the raw response to text, parse that JSON into R objects, and keep only the information we need as more workable tables.
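To see why the conversion step is needed, here is a tiny stand-in for a response body. The JSON string is made up for illustration; a real `srteams.raw.result$content` holds the full standings payload in the same raw (byte) form.

```r
## A MADE-UP JSON BODY, STORED AS RAW BYTES THE WAY httr RETURNS $content
raw_body <- charToRaw('{"division":{"name":"FBS"}}')
raw_body[1:5]
## [1] 7b 22 64 69 76

## rawToChar() TURNS THE BYTES BACK INTO JSON TEXT THAT fromJSON() CAN PARSE
json_text <- rawToChar(raw_body)
json_text
## [1] "{\"division\":{\"name\":\"FBS\"}}"
```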

Important note: in the second-to-last step, I create a separate data frame for each conference. The parsed data arrives as lists, so I needed a way to pluck out each conference's data and eventually combine it all into one data frame. I am sure there is a more efficient way to tackle this, perhaps by looping through the lists.

This is how I was able to make it work, but I suggest you consider alternatives to keep your R code efficient. And it's great practice!
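Here is a minimal sketch of that looping idea, under the assumption (taken from the code above) that `srteams.content$division$conferences$teams` is a list holding one data frame per conference. To keep it runnable without an API key, a small hypothetical mock list stands in for the real response, and `fill_subdivision()` is my own helper name.

```r
## HYPOTHETICAL MOCK OF srteams.content$division$conferences$teams:
## ONE DATA FRAME OF TEAMS PER CONFERENCE
conf_teams <- list(
  data.frame(name = c("Team A", "Team B"), subdivision = c("East", "West")),
  data.frame(name = "Team C"),  # A CONFERENCE WITH NO SUBDIVISIONS
  data.frame(name = c("Team D", "Team E"), subdivision = c("North", "South"))
)

## ADD AN NA subdivision COLUMN WHEREVER IT IS MISSING
fill_subdivision <- function(df) {
  if (!"subdivision" %in% names(df)) df$subdivision <- NA
  df
}

## FIX EVERY CONFERENCE, THEN STACK THEM ALL INTO ONE DATA FRAME
cfb_teams_all <- do.call(rbind, lapply(conf_teams, fill_subdivision))
```

With the real data you would start from `conf_teams <- srteams.content$division$conferences$teams`; the loop then covers every conference and handles the missing `subdivision` column without hard-coding which conferences lack it. Since `rbind()` matches data frame columns by name, the added column does not need to sit in the same position.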

At this point, we end up with a number of data frames nested within data frames (for example, each team's overall record is itself a data frame with wins and losses columns), which is problematic during analysis. To deal with it, I took a very (embarrassingly) manual approach, which again could be done more efficiently. If you have better suggestions, please let me know in the comments. Until I revisit it, here is the long way to handle it, pulling out the variables I want to keep:

cfb_team1$overall.wins <- cfb_team1$overall$wins
cfb_team1$overall.losses <- cfb_team1$overall$losses
cfb_team1$conference.wins <- cfb_team1$in_conference$wins
cfb_team1$conference.losses <- cfb_team1$in_conference$losses
cfb_team1$home.wins <- cfb_team1$home$wins
cfb_team1$home.losses <- cfb_team1$home$losses
cfb_team1$away.wins <- cfb_team1$away$wins
cfb_team1$away.losses <- cfb_team1$away$losses
cfb_team1$decided_by_7.wins <- cfb_team1$decided_by_7_points$wins
cfb_team1$decided_by_7.losses <- cfb_team1$decided_by_7_points$losses
cfb_team1$last_5.wins <- cfb_team1$last_5$wins
cfb_team1$last_5.losses <- cfb_team1$last_5$losses
cfb_team1$points.against <- cfb_team1$points$against
cfb_team1$points.net <- cfb_team1$points$net

cfb_team2$overall.wins <- cfb_team2$overall$wins
cfb_team2$overall.losses <- cfb_team2$overall$losses
cfb_team2$conference.wins <- cfb_team2$in_conference$wins
cfb_team2$conference.losses <- cfb_team2$in_conference$losses
cfb_team2$home.wins <- cfb_team2$home$wins
cfb_team2$home.losses <- cfb_team2$home$losses
cfb_team2$away.wins <- cfb_team2$away$wins
cfb_team2$away.losses <- cfb_team2$away$losses
cfb_team2$decided_by_7.wins <- cfb_team2$decided_by_7_points$wins
cfb_team2$decided_by_7.losses <- cfb_team2$decided_by_7_points$losses
cfb_team2$last_5.wins <- cfb_team2$last_5$wins
cfb_team2$last_5.losses <- cfb_team2$last_5$losses
cfb_team2$points.against <- cfb_team2$points$against
cfb_team2$points.net <- cfb_team2$points$net

cfb_team3$overall.wins <- cfb_team3$overall$wins
cfb_team3$overall.losses <- cfb_team3$overall$losses
cfb_team3$conference.wins <- cfb_team3$in_conference$wins
cfb_team3$conference.losses <- cfb_team3$in_conference$losses
cfb_team3$home.wins <- cfb_team3$home$wins
cfb_team3$home.losses <- cfb_team3$home$losses
cfb_team3$away.wins <- cfb_team3$away$wins
cfb_team3$away.losses <- cfb_team3$away$losses
cfb_team3$decided_by_7.wins <- cfb_team3$decided_by_7_points$wins
cfb_team3$decided_by_7.losses <- cfb_team3$decided_by_7_points$losses
cfb_team3$last_5.wins <- cfb_team3$last_5$wins
cfb_team3$last_5.losses <- cfb_team3$last_5$losses
cfb_team3$points.against <- cfb_team3$points$against
cfb_team3$points.net <- cfb_team3$points$net

cfb_team4$overall.wins <- cfb_team4$overall$wins
cfb_team4$overall.losses <- cfb_team4$overall$losses
cfb_team4$conference.wins <- cfb_team4$in_conference$wins
cfb_team4$conference.losses <- cfb_team4$in_conference$losses
cfb_team4$home.wins <- cfb_team4$home$wins
cfb_team4$home.losses <- cfb_team4$home$losses
cfb_team4$away.wins <- cfb_team4$away$wins
cfb_team4$away.losses <- cfb_team4$away$losses
cfb_team4$decided_by_7.wins <- cfb_team4$decided_by_7_points$wins
cfb_team4$decided_by_7.losses <- cfb_team4$decided_by_7_points$losses
cfb_team4$last_5.wins <- cfb_team4$last_5$wins
cfb_team4$last_5.losses <- cfb_team4$last_5$losses
cfb_team4$points.against <- cfb_team4$points$against
cfb_team4$points.net <- cfb_team4$points$net

cfb_team5$overall.wins <- cfb_team5$overall$wins
cfb_team5$overall.losses <- cfb_team5$overall$losses
cfb_team5$conference.wins <- cfb_team5$in_conference$wins
cfb_team5$conference.losses <- cfb_team5$in_conference$losses
cfb_team5$home.wins <- cfb_team5$home$wins
cfb_team5$home.losses <- cfb_team5$home$losses
cfb_team5$away.wins <- cfb_team5$away$wins
cfb_team5$away.losses <- cfb_team5$away$losses
cfb_team5$decided_by_7.wins <- cfb_team5$decided_by_7_points$wins
cfb_team5$decided_by_7.losses <- cfb_team5$decided_by_7_points$losses
cfb_team5$last_5.wins <- cfb_team5$last_5$wins
cfb_team5$last_5.losses <- cfb_team5$last_5$losses
cfb_team5$points.against <- cfb_team5$points$against
cfb_team5$points.net <- cfb_team5$points$net

cfb_team6$overall.wins <- cfb_team6$overall$wins
cfb_team6$overall.losses <- cfb_team6$overall$losses
cfb_team6$conference.wins <- cfb_team6$in_conference$wins
cfb_team6$conference.losses <- cfb_team6$in_conference$losses
cfb_team6$home.wins <- cfb_team6$home$wins
cfb_team6$home.losses <- cfb_team6$home$losses
cfb_team6$away.wins <- cfb_team6$away$wins
cfb_team6$away.losses <- cfb_team6$away$losses
cfb_team6$decided_by_7.wins <- cfb_team6$decided_by_7_points$wins
cfb_team6$decided_by_7.losses <- cfb_team6$decided_by_7_points$losses
cfb_team6$last_5.wins <- cfb_team6$last_5$wins
cfb_team6$last_5.losses <- cfb_team6$last_5$losses
cfb_team6$points.against <- cfb_team6$points$against
cfb_team6$points.net <- cfb_team6$points$net

cfb_team7$overall.wins <- cfb_team7$overall$wins
cfb_team7$overall.losses <- cfb_team7$overall$losses
cfb_team7$conference.wins <- cfb_team7$in_conference$wins
cfb_team7$conference.losses <- cfb_team7$in_conference$losses
cfb_team7$home.wins <- cfb_team7$home$wins
cfb_team7$home.losses <- cfb_team7$home$losses
cfb_team7$away.wins <- cfb_team7$away$wins
cfb_team7$away.losses <- cfb_team7$away$losses
cfb_team7$decided_by_7.wins <- cfb_team7$decided_by_7_points$wins
cfb_team7$decided_by_7.losses <- cfb_team7$decided_by_7_points$losses
cfb_team7$last_5.wins <- cfb_team7$last_5$wins
cfb_team7$last_5.losses <- cfb_team7$last_5$losses
cfb_team7$points.against <- cfb_team7$points$against
cfb_team7$points.net <- cfb_team7$points$net

cfb_team8$overall.wins <- cfb_team8$overall$wins
cfb_team8$overall.losses <- cfb_team8$overall$losses
cfb_team8$conference.wins <- cfb_team8$in_conference$wins
cfb_team8$conference.losses <- cfb_team8$in_conference$losses
cfb_team8$home.wins <- cfb_team8$home$wins
cfb_team8$home.losses <- cfb_team8$home$losses
cfb_team8$away.wins <- cfb_team8$away$wins
cfb_team8$away.losses <- cfb_team8$away$losses
cfb_team8$decided_by_7.wins <- cfb_team8$decided_by_7_points$wins
cfb_team8$decided_by_7.losses <- cfb_team8$decided_by_7_points$losses
cfb_team8$last_5.wins <- cfb_team8$last_5$wins
cfb_team8$last_5.losses <- cfb_team8$last_5$losses
cfb_team8$points.against <- cfb_team8$points$against
cfb_team8$points.net <- cfb_team8$points$net

cfb_team9$overall.wins <- cfb_team9$overall$wins
cfb_team9$overall.losses <- cfb_team9$overall$losses
cfb_team9$conference.wins <- cfb_team9$in_conference$wins
cfb_team9$conference.losses <- cfb_team9$in_conference$losses
cfb_team9$home.wins <- cfb_team9$home$wins
cfb_team9$home.losses <- cfb_team9$home$losses
cfb_team9$away.wins <- cfb_team9$away$wins
cfb_team9$away.losses <- cfb_team9$away$losses
cfb_team9$decided_by_7.wins <- cfb_team9$decided_by_7_points$wins
cfb_team9$decided_by_7.losses <- cfb_team9$decided_by_7_points$losses
cfb_team9$last_5.wins <- cfb_team9$last_5$wins
cfb_team9$last_5.losses <- cfb_team9$last_5$losses
cfb_team9$points.against <- cfb_team9$points$against
cfb_team9$points.net <- cfb_team9$points$net

cfb_team10$overall.wins <- cfb_team10$overall$wins
cfb_team10$overall.losses <- cfb_team10$overall$losses
cfb_team10$conference.wins <- cfb_team10$in_conference$wins
cfb_team10$conference.losses <- cfb_team10$in_conference$losses
cfb_team10$home.wins <- cfb_team10$home$wins
cfb_team10$home.losses <- cfb_team10$home$losses
cfb_team10$away.wins <- cfb_team10$away$wins
cfb_team10$away.losses <- cfb_team10$away$losses
cfb_team10$decided_by_7.wins <- cfb_team10$decided_by_7_points$wins
cfb_team10$decided_by_7.losses <- cfb_team10$decided_by_7_points$losses
cfb_team10$last_5.wins <- cfb_team10$last_5$wins
cfb_team10$last_5.losses <- cfb_team10$last_5$losses
cfb_team10$points.against <- cfb_team10$points$against
cfb_team10$points.net <- cfb_team10$points$net

cfb_team11$overall.wins <- cfb_team11$overall$wins
cfb_team11$overall.losses <- cfb_team11$overall$losses
cfb_team11$conference.wins <- cfb_team11$in_conference$wins
cfb_team11$conference.losses <- cfb_team11$in_conference$losses
cfb_team11$home.wins <- cfb_team11$home$wins
cfb_team11$home.losses <- cfb_team11$home$losses
cfb_team11$away.wins <- cfb_team11$away$wins
cfb_team11$away.losses <- cfb_team11$away$losses
cfb_team11$decided_by_7.wins <- cfb_team11$decided_by_7_points$wins
cfb_team11$decided_by_7.losses <- cfb_team11$decided_by_7_points$losses
cfb_team11$last_5.wins <- cfb_team11$last_5$wins
cfb_team11$last_5.losses <- cfb_team11$last_5$losses
cfb_team11$points.against <- cfb_team11$points$against
cfb_team11$points.net <- cfb_team11$points$net

## COMBINE INTO ONE DATA FRAME
cfb_teams2018 <- rbind(cfb_team1, cfb_team2, cfb_team3, cfb_team4, cfb_team5, cfb_team6, cfb_team7, cfb_team8, cfb_team9, cfb_team10, cfb_team11)

Now you should have a data frame, named ‘cfb_teams2018’ with team information for the 2018 season. I believe this is updated each week, as games are played, so depending on when you make the call you should have close to the latest information.
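The long block of manual assignments can be replaced by a generic helper. Below is a minimal base R sketch using a hypothetical mock with the same shape as the real data (`overall` and `points` as nested data-frame columns); `flatten_columns()` is my own name. jsonlite also ships a `flatten()` function, and `fromJSON()` has a `flatten` argument, aimed at the same problem, so those may be worth trying first.

```r
## HYPOTHETICAL MOCK OF ONE CONFERENCE'S TEAMS: A PLAIN COLUMN PLUS
## NESTED DATA-FRAME COLUMNS LIKE overall AND points IN THE REAL DATA
teams <- data.frame(name = c("Team A", "Team B"))
teams$overall <- data.frame(wins = c(10, 4), losses = c(2, 8))
teams$points <- data.frame(against = c(250, 310), net = c(120, -40))

## REPLACE EACH NESTED DATA-FRAME COLUMN WITH ITS SUB-COLUMNS, NAMED
## "<column>.<subcolumn>" (E.G. overall.wins); ONLY ONE LEVEL DEEP
flatten_columns <- function(df) {
  is_nested <- vapply(df, is.data.frame, logical(1))
  out <- df[, !is_nested, drop = FALSE]  # KEEP THE PLAIN COLUMNS AS-IS
  for (col in names(df)[is_nested]) {
    for (sub in names(df[[col]])) {
      out[[paste(col, sub, sep = ".")]] <- df[[col]][[sub]]
    }
  }
  out
}

flat <- flatten_columns(teams)
```

Applied to each conference's data frame before `rbind()`, this yields columns such as `overall.wins`, but also `in_conference.wins` and `decided_by_7_points.wins` rather than the shorter names used above, and it keeps every sub-column rather than only the ones I chose. Rename or subset afterwards if you want to match `cfb_teams2018` exactly.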

Collecting Game Data

## API CALL FOR THE SEASON SCHEDULE (GAME-BY-GAME DATA)
srgames.raw.result <- GET(url = srurl, path = srpath)
stop_for_status(srgames.raw.result)  # STOP HERE IF THE KEY OR PATH IS WRONG
srgames.raw.content <- rawToChar(srgames.raw.result$content)
srgames.content <- fromJSON(srgames.raw.content)

## PULL GAME DATA BY WEEK OUT OF LISTS
cfb_week1 <- srgames.content$weeks$games[[1]]
cfb_week2 <- srgames.content$weeks$games[[2]]
cfb_week3 <- srgames.content$weeks$games[[3]]
cfb_week4 <- srgames.content$weeks$games[[4]]
cfb_week5 <- srgames.content$weeks$games[[5]]
cfb_week6 <- srgames.content$weeks$games[[6]]
cfb_week7 <- srgames.content$weeks$games[[7]]
cfb_week8 <- srgames.content$weeks$games[[8]]
cfb_week9 <- srgames.content$weeks$games[[9]]
cfb_week10 <- srgames.content$weeks$games[[10]]
cfb_week11 <- srgames.content$weeks$games[[11]]
cfb_week12 <- srgames.content$weeks$games[[12]]
cfb_week13 <- srgames.content$weeks$games[[13]]

## TAG EACH WEEK'S GAMES WITH ITS WEEK NUMBER
cfb_week1$week <- 1
cfb_week2$week <- 2
cfb_week3$week <- 3
cfb_week4$week <- 4
cfb_week5$week <- 5
cfb_week6$week <- 6
cfb_week7$week <- 7
cfb_week8$week <- 8
cfb_week9$week <- 9
cfb_week10$week <- 10
cfb_week11$week <- 11
cfb_week12$week <- 12
cfb_week13$week <- 13

## COMBINE GAMES FROM ALL WEEKS INTO ONE DATA FRAME
cfb_games2018 <- rbind(cfb_week1, cfb_week2, cfb_week3, cfb_week4, cfb_week5, cfb_week6, cfb_week7, cfb_week8, cfb_week9, cfb_week10, cfb_week11, cfb_week12, cfb_week13)
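The 13 week blocks follow one pattern, so they can be collapsed into a loop as well. A minimal sketch, assuming (as the code above does) that `srgames.content$weeks$games` is a list with one data frame per week and that a game's week number equals its position in that list; the mock data is hypothetical.

```r
## HYPOTHETICAL MOCK OF srgames.content$weeks$games:
## ONE DATA FRAME OF GAMES PER WEEK
weekly_games <- list(
  data.frame(home = c("A", "B"), away = c("C", "D")),
  data.frame(home = "C", away = "A"),
  data.frame(home = c("D", "B"), away = c("A", "C"))
)

## TAG EACH WEEK'S GAMES WITH ITS POSITION IN THE LIST (THE WEEK NUMBER)
tagged <- Map(function(games, wk) {
  games$week <- wk
  games
}, weekly_games, seq_along(weekly_games))

## STACK ALL WEEKS INTO ONE DATA FRAME
cfb_games_all <- do.call(rbind, tagged)
```

With the real response, set `weekly_games <- srgames.content$weeks$games`. Because the loop uses `seq_along()`, it picks up however many weeks the call returns instead of stopping at 13.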

There you have it: game-by-game data for the 2018 college football season through week 13. Happy analysis.
