To kick off a personal college football rating project in R, I needed team data and game-by-game data for all 130 FBS teams in the 2018 college football season. I obtained this data through the Sportradar API.
They were gracious enough to give me access to the API for 30 days, although access usually requires a fee, especially if you are monetizing your project. I won’t go through the steps of obtaining access here. Once you have access, this post shows how to call the API and transform the returned data into a workable data frame for analysis.
Here are my API calls using the httr and jsonlite packages:
## ASSUMING THESE ARE ALREADY INSTALLED
library(httr)
library(jsonlite)
options(stringsAsFactors = FALSE)

## STORE YOUR SPORT RADAR API INFORMATION
sruser <- "YOURUSERNAME"
srid <- "YOURUSERID"
srsecret <- "YOURUSERSECRET"
srtoken <- "YOURTOKEN"
srappname <- "spacialsand"
srurl <- "https://api.sportradar.us"
srpath <- "/ncaafb-t1/2018/REG/schedule.json?api_key=APIKEYHERE"
srteams <- "/ncaafb-t1/teams/FBS/2018/REG/standings.json?api_key=APIKEYHERE"
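If you would rather not paste the key into the script itself, one option is to read it from an environment variable and build the path strings from it. This is only a sketch: the helper `sr_path` and the variable name `SPORTRADAR_API_KEY` are my own placeholders, not something the Sportradar API defines.

```r
# Hypothetical helper: append the API key to an endpoint path.
# The environment variable name SPORTRADAR_API_KEY is an assumption.
sr_path <- function(endpoint, key = Sys.getenv("SPORTRADAR_API_KEY")) {
  if (!nzchar(key)) {
    stop("Set SPORTRADAR_API_KEY before building a request path.")
  }
  paste0(endpoint, "?api_key=", key)
}

# Placeholder key for illustration; normally this is set outside the script
Sys.setenv(SPORTRADAR_API_KEY = "APIKEYHERE")

srpath  <- sr_path("/ncaafb-t1/2018/REG/schedule.json")
srteams <- sr_path("/ncaafb-t1/teams/FBS/2018/REG/standings.json")
```

The resulting `srpath` and `srteams` strings have the same shape as the hard-coded ones above, so the rest of the code does not need to change.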
Collecting Team Data
Once you have your API access information stored (above), you can start making API calls from R with GET, like this:
## API CALL FOR TEAM DATA
srteams.raw.result <- GET(url = srurl, path = srteams)
srteams.raw.content <- rawToChar(srteams.raw.result$content)
srteams.content <- fromJSON(srteams.raw.content)

## PULL TEAM DATA BY CONFERENCE OUT OF LISTS
cfb_team1 <- srteams.content$division$conferences$teams[[1]]
cfb_team2 <- srteams.content$division$conferences$teams[[2]]
cfb_team3 <- srteams.content$division$conferences$teams[[3]]
cfb_team4 <- srteams.content$division$conferences$teams[[4]]
cfb_team5 <- srteams.content$division$conferences$teams[[5]]
cfb_team6 <- srteams.content$division$conferences$teams[[6]]
cfb_team7 <- srteams.content$division$conferences$teams[[7]]
cfb_team8 <- srteams.content$division$conferences$teams[[8]]
cfb_team9 <- srteams.content$division$conferences$teams[[9]]
cfb_team10 <- srteams.content$division$conferences$teams[[10]]
cfb_team11 <- srteams.content$division$conferences$teams[[11]]

## SOME TEAMS DO NOT HAVE SUBDIVISIONS BUT WE NEED EQUAL COLUMNS
cfb_team3$subdivision <- NA
cfb_team6$subdivision <- NA
A quick note on what the code above does: the Sportradar API returns a raw response that is not easy to work with. rawToChar converts the raw bytes into a JSON string, and fromJSON parses that string into nested lists and data frames. From there, we keep only the pieces we need and shape them into more workable tables in R.
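To see what fromJSON does with this kind of nesting, here is a small self-contained sketch. The JSON is a toy that I made up, loosely following the division, conferences, teams layout used in this post. It is not an actual API response, and the real one has many more fields.

```r
library(jsonlite)

# Toy JSON (an assumption for illustration, not real Sportradar output)
toy_json <- '{
  "division": {
    "name": "FBS",
    "conferences": [
      {"name": "Conf A",
       "teams": [
         {"name": "Team 1", "overall": {"wins": 10, "losses": 2}},
         {"name": "Team 2", "overall": {"wins": 7, "losses": 5}}
       ]},
      {"name": "Conf B",
       "teams": [
         {"name": "Team 3", "overall": {"wins": 4, "losses": 8}}
       ]}
    ]
  }
}'

parsed <- fromJSON(toy_json)

# conferences becomes a data frame whose teams column is a list of data frames
teams_by_conf <- parsed$division$conferences$teams
first_conf <- teams_by_conf[[1]]

# overall is itself a data frame nested inside first_conf
first_conf$overall$wins
```

This is why the code pulls each conference out with `[[1]]`, `[[2]]` and so on: each element of the list is its own data frame, and some of its columns are data frames too.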
Important note: In the second-to-last step, I create a separate data frame for each conference, because the parsed result is a list of data frames and I needed a way to pluck out each one before eventually combining them into a single data frame. I am positive there is a more efficient way to tackle this, perhaps by looping through the list.
This is how I was able to make it work, but I suggest you consider alternatives to keep your R code efficient. And it’s great practice!
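As a rough idea of what looping through the list could look like, here is a base R sketch on toy data. The data frames and the helper `add_missing_cols` are made up for illustration. The real list would be `srteams.content$division$conferences$teams`, whose data frames also contain nested columns that this toy leaves out.

```r
# Toy stand-ins for the per-conference data frames
conference_list <- list(
  data.frame(name = c("Team 1", "Team 2"), subdivision = c("East", "West")),
  data.frame(name = "Team 3")  # this conference has no subdivision column
)

# Hypothetical helper: add any missing column as NA, then order the columns
add_missing_cols <- function(df, cols) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- NA
  }
  df[cols]
}

# Every column name that appears in at least one conference
all_cols <- unique(unlist(lapply(conference_list, names)))

# Align the columns of each data frame, then stack them all at once
aligned <- lapply(conference_list, add_missing_cols, cols = all_cols)
all_teams <- do.call(rbind, aligned)
```

The helper fills in the missing subdivision column for any conference that lacks it, so there is no need to know in advance which conferences those are.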
At this point, we end up with a number of data frames nested within data frames, which is problematic during analysis. To deal with it, I took a very (embarrassingly) manual approach, which again could be done more efficiently. If you have better suggestions, please let me know in the comments. Until I revisit it, here is the long way to handle it, pulling out only the variables that I care to keep:
cfb_team1$overall.wins <- cfb_team1$overall$wins
cfb_team1$overall.losses <- cfb_team1$overall$losses
cfb_team1$conference.wins <- cfb_team1$in_conference$wins
cfb_team1$conference.losses <- cfb_team1$in_conference$losses
cfb_team1$home.wins <- cfb_team1$home$wins
cfb_team1$home.losses <- cfb_team1$home$losses
cfb_team1$away.wins <- cfb_team1$away$wins
cfb_team1$away.losses <- cfb_team1$away$losses
cfb_team1$decided_by_7.wins <- cfb_team1$decided_by_7_points$wins
cfb_team1$decided_by_7.losses <- cfb_team1$decided_by_7_points$losses
cfb_team1$last_5.wins <- cfb_team1$last_5$wins
cfb_team1$last_5.losses <- cfb_team1$last_5$losses
cfb_team1$points.against <- cfb_team1$points$against
cfb_team1$points.net <- cfb_team1$points$net

cfb_team2$overall.wins <- cfb_team2$overall$wins
cfb_team2$overall.losses <- cfb_team2$overall$losses
cfb_team2$conference.wins <- cfb_team2$in_conference$wins
cfb_team2$conference.losses <- cfb_team2$in_conference$losses
cfb_team2$home.wins <- cfb_team2$home$wins
cfb_team2$home.losses <- cfb_team2$home$losses
cfb_team2$away.wins <- cfb_team2$away$wins
cfb_team2$away.losses <- cfb_team2$away$losses
cfb_team2$decided_by_7.wins <- cfb_team2$decided_by_7_points$wins
cfb_team2$decided_by_7.losses <- cfb_team2$decided_by_7_points$losses
cfb_team2$last_5.wins <- cfb_team2$last_5$wins
cfb_team2$last_5.losses <- cfb_team2$last_5$losses
cfb_team2$points.against <- cfb_team2$points$against
cfb_team2$points.net <- cfb_team2$points$net

cfb_team3$overall.wins <- cfb_team3$overall$wins
cfb_team3$overall.losses <- cfb_team3$overall$losses
cfb_team3$conference.wins <- cfb_team3$in_conference$wins
cfb_team3$conference.losses <- cfb_team3$in_conference$losses
cfb_team3$home.wins <- cfb_team3$home$wins
cfb_team3$home.losses <- cfb_team3$home$losses
cfb_team3$away.wins <- cfb_team3$away$wins
cfb_team3$away.losses <- cfb_team3$away$losses
cfb_team3$decided_by_7.wins <- cfb_team3$decided_by_7_points$wins
cfb_team3$decided_by_7.losses <- cfb_team3$decided_by_7_points$losses
cfb_team3$last_5.wins <- cfb_team3$last_5$wins
cfb_team3$last_5.losses <- cfb_team3$last_5$losses
cfb_team3$points.against <- cfb_team3$points$against
cfb_team3$points.net <- cfb_team3$points$net

cfb_team4$overall.wins <- cfb_team4$overall$wins
cfb_team4$overall.losses <- cfb_team4$overall$losses
cfb_team4$conference.wins <- cfb_team4$in_conference$wins
cfb_team4$conference.losses <- cfb_team4$in_conference$losses
cfb_team4$home.wins <- cfb_team4$home$wins
cfb_team4$home.losses <- cfb_team4$home$losses
cfb_team4$away.wins <- cfb_team4$away$wins
cfb_team4$away.losses <- cfb_team4$away$losses
cfb_team4$decided_by_7.wins <- cfb_team4$decided_by_7_points$wins
cfb_team4$decided_by_7.losses <- cfb_team4$decided_by_7_points$losses
cfb_team4$last_5.wins <- cfb_team4$last_5$wins
cfb_team4$last_5.losses <- cfb_team4$last_5$losses
cfb_team4$points.against <- cfb_team4$points$against
cfb_team4$points.net <- cfb_team4$points$net

cfb_team5$overall.wins <- cfb_team5$overall$wins
cfb_team5$overall.losses <- cfb_team5$overall$losses
cfb_team5$conference.wins <- cfb_team5$in_conference$wins
cfb_team5$conference.losses <- cfb_team5$in_conference$losses
cfb_team5$home.wins <- cfb_team5$home$wins
cfb_team5$home.losses <- cfb_team5$home$losses
cfb_team5$away.wins <- cfb_team5$away$wins
cfb_team5$away.losses <- cfb_team5$away$losses
cfb_team5$decided_by_7.wins <- cfb_team5$decided_by_7_points$wins
cfb_team5$decided_by_7.losses <- cfb_team5$decided_by_7_points$losses
cfb_team5$last_5.wins <- cfb_team5$last_5$wins
cfb_team5$last_5.losses <- cfb_team5$last_5$losses
cfb_team5$points.against <- cfb_team5$points$against
cfb_team5$points.net <- cfb_team5$points$net

cfb_team6$overall.wins <- cfb_team6$overall$wins
cfb_team6$overall.losses <- cfb_team6$overall$losses
cfb_team6$conference.wins <- cfb_team6$in_conference$wins
cfb_team6$conference.losses <- cfb_team6$in_conference$losses
cfb_team6$home.wins <- cfb_team6$home$wins
cfb_team6$home.losses <- cfb_team6$home$losses
cfb_team6$away.wins <- cfb_team6$away$wins
cfb_team6$away.losses <- cfb_team6$away$losses
cfb_team6$decided_by_7.wins <- cfb_team6$decided_by_7_points$wins
cfb_team6$decided_by_7.losses <- cfb_team6$decided_by_7_points$losses
cfb_team6$last_5.wins <- cfb_team6$last_5$wins
cfb_team6$last_5.losses <- cfb_team6$last_5$losses
cfb_team6$points.against <- cfb_team6$points$against
cfb_team6$points.net <- cfb_team6$points$net

cfb_team7$overall.wins <- cfb_team7$overall$wins
cfb_team7$overall.losses <- cfb_team7$overall$losses
cfb_team7$conference.wins <- cfb_team7$in_conference$wins
cfb_team7$conference.losses <- cfb_team7$in_conference$losses
cfb_team7$home.wins <- cfb_team7$home$wins
cfb_team7$home.losses <- cfb_team7$home$losses
cfb_team7$away.wins <- cfb_team7$away$wins
cfb_team7$away.losses <- cfb_team7$away$losses
cfb_team7$decided_by_7.wins <- cfb_team7$decided_by_7_points$wins
cfb_team7$decided_by_7.losses <- cfb_team7$decided_by_7_points$losses
cfb_team7$last_5.wins <- cfb_team7$last_5$wins
cfb_team7$last_5.losses <- cfb_team7$last_5$losses
cfb_team7$points.against <- cfb_team7$points$against
cfb_team7$points.net <- cfb_team7$points$net

cfb_team8$overall.wins <- cfb_team8$overall$wins
cfb_team8$overall.losses <- cfb_team8$overall$losses
cfb_team8$conference.wins <- cfb_team8$in_conference$wins
cfb_team8$conference.losses <- cfb_team8$in_conference$losses
cfb_team8$home.wins <- cfb_team8$home$wins
cfb_team8$home.losses <- cfb_team8$home$losses
cfb_team8$away.wins <- cfb_team8$away$wins
cfb_team8$away.losses <- cfb_team8$away$losses
cfb_team8$decided_by_7.wins <- cfb_team8$decided_by_7_points$wins
cfb_team8$decided_by_7.losses <- cfb_team8$decided_by_7_points$losses
cfb_team8$last_5.wins <- cfb_team8$last_5$wins
cfb_team8$last_5.losses <- cfb_team8$last_5$losses
cfb_team8$points.against <- cfb_team8$points$against
cfb_team8$points.net <- cfb_team8$points$net

cfb_team9$overall.wins <- cfb_team9$overall$wins
cfb_team9$overall.losses <- cfb_team9$overall$losses
cfb_team9$conference.wins <- cfb_team9$in_conference$wins
cfb_team9$conference.losses <- cfb_team9$in_conference$losses
cfb_team9$home.wins <- cfb_team9$home$wins
cfb_team9$home.losses <- cfb_team9$home$losses
cfb_team9$away.wins <- cfb_team9$away$wins
cfb_team9$away.losses <- cfb_team9$away$losses
cfb_team9$decided_by_7.wins <- cfb_team9$decided_by_7_points$wins
cfb_team9$decided_by_7.losses <- cfb_team9$decided_by_7_points$losses
cfb_team9$last_5.wins <- cfb_team9$last_5$wins
cfb_team9$last_5.losses <- cfb_team9$last_5$losses
cfb_team9$points.against <- cfb_team9$points$against
cfb_team9$points.net <- cfb_team9$points$net

cfb_team10$overall.wins <- cfb_team10$overall$wins
cfb_team10$overall.losses <- cfb_team10$overall$losses
cfb_team10$conference.wins <- cfb_team10$in_conference$wins
cfb_team10$conference.losses <- cfb_team10$in_conference$losses
cfb_team10$home.wins <- cfb_team10$home$wins
cfb_team10$home.losses <- cfb_team10$home$losses
cfb_team10$away.wins <- cfb_team10$away$wins
cfb_team10$away.losses <- cfb_team10$away$losses
cfb_team10$decided_by_7.wins <- cfb_team10$decided_by_7_points$wins
cfb_team10$decided_by_7.losses <- cfb_team10$decided_by_7_points$losses
cfb_team10$last_5.wins <- cfb_team10$last_5$wins
cfb_team10$last_5.losses <- cfb_team10$last_5$losses
cfb_team10$points.against <- cfb_team10$points$against
cfb_team10$points.net <- cfb_team10$points$net

cfb_team11$overall.wins <- cfb_team11$overall$wins
cfb_team11$overall.losses <- cfb_team11$overall$losses
cfb_team11$conference.wins <- cfb_team11$in_conference$wins
cfb_team11$conference.losses <- cfb_team11$in_conference$losses
cfb_team11$home.wins <- cfb_team11$home$wins
cfb_team11$home.losses <- cfb_team11$home$losses
cfb_team11$away.wins <- cfb_team11$away$wins
cfb_team11$away.losses <- cfb_team11$away$losses
cfb_team11$decided_by_7.wins <- cfb_team11$decided_by_7_points$wins
cfb_team11$decided_by_7.losses <- cfb_team11$decided_by_7_points$losses
cfb_team11$last_5.wins <- cfb_team11$last_5$wins
cfb_team11$last_5.losses <- cfb_team11$last_5$losses
cfb_team11$points.against <- cfb_team11$points$against
cfb_team11$points.net <- cfb_team11$points$net

## COMBINE INTO ONE DATA FRAME
cfb_teams2018 <- rbind(cfb_team1, cfb_team2, cfb_team3, cfb_team4, cfb_team5, cfb_team6, cfb_team7, cfb_team8, cfb_team9, cfb_team10, cfb_team11)
Now you should have a data frame named ‘cfb_teams2018’ with team information for the 2018 season. I believe the data is updated each week as games are played, so depending on when you make the call, you should have close to the latest information.
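For a shorter route to flat columns such as `overall.wins`, jsonlite ships a `flatten()` function that expands nested data frame columns into regular ones. The sketch below runs it on a toy data frame that I built by hand to mimic the nesting. I have not checked it against the full Sportradar response. Note also that it keeps every nested field and uses the original names, so you would get `in_conference.wins` rather than `conference.wins`.

```r
library(jsonlite)

# Toy data frame with nested data frame columns (made up for illustration)
teams <- data.frame(name = c("Team 1", "Team 2"))
teams$overall <- data.frame(wins = c(10, 7), losses = c(2, 5))
teams$in_conference <- data.frame(wins = c(6, 4), losses = c(2, 4))

# Expand each nested data frame into prefixed top-level columns
flat <- jsonlite::flatten(teams)

names(flat)
```

From there you could drop the columns you do not need and rename the rest, instead of copying each field out by hand.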
Collecting Game Data
## API CALL FOR INDIVIDUAL GAME DATA
srgames.raw.result <- GET(url = srurl, path = srpath)
srgames.raw.content <- rawToChar(srgames.raw.result$content)
srgames.content <- fromJSON(srgames.raw.content)

## PULL GAME DATA BY WEEK OUT OF LISTS
cfb_week1 <- srgames.content$weeks$games[[1]]
cfb_week2 <- srgames.content$weeks$games[[2]]
cfb_week3 <- srgames.content$weeks$games[[3]]
cfb_week4 <- srgames.content$weeks$games[[4]]
cfb_week5 <- srgames.content$weeks$games[[5]]
cfb_week6 <- srgames.content$weeks$games[[6]]
cfb_week7 <- srgames.content$weeks$games[[7]]
cfb_week8 <- srgames.content$weeks$games[[8]]
cfb_week9 <- srgames.content$weeks$games[[9]]
cfb_week10 <- srgames.content$weeks$games[[10]]
cfb_week11 <- srgames.content$weeks$games[[11]]
cfb_week12 <- srgames.content$weeks$games[[12]]
cfb_week13 <- srgames.content$weeks$games[[13]]

## ADD A WEEK NUMBER COLUMN TO EACH WEEK'S GAMES
cfb_week1$week <- 1
cfb_week2$week <- 2
cfb_week3$week <- 3
cfb_week4$week <- 4
cfb_week5$week <- 5
cfb_week6$week <- 6
cfb_week7$week <- 7
cfb_week8$week <- 8
cfb_week9$week <- 9
cfb_week10$week <- 10
cfb_week11$week <- 11
cfb_week12$week <- 12
cfb_week13$week <- 13

## COMBINE GAMES FROM ALL WEEKS INTO ONE DATA FRAME
cfb_games2018 <- rbind(cfb_week1, cfb_week2, cfb_week3, cfb_week4, cfb_week5, cfb_week6, cfb_week7, cfb_week8, cfb_week9, cfb_week10, cfb_week11, cfb_week12, cfb_week13)
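The week-by-week steps can also be written without naming each week. Here is a base R sketch on toy data. The list `weekly_games` stands in for `srgames.content$weeks$games`, and `tag_week` is a hypothetical helper. I am assuming the list is in week order, so each element's position is its week number.

```r
# Toy stand-in for the list of per-week game data frames
weekly_games <- list(
  data.frame(home = c("Team 1", "Team 3"), away = c("Team 2", "Team 4")),
  data.frame(home = "Team 2", away = "Team 3")
)

# Hypothetical helper: add the week number as a column
tag_week <- function(games, week) {
  games$week <- week
  games
}

# Pair each data frame with its position in the list, then stack the results
tagged <- Map(tag_week, weekly_games, seq_along(weekly_games))
all_games <- do.call(rbind, tagged)
```

Because the week number comes from `seq_along()`, the same code works for any number of weeks without edits.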
There you have it: game-by-game data for the 2018 college football season through week 13. Happy analysis.