Plotly for College Analysis


For many, community college is a stepping stone toward the larger dream of a four-year university. However, many of the risk factors that keep students from succeeding in school are highly prevalent among students at two-year institutions.

In this post, we explore open college data. The dataset includes the name of each college and the aggregated number of students who passed a particular subject (English or Math), broken down by transfer level and ethnicity, along with the pass rate for each subject.

Let’s read our data and take a look:

There is a wonderful package called DT that creates interactive tables in HTML. It even comes with a search feature. Check it out.

# Load packages
library(DT)      # interactive HTML tables
library(plotly)  # interactive plots; also loads ggplot2

# `url` must hold the location of the open college data file
# (the address is not given here); set `sep` if the file is
# comma- or tab-delimited.
school <- read.table(url, header = TRUE)

# Interactive, searchable table of the data
datatable(school)
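DT's `datatable()` also accepts display options. The sketch below is illustrative only: it runs on a small made-up data frame (`toy`, with hypothetical numbers) instead of the real `school` data, adds a filter box above each column (`filter = "top"`), and shows ten rows per page (`pageLength`, which is passed through to the DataTables JavaScript library).

```r
library(DT)

# Hypothetical stand-in for `school`; all numbers are made up
toy <- data.frame(
  ethnicity = c("Asian", "Hispanic", "White Non-Hispanic"),
  subject   = c("Math", "English", "Math"),
  sum       = c(120, 340, 210)
)

# filter = "top" places a filter box above every column;
# options are forwarded to DataTables (here: 10 rows per page)
tbl <- datatable(toy, filter = "top", options = list(pageLength = 10))
```

Printing `tbl` in RStudio or an R Markdown chunk renders the table; typing into a filter box narrows the rows to matching values.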

ggplot

The ggplot2 package is the bread and butter of beautiful, easy-to-use plots. We start with data = school (duh), x = ethnicity, and y = sum, and fill each box by group (the transfer level). The “sum” here is just the total number of students who passed a given course.

g <- ggplot(data = school,
            aes(x = ethnicity, y = sum, fill = group)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) # vertical x-axis labels
g
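Each box summarizes the distribution of `sum` within one ethnicity: the middle line is the median, and the box edges are the first and third quartiles. As a rough check, here is a base-R sketch that computes those numbers on a made-up data frame (`toy` is hypothetical, not the real data). The values should line up with the box edges ggplot2 draws, since its box plot statistic is based on R's `quantile()` with default settings.

```r
# Hypothetical stand-in for `school`: made-up pass counts
toy <- data.frame(
  ethnicity = rep(c("Asian", "Hispanic"), each = 4),
  sum       = c(10, 20, 30, 40, 100, 200, 300, 400)
)

# Split `sum` by ethnicity, then take the 25th, 50th and 75th percentiles
# of each group: the lower edge, middle line and upper edge of each box
stats <- lapply(split(toy$sum, toy$ethnicity), quantile,
                probs = c(0.25, 0.5, 0.75))
stats$Asian
stats$Hispanic
```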

Interactivity

In order to make this plot interactive, we have to do something really challenging. Are you ready? Here we go…

Type the following line of code.

ggplotly(g)
[Interactive plot: box plots of sum by ethnicity, filled by group (One, Two, Three, and Four Levels Below Transfer)]
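`ggplotly()` also takes a `tooltip` argument that picks which mapped aesthetics appear in the hover text. The sketch below uses a made-up scatter plot (the `toy` data frame and the file name are hypothetical) to limit the hover text to the x and y values, and shows, commented out, how the result could be saved as a standalone HTML page with `htmlwidgets::saveWidget()`.

```r
library(ggplot2)
library(plotly)

# Hypothetical stand-in for `school`; all numbers are made up
toy <- data.frame(
  students  = c(100, 250, 400, 120, 300, 500),
  sum       = c(60, 150, 210, 80, 190, 320),
  ethnicity = rep(c("Asian", "Hispanic"), each = 3)
)

g <- ggplot(toy, aes(x = students, y = sum, color = ethnicity)) +
  geom_point()

# Show only the x and y values on hover instead of every mapped aesthetic
p <- ggplotly(g, tooltip = c("x", "y"))

# To share the plot outside R, save it as a standalone HTML page
# (uncomment to write the file; the file name is arbitrary):
# htmlwidgets::saveWidget(p, "college_scatter.html")
```

Printing `p` renders it the same way `ggplotly(g)` does.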

That’s it! We can now make all sorts of amazing plots with minimal code! Let’s look at some more examples.

g <- ggplot(data = school,
            aes(x = ethnicity, y = success, fill = ethnicity)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) # vertical x-axis labels
ggplotly(g)
[Interactive plot: bar chart of success by ethnicity]
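With `stat = "identity"`, every row of the data becomes its own bar segment, and segments with the same ethnicity stack on top of each other (the default position for bars). Each bar's height is therefore the total of `success` over all rows for that ethnicity. A base-R sketch of the same totals, on a made-up data frame (`toy` is hypothetical):

```r
# Hypothetical stand-in for `school`; all numbers are made up
toy <- data.frame(
  ethnicity = c("Asian", "Asian", "Hispanic", "Hispanic", "Hispanic"),
  success   = c(50, 70, 200, 150, 100)
)

# Total success per ethnicity, i.e. the height of each stacked bar
totals <- aggregate(success ~ ethnicity, data = toy, FUN = sum)
totals
```

This stacking is why the bar chart's y axis reaches far higher values than any single row.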

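Make sure you clean your data! By default, R orders factor levels alphabetically, which would put Four Levels Below Transfer first. Here we set the levels explicitly so the panels run from One to Four Levels Below Transfer.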

#Refactor
school$group <- factor(school$group,
                       levels = c("One Level Below Transfer",
                                  "Two Levels Below Transfer",
                                  "Three Levels Below Transfer",
                                  "Four Levels Below Transfer"))
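A small sketch of the difference, using a hand-typed vector of transfer levels rather than the real data. It also shows one caveat of `factor(..., levels = ...)`: any value that does not exactly match a listed level (for example, a capitalization typo) silently becomes `NA`, so it is worth counting `NA`s after refactoring.

```r
transfer <- c("Two Levels Below Transfer", "One Level Below Transfer",
              "Four Levels Below Transfer", "Three Levels Below Transfer")

# Default: levels are sorted alphabetically (Four, One, Three, Two)
levels(factor(transfer))

# Explicit order, as used in the post
ordered_levels <- c("One Level Below Transfer", "Two Levels Below Transfer",
                    "Three Levels Below Transfer", "Four Levels Below Transfer")
f <- factor(transfer, levels = ordered_levels)
levels(f)

# Caveat: the lowercase variant below is not a listed level, so it becomes NA
typo <- factor(c("One Level Below Transfer", "One level below transfer"),
               levels = ordered_levels)
sum(is.na(typo))
```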

g <- ggplot(data = school,
            aes(x = students, y = sum, color = ethnicity)) +
  geom_point() + facet_wrap( ~ group, ncol=2)
ggplotly(g)
[Interactive plot: scatter plot of sum vs. students, colored by ethnicity, one panel per group]

g <- ggplot(data = school,
            aes(x = subject, y = sum, color = ethnicity)) +
  geom_boxplot() + facet_wrap( ~ ethnicity, ncol=2)
ggplotly(g)
[Interactive plot: box plots of sum by subject (English, Math), colored by ethnicity, one panel per ethnicity]

g <- ggplot(data = school,
            aes(x = group, y = sum, fill = subject)) +
  geom_boxplot() + facet_wrap(~ subject, ncol = 2) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) # vertical x-axis labels
ggplotly(g)
[Interactive plot: box plots of sum by group, filled by subject, one panel per subject (English, Math)]


In God we trust. All others must bring data.

- W. Edwards Deming, Statistician