Exploratory Data Analysis

Data Science Institute
Vanderbilt University


Data Quickfire!

Welcome to your second data quickfire competition in EDA. This week we will focus on applying all of our EDA skills, including cluster analysis, to data on states in the global system.

READ ALL OF THE INSTRUCTIONS!

Steps

  1. First, create a repo within dsi-explore on GitHub. Name the repo “lastname-dataquickfire”, or use whatever naming pattern you have been using for your homework submissions.
  2. Create an R script or .Rmd file and download the data, which is posted under ‘data’ in today’s class materials. All of your work must be done in R.
  3. The data is ‘polity’ data. Its variables are similar to those we worked with in class last time, but the dataset is different enough that you cannot simply reuse last time’s code. See the user’s manual.
  4. Next, explore the data and apply k-means cluster analysis. Your analysis should be guided by the following questions: Does cluster analysis help us discover democracies in the data? Autocracies? Or perhaps regional trends based on geography?
  5. Submit your work at the end of class by committing it to the GitHub repo you created above. (Everyone must submit at the end of class to get credit, regardless of whether you choose to do step 6 below.)
  6. Optional: if you want a little more time to keep playing with this, you may submit a second time by 11:59 PM tomorrow (Friday, Nov. 1).
  7. Winners will be announced next week.
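
As a rough illustration of step 4, here is a minimal sketch of a k-means workflow in base R. It runs on a made-up data frame, not the class data: the column names (`country`, `democ`, `autoc`, `durable`, `group`), the values, and the choice of k = 2 are all assumptions for illustration only, so expect to adapt every one of them to the real file.

```r
# Toy stand-in for the class data. All column names and values are hypothetical.
set.seed(42)
toy <- data.frame(
  country = paste0("State", 1:30),
  democ   = c(rnorm(15, mean = 8,  sd = 1),  rnorm(15, mean = 2,  sd = 1)),
  autoc   = c(rnorm(15, mean = 1,  sd = 1),  rnorm(15, mean = 7,  sd = 1)),
  durable = c(rnorm(15, mean = 40, sd = 10), rnorm(15, mean = 15, sd = 10))
)

# k-means needs numeric input, so keep only the numeric columns and standardize them.
num    <- toy[sapply(toy, is.numeric)]
scaled <- scale(num)

# Total within-cluster sum of squares for k = 1..6, to look for an "elbow".
wss <- sapply(1:6, function(k) {
  kmeans(scaled, centers = k, nstart = 25)$tot.withinss
})

# Fit one candidate solution (k = 2 is an arbitrary choice for this toy data).
fit <- kmeans(scaled, centers = 2, nstart = 25)

# Cross-tabulate cluster membership against a known grouping (hypothetical here)
# to see whether the clusters line up with something substantively meaningful.
toy$group <- rep(c("A", "B"), each = 15)
print(table(cluster = fit$cluster, group = toy$group))
```

Plotting `wss` against `1:6` is one common way to choose k. The cross-tabulation at the end is the kind of check the guiding questions call for, with regime type or region in place of the made-up `group` column.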

Tips

  1. Don’t forget, cluster analysis requires numeric data!
  2. Don’t forget, you’ll need to decide whether you need to scale/standardize the data!
  3. As you’ve seen in TWO in-class examples, labels can be tricky with cluster analysis, so consider how to deal with this from the beginning.
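
The three tips above can be sketched together in a few lines of base R. This is one possible approach, not necessarily the one used in class: the toy data frame and its columns (`country`, `region`, `score`, `years`) are invented for illustration, and storing labels as row names is just one way to keep them attached to the results.

```r
# Hypothetical toy data. Names and values are made up for illustration.
toy <- data.frame(
  country = c("A", "B", "C", "D", "E", "F"),
  region  = c("Europe", "Europe", "Asia", "Asia", "Africa", "Africa"),
  score   = c(10, 9, -7, -6, 2, 3),
  years   = c(120, 95, 12, 30, 8, 15),
  stringsAsFactors = FALSE
)

# Tip 1: keep only the numeric columns for clustering.
is_num <- sapply(toy, is.numeric)
num    <- toy[, is_num, drop = FALSE]

# Tip 3: store labels as row names so they survive scale() and kmeans().
rownames(num) <- toy$country

# Tip 2: compare spreads. Very different standard deviations suggest standardizing.
print(sapply(num, sd))
scaled <- scale(num)

set.seed(1)
fit <- kmeans(scaled, centers = 2, nstart = 10)

# The cluster vector is named by country, so it can be joined back to the labels.
labelled <- data.frame(country = names(fit$cluster),
                       cluster = unname(fit$cluster))
labelled <- merge(labelled, toy[, c("country", "region")], by = "country")
print(labelled)
```

Because the cluster assignments keep the country names, they can be merged with any non-numeric column that was dropped before clustering, such as region.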