November 12th, 2024
By Filip Wojda · 7 min read
R is a language and environment designed for statistical computing and graphics. Data visualization lies at the core of the language’s offering, with users able to present their data as anything from simple scatter plots to bar charts and box plots.
We explore the first steps for data visualization with R here using tidyverse, a free-to-download collection of packages, and ggplot2.
You’ll work with ggplot2 for the following steps. Why ggplot2? It’s the most versatile of the several systems available in R to represent data visually. Let’s move into the steps.
The simplest way to get ggplot2 is to install the tidyverse. That collection of packages is home not only to ggplot2 but to an array of functions and datasets that you’ll need for these steps. Assuming you don’t already have tidyverse, open a new R script and enter the following code:
install.packages("tidyverse")
library(tidyverse)
You now have tidyverse installed. You only need to run install.packages() once. After that, put the library(tidyverse) line at the top of any script in which you want to use the tools contained in the tidyverse.
As for the dataset you’ll use in the following steps, go for the “mpg” data frame, which ships with ggplot2. You can display it by typing the following code:
ggplot2::mpg
A table should appear showing 234 rows of cars and 11 columns of variables, with details including each car’s manufacturer, engine size, and fuel efficiency on the highway. Type ?mpg into R if you need any help with the table’s rows and columns. You now have a dataset loaded up. Let’s dig into some data visualizations.
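Before plotting, it can help to confirm what you’ve actually loaded. The following is a minimal sketch, assuming ggplot2 is installed (it comes with tidyverse). It uses only base R functions to check the size and structure of the “mpg” data frame:

```r
library(ggplot2)  # the mpg data frame ships with ggplot2

# Number of rows and columns in the data frame
dim(mpg)

# Column names, types, and the first few values of each column
str(mpg)

# First six rows, as a quick sanity check
head(mpg)
```

If the column names “displ” and “hwy” show up in the output, you have everything you need for the steps that follow.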
We’re using ggplot2 because it’s simple and it’s based on the “Grammar of Graphics” principles. These principles give you a structured approach to building a visualization: you start with data, map variables to visual properties, and add layers. ggplot2 is versatile. You can use it to create a coordinate system out of your data on which you can layer other visualizations. We’re going to keep things simple and use ggplot2 to draw a scatter plot using the “displ” and “hwy” fields from the “mpg” data frame.
· displ – Vehicle engine size
· hwy – Fuel efficiency on the highway
What we should see when we run the following piece of code is a set of points suggesting that vehicles with larger engines use more fuel on the highway. A logical hypothesis. Note that “hwy” is measured in miles per gallon, so lower values mean more fuel used. To map out the data and test the hypothesis, type these lines:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
What have you done here?
You’ve used the “ggplot(data = mpg)” function to create an empty coordinate system for the “mpg” dataset. The second line of code adds a layer of points to that coordinate system, leaving you with the first of many scatter plots you can create with the “mpg” data frame.
You should see a scatter plot showing that the cars with the smallest engines tend to achieve the best fuel efficiency. So, the plot supports the hypothesis.
Take a look at the “geom_point” function again. Inside it you’ll see the “mapping” argument, which is paired with “aes()” and determines which variables go on the x and y axes of your scatter plot. Specifically, you’ll see the “displ” variable mapped to x and the “hwy” variable mapped to y.
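A scatter plot shows a pattern rather than a formal proof, so you may want a number to go with it. Here is a minimal sketch, assuming ggplot2 is loaded so that “mpg” is available. It uses base R’s cor() function, and the variable name displ_hwy_cor is just an illustrative choice:

```r
library(ggplot2)  # provides the mpg data frame

# Pearson correlation between engine size and highway miles per gallon.
# A negative value means larger engines go with fewer miles per gallon.
displ_hwy_cor <- cor(mpg$displ, mpg$hwy)
displ_hwy_cor
```

A negative result is consistent with what the plot shows, although correlation alone doesn’t tell you why the relationship exists.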
What if you want to play around with different variables from the “mpg” data frame?
You need a template, which you create using the following code:
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
Each of the angle bracket sections indicates a point where you enter your own value. <DATA> is for your dataset (“mpg” in our example). <GEOM_FUNCTION> covers the type of function you’ll use to visually display your data, with <MAPPINGS> covering the axes, column choices, and, once you get deeper into data visualization in R, things like the colors you’ll use to represent data.
Experiment with this template. Try entering different field names from the “mpg” dataset to create new scatter plots. Remember, the “?mpg” command is your friend if you can’t figure out what a field name represents – it sends you to the dataset’s help file and explains everything.
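As one possible way to fill in the template, here is a sketch that swaps in different fields from “mpg”. It assumes ggplot2 is loaded. The choice of “cty”, “hwy”, and “class” is ours, and the name p_class is purely illustrative:

```r
library(ggplot2)

# <DATA>          = mpg
# <GEOM_FUNCTION> = geom_point
# <MAPPINGS>      = city mpg on x, highway mpg on y, point color by vehicle class
p_class <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = cty, y = hwy, color = class))

p_class  # printing the object draws the plot
```

Mapping a third variable to color is a simple way to squeeze more information into the same scatter plot. ggplot2 adds the legend for you.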
We’ve only touched on the basics of data visualization in R with the above steps. You can go a lot further once you get to grips with the language. For instance, different geoms allow you to plot out your data with other visualizations. Take “geom_smooth” as an example. Rather than plotting out individual points, this geom fits a smoothed trend line to your data, usually with a shaded band showing the uncertainty around it.
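To see how geoms stack, here is a short sketch that layers “geom_smooth” over the scatter plot from earlier. It assumes ggplot2 is loaded, and the name p_smooth is illustrative. Note one change from the earlier code: the mapping sits inside ggplot() so that both layers share it.

```r
library(ggplot2)

# Mappings given in ggplot() are inherited by every layer added after it
p_smooth <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +   # the raw observations
  geom_smooth()    # a fitted trend line with a shaded confidence band

p_smooth  # printing the object draws the plot
```

R may print a message telling you which smoothing method it picked. That’s expected, and you can choose one yourself with the “method” argument, for example geom_smooth(method = "lm") for a straight line.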
Again, play around.
Try different geoms and datasets to see what results you get. Now, let’s take a quick look at why you’d even want to use R for data visualization, as well as a handful of reasons why you might not.
- Massive Open Source Ecosystem: R has libraries and packages for days, all provided by users worldwide and covering a bunch of statistical and data visualization needs.
- Specific Design: R was made for statistical computing. The entire language is geared around helping you with almost every data analysis task you can imagine.
- Integration: R can read CSV, Excel, and JSON files and query SQL databases through its packages. You can even call R from Python using the “rpy2” package.
- There’s a Learning Curve: R doesn’t make it easy for people who have no coding background. You’re dealing with data frames, vectors, and a whole lot of coding to visualize your data.
- Memory Problems: R can be a resource hog. By default it loads your data into memory, so the bigger your datasets, the more RAM it needs to work through them.
We love R. It’s versatile and offers everything you need to visualize data in a bunch of different ways. But it’s also complex – if you’re not familiar with coding you’re not going to get the most out of this powerful data visualization tool.
Why not make things simpler with an AI-infused tool that lets you chat with your datasets in plain English? Julius AI allows you to do just that. Get expert-level insights in seconds and visualize data however you want with a platform designed to make it easy for you to find out what’s hiding behind your numbers. Try Julius AI today – solve problems and generate reports in minutes.