Introduction to R
In this course, you will explore the versatility of R, a powerful language for statistical computing and graphics. You will discover the benefits of using R, get started with the basics, gain confidence with the user-friendly RStudio interface and learn fundamental R concepts. You will also dive into the Tidyverse, a collection of packages for data storage, visualisation and manipulation. This course offers a solid foundation to kickstart your journey with R!
This book is designed to accompany the Introduction to R training that is starting at KNBS. To complete this course, you will need to have R and RStudio installed on your computer.
If you’re running through this book solo, we recommend working through it in order and trying all of the exercises as you go. Each exercise has a Solution dropdown, which lets you view prompts to help with the question and see the answers.
Session aims
- navigate the R and R Studio environment
- understand and use the common R functions for data manipulation
- understand the basics of data visualisation using the ggplot2 package
- understand the term tidy data and why it is important for writing efficient code
What is R?
R is an open-source programming language and software environment, designed primarily for statistical computing. It has a long history: it is based on the S language, which was developed in 1976 at Bell Labs, where the UNIX operating system and the C and C++ languages were also developed. The R language itself was developed in the 1990s, with the first stable version released in 2000.
R has grown rapidly in popularity, particularly in the last five years, due to increased interest in data science. It is now a key tool used by analysts in governments around the world.
Some of the advantages:
- It is popular - there is a large, active and rapidly growing community of R programmers, which has resulted in a plethora of resources and extensions.
- It is powerful - its history as a statistical language means it is well suited to data analysis and manipulation.
- It is extensible - there is a vast array of packages that can be added to extend the functionality of R, produced by statisticians and programmers around the world. These range from obscure statistical techniques to tools for making interactive charts.
- It’s free and open source - a large part of its popularity can be attributed to its low cost, particularly relative to proprietary software such as SAS or Stata.
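As a minimal sketch of how packages extend R, the snippet below checks whether a package is installed before loading it. ggplot2 is used only because this course covers it later; the names `pkg` and `is_installed` are illustrative, and installing a missing package assumes an internet connection and access to CRAN.

```r
# Name of the package to check (illustrative; any CRAN package name works)
pkg <- "ggplot2"

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
  # Attach the package so its functions can be called directly
  library(pkg, character.only = TRUE)
} else {
  # install.packages(pkg) would download it from CRAN (needs internet access)
  message("Package '", pkg, "' is not installed; run install.packages(\"", pkg, "\")")
}
```

A package only needs to be installed once, but it must be loaded with `library()` in every new R session where you want to use it.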
Introducing RStudio
RStudio is an integrated development environment (IDE) for R. You don’t have to use an IDE, but it’s strongly advised as it provides a user-friendly interface to work with. RStudio has four main panes:
- Script Editor (top left) - used to write and save your code, which is only run when you explicitly tell RStudio to do so.
- Console (bottom left) - all code is run through the console; even the code you write in the script editor is sent to the console to be run. It’s perfect for quickly viewing data structures and looking up help for functions, but it should not be used to write code you want to save (that should be done in the script editor).
- Environment (top right) - all data, objects and functions that you have read in/created will appear here.
- Files/Plots/Help (bottom right) - this pane groups a few miscellaneous tabs of RStudio:
  - Files acts like Windows File Explorer, letting you navigate between files and folders.
  - Plots shows any graphs that you generate.
  - Packages lets you install and manage packages, and shows which ones are currently loaded.
  - Help provides information about packages or functions, including how to use them.
  - Viewer is essentially RStudio’s built-in browser, which can be used for web app development.
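To see how the panes work together, here is a small sketch you could type into the Script Editor and run. The object names `scores` and `average_score` are made up for illustration.

```r
# Create a numeric vector; the object `scores` appears in the Environment pane
scores <- c(12, 15, 9, 20)

# Compute the mean and store it as a second object
average_score <- mean(scores)

# Print the result to the Console
print(average_score)
# [1] 14

# List the objects in the current workspace (mirrors the Environment pane)
ls()
```

In RStudio, pressing Ctrl + Enter (Cmd + Return on Mac) sends the current line or selection from the Script Editor to the Console. Typing `?mean` in the Console opens the documentation for `mean()` in the Help tab.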
You may have noticed that your Script Editor is bigger than the Console, or that your Environment has suddenly disappeared. In RStudio, you can adjust the size of the panes by clicking and dragging the dividers between them. If you want to maximise a specific pane, such as the Script Editor, use View > Panes > Zoom Source (Ctrl + Shift + 1). To bring all four panes back, use View > Panes > Show All Panes. The View menu also lets you move focus between panes, so you can set up a workspace that suits your needs.
If you find the text difficult to read or prefer a different appearance, you can customise the theme, font, and text size in RStudio. Go to Tools > Global Options > Appearance, where you can choose from different editor themes (e.g., light or dark mode), adjust the font type, and increase or decrease the text size for better readability. These changes can help make coding more comfortable, especially during long sessions.
Recommended Changes
While not necessary, certain changes are almost always recommended for visibility reasons. These include:
- Choose a different theme, as Textmate can be hard on the eyes. This can be done in Tools > Global Options > Appearance > Editor theme.
- Highlight R function calls. This gives functions a different colour from normal text, which can make reading your code much easier. This can be done in Tools > Global Options > Code > Display > Highlight R function calls.
- Use rainbow parentheses. This gives each pair of () in a line a different colour, which can help you spot a missing bracket that is breaking your code. This can be done in Tools > Global Options > Code > Display > Use rainbow parentheses.