Licensing

This walkthrough is distributed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Getting data into R

Ways to get data into R

In order to use your data in R, you must import it and turn it into an R object. There are many ways to get data into R.

  • Manually: You can create it by hand, as we did at the end of the last session. To create a data.frame, use the data.frame() function and specify your variables.
  • Import it from a file: Below is a very incomplete list of file types and the functions or packages that read them.
      • Text: TXT (readLines() function)
      • Tabular data: CSV, TSV (read.table() function or readr package)
      • Excel: XLSX (xlsx package)
      • Google sheets: (googlesheets package)
      • Statistics program: SPSS, SAS (haven package)
      • Databases: MySQL (RMySQL package)
  • Gather it from the web: You can connect to webpages, servers, or APIs directly from within R, or you can build a dataset by scraping HTML webpages with the rvest package.
      • For example, connect to the Twitter API with the twitteR package, retrieve Altmetric data with rAltmetric, or retrieve the World Bank’s World Development Indicators with WDI.
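
As a rough illustration of the first two options above, the sketch below builds a small data.frame by hand, writes it to a temporary CSV, and reads it back with base R. The object names (checkouts, tmp_csv, from_file, raw_lines) and the values are made up for this example and are not part of the workshop data.

```r
# Option 1: create a data.frame manually (illustrative values only)
checkouts <- data.frame(
  title = c("Book A", "Book B", "Book C"),
  tot_chkout = c(3, 0, 12),
  stringsAsFactors = FALSE
)

# Option 2: import from a file. A temporary file stands in for a
# real CSV so that the example is self-contained.
tmp_csv <- tempfile(fileext = ".csv")
write.csv(checkouts, tmp_csv, row.names = FALSE)

# read.csv() is a convenience wrapper around read.table()
from_file <- read.csv(tmp_csv, stringsAsFactors = FALSE)

# readLines() returns the same file as plain text, one element per line
raw_lines <- readLines(tmp_csv)

str(from_file)
```

Here from_file is a data.frame with one row per book, while raw_lines is a character vector that still includes the header line.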

readr

R has some base functions for reading a local data file into your R session, namely read.table() and read.csv(), but these have some idiosyncrasies that were improved upon in the readr package, which is installed and loaded with tidyverse. You can either load tidyverse, which will automatically load readr, or you can load readr individually.

library(tidyverse)

# or

library(readr)
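
One example of the kind of idiosyncrasy mentioned above is column naming: by default, read.csv() rewrites column names into syntactically valid R names, while read_csv() keeps them as written. The sketch below uses a tiny inline CSV invented for illustration (csv_text is a hypothetical name), and assumes readr 2.0 or later so that literal data can be wrapped in I().

```r
library(readr)

# A small inline CSV with awkward column names (made-up data)
csv_text <- "call number,245 ab,TOT.CHKOUT\nQA76,Some title,3\n"

# Base R: check.names = TRUE by default, so names are altered
base_df <- read.csv(text = csv_text)
names(base_df)
# "call.number" "X245.ab" "TOT.CHKOUT"

# readr: names are preserved, and the result is a tibble
readr_tbl <- read_csv(I(csv_text), show_col_types = FALSE)
names(readr_tbl)
# "call number" "245 ab" "TOT.CHKOUT"
```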

For this session, we will read a CSV over a web connection rather than saving the data to our computer and loading it into R from there. If you would rather work from a saved copy, see the section below on Loading data from a local file.

To get our sample data into our R session, we will use the read_csv() function and connect to a CSV saved on my GitHub using the url() function.

books_url <- url("https://raw.githubusercontent.com/ciakovx/ciakovx.github.io/master/data/books.csv")
books <- readr::read_csv(books_url)
Rows: 5991 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (11): CALL...BIBLIO., X245.ab, X245.c, LOCATION, LOUTDATE, SUBJECT, ISN,...
dbl  (1): TOT.CHKOUT

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
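
The two hints at the end of that message can be tried on a small scale. The sketch below is only an illustration under stated assumptions: it uses a two-row inline CSV (sample_csv and books_sample are hypothetical names, and the rows are invented) with two of the column names from the output above, and it assumes readr 2.0 or later for I().

```r
library(readr)

# Tiny stand-in for the books data (invented rows)
sample_csv <- I("LOCATION,TOT.CHKOUT\nstacks,3\nreference,0\n")

# Declaring the column types up front means readr does not have to
# guess them, and the column specification message is not printed.
books_sample <- read_csv(
  sample_csv,
  col_types = cols(
    LOCATION = col_character(),
    TOT.CHKOUT = col_double()
  )
)

# spec() retrieves the full column specification that was used
spec(books_sample)
```

If you only want to silence the message and still let readr guess the types, passing show_col_types = FALSE to read_csv() is the lighter option.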
books