This tutorial is based on a publication by Michael Levy and a publication by Bradley Boehmke. The material has been adapted to fit the objectives of the course.

More detail can be found in Hadley Wickham's book.

Objective

  • Introduce the tools and style of the tidyverse for data cleaning.

Data analysis

Analysts tend to follow 4 fundamental processes to turn data into understanding, knowledge & insight:

  1. Data manipulation
  2. Data visualization
  3. Statistical analysis/modeling
  4. Deployment of results

This tutorial focuses on data manipulation.

Data manipulation

It is often said that 80% of data analysis is spent on the process of cleaning and preparing the data. (Dasu and Johnson, 2003)

Well structured data serves two purposes:

  • Makes data suitable for software processing whether that be mathematical functions, visualization, etc.
  • Reveals information and insights

[Figure: data wrangle]

Tidy data

Put data in data frames

  • Each variable gets a column
  • Each observation gets a row
  • Each type of observation gets a data frame

[Figure: tidy data]

What is the tidyverse?

The tidyverse is a suite of R tools that follow a tidy philosophy.

Tidy APIs

Functions should be consistent and easily (human) readable

  • Take one step at a time
  • Connect simple steps with the pipe
  • Referential transparency

Okay but really, what is it?

Suite of ~20 packages that provide consistent, user-friendly, smart-default tools to do most of what most people do in R.

  • Core packages: ggplot2, dplyr, tidyr, readr, purrr, tibble
  • Specialized data manipulation: hms, stringr, lubridate, forcats
  • Data import: DBI, haven, httr, jsonlite, readxl, rvest, xml2
  • Modeling: modelr, broom

install.packages("tidyverse") installs all of the above packages.

library(tidyverse) attaches only the core packages.

[Figure: tidyverse]

[Figure: tidyverse functions]

Why tidyverse?

  • Consistency
    • e.g. Many functions take data.frame first -> piping
      • Faster to write
      • Easier to read
      • Easier to remember
    • Tidy data: Imposes good practices
  • Simple solutions to common problems (e.g. tidyr::separate)
  • Runs fast (thanks to Rcpp).
  • It is modular! (with the UNIX pipe | “spirit”)

tibble

A modern reimagining of data frames.

tdf <- tibble(x = 1:1e4, y = rnorm(1e4))  # tibble() replaces the deprecated data_frame()
class(tdf)
## [1] "tbl_df"     "tbl"        "data.frame"

Tibbles print politely.

tdf
## # A tibble: 10,000 x 2
##        x          y
##    <int>      <dbl>
##  1     1  1.5769043
##  2     2 -0.7643222
##  3     3 -1.5141980
##  4     4 -2.2969305
##  5     5  1.1883936
##  6     6 -0.8345075
##  7     7  0.2408071
##  8     8 -0.4245211
##  9     9 -1.0505558
## 10    10  0.6696043
## # ... with 9,990 more rows

Tibbles have some convenient and consistent defaults that are different from base R data.frames.
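
Two of those defaults can be sketched as follows. This is a minimal illustration, assuming the tibble package is installed; the column names `xyz` and `label` are made up for the example.

```r
library(tibble)

df <- data.frame(xyz = 1:3, label = c("a", "b", "c"))
tb <- tibble(xyz = 1:3, label = c("a", "b", "c"))

# Subsetting one column with `[`: a data.frame drops to a plain vector,
# while a tibble always stays a tibble.
is.integer(df[, "xyz"])
is_tibble(tb[, "xyz"])

# Partial matching with `$`: a data.frame silently matches `xy` to `xyz`,
# while a tibble returns NULL and warns about the unknown column.
df$xy
suppressWarnings(tb$xy)
```

The tibble behaviour is stricter, which tends to surface typos in column names earlier.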

The pipe %>%

Sends the output of the LHS function to the first argument of the RHS function.

sum(1:8) %>%
  sqrt()
## [1] 6

In RStudio, %>% can be inserted with the shortcut Ctrl+Shift+M.
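
As a small sketch of the rule above, `x %>% f(y)` is evaluated as `f(x, y)`. The example assumes the magrittr package (which provides `%>%`) is installed; the vector `x` is illustrative.

```r
library(magrittr)

x <- c(1, 4, 9)

# Each step receives the previous result as its first argument.
piped  <- x %>% sqrt() %>% sum() %>% round(1)
nested <- round(sum(sqrt(x)), 1)
identical(piped, nested)

# The dot placeholder sends the left-hand side to another argument position.
"a" %>% paste("b", .)
```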

  • Its advantage becomes obvious when you need to chain several functions.

  • For instance, if we want to
    • filter some data,
    • summarize it, and then
    • order the summarized results we would write it out as:

Nested Option:

arrange(
  summarise(
    filter(data, variable == numeric_value),
    Total = sum(variable)
  ),
  desc(Total)
)

or

Multiple Object Option:

a <- filter(data, variable == numeric_value)
b <- summarise(a, Total = sum(variable))
c <- arrange(b, desc(Total))

or

%>% Option:

data %>%
  filter(variable == numeric_value) %>%
  summarise(Total = sum(variable)) %>%
  arrange(desc(Total))

  • As chains of operations get longer, the %>% operator becomes quicker to write and makes your code more legible.
  • In addition, the %>% operator lets you flow from data manipulation tasks straight into visualization functions (via ggplot2 and ggvis) and also into many analytic functions.
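
The three options can be compared on real data. The sketch below uses the built-in `mtcars` data set and assumes dplyr is installed; the filter condition, the added `group_by(gear)` step, and the column name `total_hp` are choices made for this illustration, not part of the template above.

```r
library(dplyr)

# Nested option: must be read from the inside out
nested <- arrange(
  summarise(
    group_by(filter(mtcars, cyl > 4), gear),
    total_hp = sum(hp)
  ),
  desc(total_hp)
)

# Multiple object option: one intermediate object per step
a <- filter(mtcars, cyl > 4)
b <- summarise(group_by(a, gear), total_hp = sum(hp))
stepwise <- arrange(b, desc(total_hp))

# Pipe option: reads top to bottom, in the order the steps happen
piped <- mtcars %>%
  filter(cyl > 4) %>%
  group_by(gear) %>%
  summarise(total_hp = sum(hp)) %>%
  arrange(desc(total_hp))

# All three produce the same table
identical(as.data.frame(nested), as.data.frame(piped))
identical(as.data.frame(stepwise), as.data.frame(piped))
```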

tidyr

There are four fundamental functions of data tidying:

  • gather() takes multiple columns and gathers them into key-value pairs: it makes “wide” data longer.
  • spread() takes two columns (key & value) and spreads them into multiple columns: it makes “long” data wider.
  • separate() splits a single column into multiple columns.
  • unite() combines multiple columns into a single column.

gather and spread

Use gather() to make wide tables long, and spread() to make long tables wide.
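
In tidyr 1.0.0 and later, the same two reshaping steps are also available as `pivot_longer()` and `pivot_wider()`. The sketch below assumes such a version is installed; the small `wide` table is built by hand from the first two year columns of the `cases` example shown further down.

```r
library(tidyr)

# A small "wide" table: one column per year
wide <- data.frame(
  country = c("DE", "FR", "US"),
  `2011` = c(5800, 7000, 15000),
  `2012` = c(6000, 6900, 14000),
  check.names = FALSE
)

# Wide to long: the year columns become (year, n) pairs
long <- pivot_longer(wide, cols = -country, names_to = "year", values_to = "n")

# Long to wide: the (year, n) pairs go back to one column per year
back <- pivot_wider(long, names_from = year, values_from = n)

dim(long)
names(back)
```

The round trip returns the original layout, which is the same behaviour as a gather() followed by a spread().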

[Figure: tidyr::gather]

[Figure: gather]

[Figure: spread]

  • mini
library(EDAWR)
cases %>%
  tbl_df() %>%
  gather(key= year, value=n, -country) %>%
  spread(year, n)
## # A tibble: 3 x 4
##   country `2011` `2012` `2013`
## *   <chr>  <dbl>  <dbl>  <dbl>
## 1      DE   5800   6000   6200
## 2      FR   7000   6900   7000
## 3      US  15000  14000  13000
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)
stocksm <- stocks %>% gather(stock, price, -time)  # long format: one row per (time, stock) pair
stocksm %>% spread(stock, price)
stocksm %>% spread(time, price)
  • large
who  # Tuberculosis data from the WHO
## # A tibble: 7,240 x 60
##        country  iso2  iso3  year new_sp_m014 new_sp_m1524 new_sp_m2534 new_sp_m3544
##          <chr> <chr> <chr> <int>       <int>        <int>        <int>        <int>
##  1 Afghanistan    AF   AFG  1980          NA           NA           NA           NA
##  2 Afghanistan    AF   AFG  1981          NA           NA           NA           NA
##  3 Afghanistan    AF   AFG  1982          NA           NA           NA           NA
##  4 Afghanistan    AF   AFG  1983          NA           NA           NA           NA
##  5 Afghanistan    AF   AFG  1984          NA           NA           NA           NA
##  6 Afghanistan    AF   AFG  1985          NA           NA           NA           NA
##  7 Afghanistan    AF   AFG  1986          NA           NA           NA           NA
##  8 Afghanistan    AF   AFG  1987          NA           NA           NA           NA
##  9 Afghanistan    AF   AFG  1988          NA           NA           NA           NA
## 10 Afghanistan    AF   AFG  1989          NA           NA           NA           NA
## # ... with 7,230 more rows, and 52 more variables: new_sp_m4554 <int>,
## #   new_sp_m5564 <int>, new_sp_m65 <int>, new_sp_f014 <int>, new_sp_f1524 <int>,
## #   new_sp_f2534 <int>, new_sp_f3544 <int>, new_sp_f4554 <int>, new_sp_f5564 <int>,
## #   new_sp_f65 <int>, new_sn_m014 <int>, new_sn_m1524 <int>, new_sn_m2534 <int>,
## #   new_sn_m3544 <int>, new_sn_m4554 <int>, new_sn_m5564 <int>, new_sn_m65 <int>,
## #   new_sn_f014 <int>, new_sn_f1524 <int>, new_sn_f2534 <int>, new_sn_f3544 <int>,
## #   new_sn_f4554 <int>, new_sn_f5564 <int>, new_sn_f65 <int>, new_ep_m014 <int>,
## #   new_ep_m1524 <int>, new_ep_m2534 <int>, new_ep_m3544 <int>, new_ep_m4554 <int>,
## #   new_ep_m5564 <int>, new_ep_m65 <int>, new_ep_f014 <int>, new_ep_f1524 <int>,
## #   new_ep_f2534 <int>, new_ep_f3544 <int>, new_ep_f4554 <int>, new_ep_f5564 <int>,
## #   new_ep_f65 <int>, new_rel_m014 <int>, new_rel_m1524 <int>, new_rel_m2534 <int>,
## #   new_rel_m3544 <int>, new_rel_m4554 <int>, new_rel_m5564 <int>, new_rel_m65 <int>,
## #   new_rel_f014 <int>, new_rel_f1524 <int>, new_rel_f2534 <int>, new_rel_f3544 <int>,
## #   new_rel_f4554 <int>, new_rel_f5564 <int>, new_rel_f65 <int>
who %>%
  gather(group, cases, -country, -iso2, -iso3, -year)
## # A tibble: 405,440 x 6
##        country  iso2  iso3  year       group cases
##          <chr> <chr> <chr> <int>       <chr> <int>
##  1 Afghanistan    AF   AFG  1980 new_sp_m014    NA
##  2 Afghanistan    AF   AFG  1981 new_sp_m014    NA
##  3 Afghanistan    AF   AFG  1982 new_sp_m014    NA
##  4 Afghanistan    AF   AFG  1983 new_sp_m014    NA
##  5 Afghanistan    AF   AFG  1984 new_sp_m014    NA
##  6 Afghanistan    AF   AFG  1985 new_sp_m014    NA
##  7 Afghanistan    AF   AFG  1986 new_sp_m014    NA
##  8 Afghanistan    AF   AFG  1987 new_sp_m014    NA
##  9 Afghanistan    AF   AFG  1988 new_sp_m014    NA
## 10 Afghanistan    AF   AFG  1989 new_sp_m014    NA
## # ... with 405,430 more rows

separate and unite

[Figure: separate and unite]

  • mini
df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
df %>% 
  tidyr::separate(x, c("A", "B")) %>%
  tidyr::unite(x, A, B, sep=".")
##       x
## 1 NA.NA
## 2   a.b
## 3   a.d
## 4   b.c
mtcars %>%
  tbl_df() %>%
  select(7:9) %>% 
  tidyr::unite(vs_am, vs, am) %>%
  tidyr::separate(vs_am, c("vs", "am"))
  • large
library(EDAWR)
storms %>%
  top_n(2,date) %>%
  separate(date, c("y", "m", "d")) %>%
  unite(date, y,m,d, sep="-")
## # A tibble: 2 x 4
##     storm  wind pressure       date
## *   <chr> <int>    <int>      <chr>
## 1 Alberto   110     1007 2000-08-03
## 2  Arlene    50     1010 1999-06-11
# extra
library(EDAWR)
pollution %>%
  tbl_df() %>%
  spread(size, amount) %>%
  gather(size, amount, -city) %>%
  arrange(desc(city))
## # A tibble: 6 x 3
##       city  size amount
##      <chr> <chr>  <dbl>
## 1 New York large     23
## 2 New York small     14
## 3   London large     22
## 4   London small     16
## 5  Beijing large    121
## 6  Beijing small     56

dplyr

Common data(frame) manipulation tasks.

There are seven fundamental functions of data transformation:

  • select() select variables
  • mutate() create new variables
  • filter() filter observations
  • arrange() reorder observations
  • group_by() groups observations by categorical levels
  • summarise() summarise observations by functions of choice
  • *_join() functions (left_join(), inner_join(), full_join(), ...) join separate data frames

select

  • select variables
iris %>%
  tbl_df() %>%
  select(Petal.Length, Petal.Width)
## # A tibble: 150 x 2
##    Petal.Length Petal.Width
##           <dbl>       <dbl>
##  1          1.4         0.2
##  2          1.4         0.2
##  3          1.3         0.2
##  4          1.5         0.2
##  5          1.4         0.2
##  6          1.7         0.4
##  7          1.4         0.3
##  8          1.5         0.2
##  9          1.4         0.2
## 10          1.5         0.1
## # ... with 140 more rows
# equivalent
iris %>%
  tbl_df() %>%
  select(3,4)

iris %>%
  tbl_df() %>%
  select(-Species)

iris %>%
  tbl_df() %>%
  select_if(is.factor)

Use the select helpers!

# ?select_helpers
iris %>%
  tbl_df() %>%
  select(starts_with("Petal"))
iris %>%
  tbl_df() %>%
  select(ends_with("Width"))
iris %>%
  tbl_df() %>%
  select(contains("etal"))
iris %>%
  tbl_df() %>%
  select(-matches(".t.")) # accepts 'NOT' condition
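
To see what each helper picks, it can help to look only at the resulting column names. This is a small sketch on the built-in `iris` data set, assuming dplyr is installed.

```r
library(dplyr)

petal_cols <- names(select(iris, starts_with("Petal")))
width_cols <- names(select(iris, ends_with("Width")))
etal_cols  <- names(select(iris, contains("etal")))

# matches() takes a regular expression; ".t." means "a t with one
# character on each side", which every column except Species contains.
no_t_cols  <- names(select(iris, -matches(".t.")))

petal_cols  # "Petal.Length" "Petal.Width"
width_cols  # "Sepal.Width"  "Petal.Width"
etal_cols   # "Petal.Length" "Petal.Width"
no_t_cols   # "Species"
```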

mutate

  • create new variables
mtcars %>%
  tbl_df() %>%
  select(1:3) %>% 
  mutate(gpm= 1/mpg)
## # A tibble: 32 x 4
##      mpg   cyl  disp        gpm
##    <dbl> <dbl> <dbl>      <dbl>
##  1  21.0     6 160.0 0.04761905
##  2  21.0     6 160.0 0.04761905
##  3  22.8     4 108.0 0.04385965
##  4  21.4     6 258.0 0.04672897
##  5  18.7     8 360.0 0.05347594
##  6  18.1     6 225.0 0.05524862
##  7  14.3     8 360.0 0.06993007
##  8  24.4     4 146.7 0.04098361
##  9  22.8     4 140.8 0.04385965
## 10  19.2     6 167.6 0.05208333
## # ... with 22 more rows
iris %>%
  tbl_df() %>%
  mutate_at(vars(-Species), log)  # vars() selects the columns; the function is applied to each
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
##  1     1.629241    1.252763    0.3364722  -1.6094379  setosa
##  2     1.589235    1.098612    0.3364722  -1.6094379  setosa
##  3     1.547563    1.163151    0.2623643  -1.6094379  setosa
##  4     1.526056    1.131402    0.4054651  -1.6094379  setosa
##  5     1.609438    1.280934    0.3364722  -1.6094379  setosa
##  6     1.686399    1.360977    0.5306283  -0.9162907  setosa
##  7     1.526056    1.223775    0.3364722  -1.2039728  setosa
##  8     1.609438    1.223775    0.4054651  -1.6094379  setosa
##  9     1.481605    1.064711    0.3364722  -1.6094379  setosa
## 10     1.589235    1.131402    0.4054651  -2.3025851  setosa
## # ... with 140 more rows
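
In dplyr 1.0.0 and later, the same column-wise transformation can also be written with `across()` inside a plain `mutate()`. The sketch below assumes such a version is installed and checks that both spellings agree.

```r
library(dplyr)

# Scoped verb: apply log() to every column except Species
old_style <- iris %>% mutate_at(vars(-Species), log)

# across(): the same selection and function, inside mutate()
new_style <- iris %>% mutate(across(-Species, log))

isTRUE(all.equal(old_style, new_style))
```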

filter

  • filter observations
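  • Prefer the explicit form dplyr::filter(), because stats::filter() has the same name and is the one that gets called when dplyr is not attached.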
iris %>%
  tbl_df() %>%
  # logical criteria
  dplyr::filter(Sepal.Length > 7)
## # A tibble: 12 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
##           <dbl>       <dbl>        <dbl>       <dbl>    <fctr>
##  1          7.1         3.0          5.9         2.1 virginica
##  2          7.6         3.0          6.6         2.1 virginica
##  3          7.3         2.9          6.3         1.8 virginica
##  4          7.2         3.6          6.1         2.5 virginica
##  5          7.7         3.8          6.7         2.2 virginica
##  6          7.7         2.6          6.9         2.3 virginica
##  7          7.7         2.8          6.7         2.0 virginica
##  8          7.2         3.2          6.0         1.8 virginica
##  9          7.2         3.0          5.8         1.6 virginica
## 10          7.4         2.8          6.1         1.9 virginica
## 11          7.9         3.8          6.4         2.0 virginica
## 12          7.7         3.0          6.1         2.3 virginica

arrange

  • reorder observations
mtcars %>%
  tbl_df() %>%
  select(1:3) %>% 
  # order rows by ascending mpg
  dplyr::arrange(mpg) %>%
  # re-order by descending mpg (this replaces the previous ordering)
  dplyr::arrange(desc(mpg))
## # A tibble: 32 x 3
##      mpg   cyl  disp
##    <dbl> <dbl> <dbl>
##  1  33.9     4  71.1
##  2  32.4     4  78.7
##  3  30.4     4  75.7
##  4  30.4     4  95.1
##  5  27.3     4  79.0
##  6  26.0     4 120.3
##  7  24.4     4 146.7
##  8  22.8     4 108.0
##  9  22.8     4 140.8
## 10  21.5     4 120.1
## # ... with 22 more rows

group_by + summarise

  • group_by() groups observations by categorical levels
  • summarise() summarise observations by functions of choice
iris %>%
  tbl_df() %>%
  # compute separate summary row for each group
  dplyr::group_by(Species) %>%
  summarise(avg= mean(Sepal.Length)) %>%
  dplyr::ungroup()
## # A tibble: 3 x 2
##      Species   avg
##       <fctr> <dbl>
## 1     setosa 5.006
## 2 versicolor 5.936
## 3  virginica 6.588
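
A single summarise() call can compute several summaries per group at once. This is a minimal sketch on the built-in `mtcars` data set, assuming dplyr is installed; the output column names `n`, `avg_mpg` and `max_hp` are illustrative.

```r
library(dplyr)

by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n       = n(),        # number of cars in each group
    avg_mpg = mean(mpg),  # average fuel economy
    max_hp  = max(hp)     # most powerful car in the group
  )

by_cyl
```

The result has one row per level of `cyl` and one column per summary.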

joins

  • dplyr also does multi-table joins and can connect to various types of databases.
t1 <- tibble(alpha = letters[1:6], num = 1:6)    # keys a to f
t2 <- tibble(alpha = letters[4:10], num = 4:10)  # keys d to j
full_join(t1, t2, by = "alpha", suffix = c("_t1", "_t2"))
## # A tibble: 10 x 3
##    alpha num_t1 num_t2
##    <chr>  <int>  <int>
##  1     a      1     NA
##  2     b      2     NA
##  3     c      3     NA
##  4     d      4      4
##  5     e      5      5
##  6     f      6      6
##  7     g     NA      7
##  8     h     NA      8
##  9     i     NA      9
## 10     j     NA     10
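
The other join verbs differ only in which keys they keep. The sketch below reuses the same two tables and assumes dplyr is installed; the names `inner`, `left` and `anti` are illustrative.

```r
library(dplyr)

t1 <- tibble(alpha = letters[1:6], num = 1:6)    # keys a to f
t2 <- tibble(alpha = letters[4:10], num = 4:10)  # keys d to j

# inner_join: only keys present in both tables (d, e, f)
inner <- inner_join(t1, t2, by = "alpha", suffix = c("_t1", "_t2"))

# left_join: every key of t1, with NA where t2 has no match
left <- left_join(t1, t2, by = "alpha", suffix = c("_t1", "_t2"))

# anti_join: rows of t1 whose key does not appear in t2 (a, b, c)
anti <- anti_join(t1, t2, by = "alpha")

nrow(inner)
nrow(left)
anti$alpha
```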

Pro tip: group_by() %>% mutate() accomplishes a summarise + join in a single step.

tibble(group = sample(letters[1:3], 10, replace = TRUE),
           value = rnorm(10)) %>%
  group_by(group) %>%
  mutate(group_average = mean(value))
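
The equivalence can be checked directly. The sketch below assumes dplyr is installed; the seed and the object names `d`, `via_mutate` and `via_join` are choices made for this illustration.

```r
library(dplyr)

set.seed(1)
d <- tibble(group = sample(letters[1:3], 10, replace = TRUE),
            value = rnorm(10))

# One step: a grouped mutate keeps every row and adds the group mean
via_mutate <- d %>%
  group_by(group) %>%
  mutate(group_average = mean(value)) %>%
  ungroup()

# Two steps: summarise to one row per group, then join back
group_means <- d %>%
  group_by(group) %>%
  summarise(group_average = mean(value))
via_join <- left_join(d, group_means, by = "group")

isTRUE(all.equal(as.data.frame(via_mutate), as.data.frame(via_join)))
```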

ggplot2

Visualization package

  • Note that the pipe and consistent API make it easy to combine functions from different packages, and the whole thing is quite readable.
# density, cumsum, cume_dist + facet
z <- iris %>%
  tbl_df() %>%
  gather(key = attrib, value = attrib_m, -Species) %>%
  group_by(attrib, Species) %>%
  arrange(attrib, Species, attrib_m) %>%
  # cumulative sum and empirical cumulative distribution within each group
  dplyr::mutate(cumsum = cumsum(attrib_m), cume_dist = cume_dist(attrib_m))

# cumulative sum of each measurement, by species
p_cumsum <- z %>%
  ggplot(aes(attrib_m, cumsum)) +
  geom_line(aes(colour = Species)) +
  facet_grid(. ~ attrib)

# density of each measurement, by species
p_density <- iris %>%
  gather(key = attrib, value = attrib_m, -Species) %>%
  ggplot(aes(attrib_m)) +
  geom_density(aes(colour = Species)) +
  facet_grid(. ~ attrib)

Rmisc::multiplot(p_cumsum, p_density, cols = 1)

who %>%
  select(-iso2, -iso3) %>%
  gather(group, cases, -country, -year) %>%
  count(country, year, wt = cases) %>%
  ggplot(aes(x = year, y = n, group = country)) +
  geom_line(size = .2) 

ANNEX: RStudio

Make the most of its features!

  • Shortcuts:
    • Ctrl+Shift+K: knit the document
    • Alt+Shift+K: show all keyboard shortcuts
  • Shortcuts with Ctrl+ and a number, to move focus to a pane:
    1. script
    2. console
    3. help
    4. history search
    5. files
    6. plots
    7. packages
    8. environment
    9. viewer
  • Remember the pipe %>%
    • Ctrl+Shift+M

stats with broom

# 1. summary stats
iris %>% 
  tbl_df() %>% 
  gather(key=attrib, value= attrib_m, -Species) %>%
  group_by(Species, attrib) %>%
  summarise_if(is.numeric,c("mean", "median", #location
                            "IQR", "mad", "sd", "var")) %>% #spread
  filter(attrib=="Sepal.Length")
  #glimpse()

# 2. distribution visualization
iris %>%
  ggplot(aes(Sepal.Length)) + 
  geom_density(aes(colour= Species))

# 3. hypothesis tests
iris %>%
  filter(Species!="setosa") %>%
  t.test(Sepal.Length ~ Species, data=.) %>%
  broom::tidy()

iris %>%
  filter(Species!="versicolor") %>%
  t.test(Sepal.Length ~ Species, data=.) %>%
  broom::tidy()

iris %>%
  filter(Species!="virginica") %>%
  t.test(Sepal.Length ~ Species, data=.) %>%
  broom::tidy()

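# 4. analysis of variance, tidied at three levels of detail
fit <- iris %>%
  aov(Sepal.Length ~ Species, data = .)
broom::tidy(fit)     # one row per model term
broom::glance(fit)   # one-row summary of the whole model
broom::augment(fit)  # one row per observation, with fitted values and residuals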
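
Because broom::tidy() returns a data frame, the three pairwise t-tests above can be collected into a single table instead of being run one by one. This is a sketch, assuming dplyr and broom are installed; the helper columns `group1` and `group2` are added here for illustration, and no multiple-comparison correction is applied.

```r
library(dplyr)
library(broom)

# All pairs of species: 3 pairs for 3 levels
species_pairs <- combn(levels(iris$Species), 2, simplify = FALSE)

pairwise <- lapply(species_pairs, function(p) {
  iris %>%
    filter(Species %in% p) %>%
    t.test(Sepal.Length ~ Species, data = .) %>%
    tidy() %>%
    mutate(group1 = p[1], group2 = p[2])
}) %>%
  bind_rows()

pairwise %>%
  select(group1, group2, estimate, statistic, p.value)
```

Each row holds one comparison, so the results can be filtered, sorted or plotted like any other data frame.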

MORE EXAMPLES

library(tidyverse)
library(stringr)
library(forcats)
library(broom)
#library(EDAWR)

#
tidyr::who %>%
  filter(iso3=="PER") %>% 
  summarise_if(is.numeric,mean, na.rm=T) %>%
  glimpse()

# one -------------------------------------
who1 <- tidyr::who %>%
  gather(new_sp_m014:newrel_f65,
         key= "key",
         value= "cases",
         na.rm=T) %>%
  mutate(key= stringr::str_replace(key, 
                                   "newrel","new_rel")) %>% 
  separate(key, 
           c("new", "type", "sexage"), 
           sep="_") %>% 
  select(-new, -iso2, -iso3) %>% 
  separate(sexage, 
           c("sex", "age"), 
           sep=1) 

who1 %>%
  filter(country=="Peru") %>% 
  mutate(age= forcats::fct_reorder(age, desc(cases))) %>% 
  ggplot(aes(year, cases)) + 
  geom_line(aes(colour=age))
  #count(age)
  #View()

# two -------------------------------------
who2 <- who1 %>%
  group_by(country, year, sex) %>% 
  summarise_at(vars(cases), sum, na.rm=T) 

who2 %>%
  filter(country=="Peru") %>%
  ggplot(aes(year, cases)) + 
  geom_line(aes(colour=sex)) +
  facet_wrap(~ country)

# three -------------------------------------
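who3 <- who2 %>% 
  group_by(country) %>% 
  summarise_at(vars(cases), sum, na.rm = TRUE) %>% 
  # keep the 20 countries with the most cases overall
  top_n(20, wt = cases) %>% 
  select(country) %>% 
  # bring back the detailed rows for those countries
  inner_join(who1, by = "country") %>% 
  bind_rows(who1 %>%
              filter(country == "Peru")) 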

who3 %>% 
  group_by(country) %>% 
  mutate(age= forcats::fct_reorder(age, desc(cases))) %>%
  ggplot(aes(year, log10(cases))) + 
  geom_line(aes(colour=age)) +
  facet_wrap(~ country)

Computer environment

devtools::session_info()
## Session info ----------------------------------------------------------------------------
##  setting  value                       
##  version  R version 3.4.1 (2017-06-30)
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language en_US                       
##  collate  en_US.UTF-8                 
##  tz       America/Lima                
##  date     2017-08-04
## Packages --------------------------------------------------------------------------------
##  package    * version date       source                        
##  assertthat   0.2.0   2017-04-11 CRAN (R 3.4.0)                
##  backports    1.1.0   2017-05-22 CRAN (R 3.4.1)                
##  base       * 3.4.1   2017-07-08 local                         
##  bindr        0.1     2016-11-13 cran (@0.1)                   
##  bindrcpp   * 0.2     2017-06-17 CRAN (R 3.4.1)                
##  broom        0.4.2   2017-02-13 CRAN (R 3.4.0)                
##  cellranger   1.1.0   2016-07-27 CRAN (R 3.4.0)                
##  colorspace   1.3-2   2016-12-14 CRAN (R 3.4.0)                
##  compiler     3.4.1   2017-07-08 local                         
##  datasets   * 3.4.1   2017-07-08 local                         
##  devtools     1.13.2  2017-06-02 CRAN (R 3.4.1)                
##  digest       0.6.12  2017-01-27 CRAN (R 3.4.0)                
##  dplyr      * 0.7.2   2017-07-20 CRAN (R 3.4.1)                
##  EDAWR      * 0.1     2017-02-24 Github (rstudio/EDAWR@2652ea6)
##  evaluate     0.10.1  2017-06-24 CRAN (R 3.4.1)                
##  forcats      0.2.0   2017-01-23 CRAN (R 3.4.0)                
##  foreign      0.8-69  2017-06-21 CRAN (R 3.4.1)                
##  ggplot2    * 2.2.1   2016-12-30 CRAN (R 3.4.0)                
##  glue         1.1.1   2017-06-21 CRAN (R 3.4.1)                
##  graphics   * 3.4.1   2017-07-08 local                         
##  grDevices  * 3.4.1   2017-07-08 local                         
##  grid         3.4.1   2017-07-08 local                         
##  gtable       0.2.0   2016-02-26 CRAN (R 3.4.0)                
##  haven        1.1.0   2017-07-09 CRAN (R 3.4.1)                
##  hms          0.3     2016-11-22 CRAN (R 3.4.0)                
##  htmltools    0.3.6   2017-04-28 CRAN (R 3.4.0)                
##  httr         1.2.1   2016-07-03 CRAN (R 3.4.0)                
##  jsonlite     1.5     2017-06-01 cran (@1.5)                   
##  knitr        1.16    2017-05-18 cran (@1.16)                  
##  labeling     0.3     2014-08-23 CRAN (R 3.4.0)                
##  lattice      0.20-35 2017-03-25 CRAN (R 3.3.3)                
##  lazyeval     0.2.0   2016-06-12 CRAN (R 3.4.0)                
##  lubridate    1.6.0   2016-09-13 CRAN (R 3.4.0)                
##  magrittr     1.5     2014-11-22 CRAN (R 3.4.0)                
##  memoise      1.1.0   2017-04-21 CRAN (R 3.4.0)                
##  methods    * 3.4.1   2017-07-08 local                         
##  mnormt       1.5-5   2016-10-15 CRAN (R 3.4.0)                
##  modelr       0.1.1   2017-07-24 CRAN (R 3.4.1)                
##  munsell      0.4.3   2016-02-13 CRAN (R 3.4.0)                
##  nlme         3.1-131 2017-02-06 CRAN (R 3.4.0)                
##  parallel     3.4.1   2017-07-08 local                         
##  pkgconfig    2.0.1   2017-03-21 cran (@2.0.1)                 
##  plyr         1.8.4   2016-06-08 CRAN (R 3.4.0)                
##  psych        1.7.5   2017-05-03 CRAN (R 3.4.0)                
##  purrr      * 0.2.2.2 2017-05-11 cran (@0.2.2.2)               
##  R6           2.2.2   2017-06-17 CRAN (R 3.4.1)                
##  Rcpp         0.12.12 2017-07-15 CRAN (R 3.4.1)                
##  readr      * 1.1.1   2017-05-16 CRAN (R 3.4.1)                
##  readxl       1.0.0   2017-04-18 CRAN (R 3.4.0)                
##  reshape2     1.4.2   2016-10-22 CRAN (R 3.4.0)                
##  rlang        0.1.1   2017-05-18 cran (@0.1.1)                 
##  rmarkdown    1.6     2017-06-15 CRAN (R 3.4.1)                
##  Rmisc        1.5     2013-10-22 CRAN (R 3.4.0)                
##  rprojroot    1.2     2017-01-16 CRAN (R 3.4.0)                
##  rvest        0.3.2   2016-06-17 CRAN (R 3.4.0)                
##  scales       0.4.1   2016-11-09 CRAN (R 3.4.0)                
##  stats      * 3.4.1   2017-07-08 local                         
##  stringi      1.1.5   2017-04-07 CRAN (R 3.4.0)                
##  stringr      1.2.0   2017-02-18 CRAN (R 3.4.0)                
##  tibble     * 1.3.3   2017-05-28 cran (@1.3.3)                 
##  tidyr      * 0.6.3   2017-05-15 CRAN (R 3.4.1)                
##  tidyverse  * 1.1.1   2017-01-27 CRAN (R 3.4.0)                
##  tools        3.4.1   2017-07-08 local                         
##  utils      * 3.4.1   2017-07-08 local                         
##  withr        2.0.0   2017-07-28 CRAN (R 3.4.1)                
##  xml2         1.1.1   2017-01-24 CRAN (R 3.4.0)                
##  yaml         2.1.14  2016-11-12 CRAN (R 3.4.0)

References