Marine Community Ecology 2024

class: center, middle, inverse, title-slide

.title[
# Marine Community Ecology 2024
]
.subtitle[
## 02-Entering the tidyverse
]
.author[
### Simon J. Brandl
]
.institute[
### The University of Texas at Austin
]
.date[
### 2024/01/01 (updated: 2024-01-28)
]

---

background-image: url("images/IMG_2100.jpg")
background-size: cover
class: center, top, inverse

# Data wrangling using dplyr and the tidyverse

<style type="text/css">
.scrollable {
  height: 300px;
  overflow-y: auto;
}

.scrollable-auto {
  height: 75%;
  overflow-y: auto;
}

.remark-slide-scaler {
 overflow-y: auto;
}
</style>

---
# The tidyverse 💫

- the tidyverse contains a vast number of functions to process data

- concept by Hadley Wickham: intuitive, simple way to do data science

---
## What is tidy data? 🧹

- data that is easy to transform, visualize, and model

- each variable is a column, each observation is a row

- functions are meant to be intuitive

.pull-right[
<img src="images/tidy_concept.png" width="80%" />
]
---
## The dplyr/tidyverse package 🔧

.pull-left[
- the core package for tidy data processing is the **dplyr** package

- the **dplyr** package is part of the **tidyverse** package, which includes several other packages

```r
library(dplyr)
```

```
## 
## Attaching package: 'dplyr'
```

```
## The following objects are masked from 'package:stats':
## 
##     filter, lag
```

```
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
```

```r
library(tidyverse)
```

```
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.4     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
```

```
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
```
]

.pull-right[
<img src="images/hex-dplyr.jpeg" width="75%" />
]

---

class: center, middle

## Five workhorse functions 🐴

**1) filter(): keep/remove rows based on criteria**

**2) select(): keep/remove columns by name/number/sequence**

**3) mutate(): add new variables**

**4) summarize(): reduce variables to summarized values**

**5) arrange(): reorder rows**

## ✂️ 🔨 🧲 🗜 🔩
---

## Pipes

.pull-left[
- pipes are a special type of operator, implemented using **%>%**

- pipes allow you to construct a sequence of actions with the same dataset

- for example, we can create a vector and take its mean

```r
v1 <- rnorm(1000, 0, 5) %>% # creating a vector using rnorm() and piping it
 mean() # taking the mean
v1
```

```
## [1] -0.20858
```
]

.pull-right[
<img src="images/lotr_pipe.jpg" width="80%" />
]
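
- a minimal sketch of what the pipe does: the piped call gives the same result as the nested call. The object names below are illustrative, and the native pipe `|>` assumes R 4.1 or newer

```r
library(dplyr) # makes the %>% pipe available

set.seed(42) # fix the random seed so all versions use the same numbers
x <- rnorm(1000, 0, 5)

nested <- mean(x)     # classic function call
piped <- x %>% mean() # same computation written with the pipe
native <- x |> mean() # base R's native pipe (assumes R >= 4.1)

identical(nested, piped)  # do both approaches agree?
identical(nested, native)
```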
---
## Preparation

- we'll start by reading in our fish.tibble that we created previously

- the .csv file should be in your data directory - if not, you can download it [here](https://simonjbrandl.github.io/marinecommunityecology/2-tidyverse.html)

```r
fish.tibble <- read.csv(file = "data/fishtibble.csv") 
fish.tibble
```

```
##   bluefish blowfish yellowfish         location
## 1        3        7          4        Australia
## 2        7        2          7        Indonesia
## 3        5        8          1      Philippines
## 4        5        4          6             Fiji
## 5        3        4          6         Solomons
## 6        6        3          9 Papua New Guinea
```
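
- if the .csv file is not at hand, the table can also be rebuilt by hand. This is only a sketch: the values are copied from the printout above, and `fish.tibble.manual` is a hypothetical name

```r
# rebuild the example data by hand (values taken from the printed table)
fish.tibble.manual <- data.frame(
  bluefish = c(3L, 7L, 5L, 5L, 3L, 6L),
  blowfish = c(7L, 2L, 8L, 4L, 4L, 3L),
  yellowfish = c(4L, 7L, 1L, 6L, 6L, 9L),
  location = c("Australia", "Indonesia", "Philippines",
               "Fiji", "Solomons", "Papua New Guinea")
)
str(fish.tibble.manual) # check column types and dimensions
```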
---
class: center, middle, inverse
# Filter
# 🧲
---
### Basic filtering

- the *filter()* function lets you select or remove rows based on characters or values

```r
fish.tbl.filtered <- fish.tibble %>% # create a new object from fish.tibble and pipe it
 filter(location == "Australia") # filter rows for Australia
fish.tbl.filtered
```

```
##   bluefish blowfish yellowfish  location
## 1        3        7          4 Australia
```

```r
fish.tbl.filtered2 <- fish.tibble %>%
 filter(location %in% c("Australia", "Indonesia")) # match several values using %in% and c()
fish.tbl.filtered2
```

```
##   bluefish blowfish yellowfish  location
## 1        3        7          4 Australia
## 2        7        2          7 Indonesia
```

```r
fish.tbl.filtered3 <- fish.tibble %>%
 filter(blowfish > 3) # filter by values greater than 3 in the blowfish column
fish.tbl.filtered3
```

```
##   bluefish blowfish yellowfish    location
## 1        3        7          4   Australia
## 2        5        8          1 Philippines
## 3        5        4          6        Fiji
## 4        3        4          6    Solomons
```
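
- when matching against several values, `%in%` is generally the safer choice: `==` recycles the comparison vector along the rows, so the result depends on row order. A small sketch with the example data rebuilt by hand (object names are illustrative)

```r
library(dplyr)

fish.tibble <- data.frame(
  bluefish = c(3, 7, 5, 5, 3, 6),
  blowfish = c(7, 2, 8, 4, 4, 3),
  yellowfish = c(4, 7, 1, 6, 6, 9),
  location = c("Australia", "Indonesia", "Philippines",
               "Fiji", "Solomons", "Papua New Guinea")
)

# == compares row 1 with "Indonesia", row 2 with "Australia", row 3 with "Indonesia", ...
eq.result <- fish.tibble %>%
  filter(location == c("Indonesia", "Australia"))

# %in% checks every row against the full set of values
in.result <- fish.tibble %>%
  filter(location %in% c("Indonesia", "Australia"))

nrow(eq.result) # number of rows found with ==
nrow(in.result) # number of rows found with %in%
```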
---
### Advanced filtering

- we can apply the same logical expressions we learned previously

```r
fish.tbl.filtered4 <- fish.tibble %>%
 filter(yellowfish > 3 & yellowfish < 7) # filter by values greater than 3 and smaller than 7
fish.tbl.filtered4
```

```
##   bluefish blowfish yellowfish  location
## 1        3        7          4 Australia
## 2        5        4          6      Fiji
## 3        3        4          6  Solomons
```

```r
# filter across multiple columns, numeric, character, and c()
fish.tbl.filtered5 <- fish.tibble %>%
 filter(bluefish > 5 & !(location %in% c("Fiji", "Australia"))) 
fish.tbl.filtered5
```

```
##   bluefish blowfish yellowfish         location
## 1        7        2          7        Indonesia
## 2        6        3          9 Papua New Guinea
```
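
- two further patterns that may be handy, sketched on the example data rebuilt by hand: `between()` for inclusive ranges and `|` for "or" conditions (object names are illustrative)

```r
library(dplyr)

fish.tibble <- data.frame(
  bluefish = c(3, 7, 5, 5, 3, 6),
  blowfish = c(7, 2, 8, 4, 4, 3),
  yellowfish = c(4, 7, 1, 6, 6, 9),
  location = c("Australia", "Indonesia", "Philippines",
               "Fiji", "Solomons", "Papua New Guinea")
)

# between() keeps values from 4 to 6, including both boundaries
mid.yellow <- fish.tibble %>%
  filter(between(yellowfish, 4, 6))

# | keeps rows that meet at least one of the two conditions
many.fish <- fish.tibble %>%
  filter(bluefish > 6 | blowfish > 7)

mid.yellow
many.fish
```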
---
class: center, middle, inverse
# Select
# ☑️
---
### Basic selecting (for or against)

- the *select()* function allows you to keep columns in your dataset based on names, positions, or criteria

- by using a - sign, you can de-select columns from your dataset

```r
fish.tbl.select <- fish.tibble %>% # retain the columns blowfish and location
 select(blowfish, location)
fish.tbl.select
```

```
##   blowfish         location
## 1        7        Australia
## 2        2        Indonesia
## 3        8      Philippines
## 4        4             Fiji
## 5        4         Solomons
## 6        3 Papua New Guinea
```

```r
fish.tbl.select2 <- fish.tibble %>%
 select(-bluefish, -blowfish) # remove the bluefish and blowfish columns
fish.tbl.select2
```

```
##   yellowfish         location
## 1          4        Australia
## 2          7        Indonesia
## 3          1      Philippines
## 4          6             Fiji
## 5          6         Solomons
## 6          9 Papua New Guinea
```
---
### Selecting using positions or criteria

- you can also choose based on column positions or employ criteria for negative or positive selection

```r
fish.tbl.select3 <- fish.tibble %>% # retain only the third and fourth column
 select(3:4)
fish.tbl.select3
```

```
##   yellowfish         location
## 1          4        Australia
## 2          7        Indonesia
## 3          1      Philippines
## 4          6             Fiji
## 5          6         Solomons
## 6          9 Papua New Guinea
```

```r
fish.tbl.select4 <- fish.tibble %>%
 select(-ends_with("fish")) # remove all columns that end with fish
fish.tbl.select4
```

```
##           location
## 1        Australia
## 2        Indonesia
## 3      Philippines
## 4             Fiji
## 5         Solomons
## 6 Papua New Guinea
```
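
- a few more selection helpers, sketched on the example data rebuilt by hand: `starts_with()`, `where()`, and `everything()` (object names are illustrative)

```r
library(dplyr)

fish.tibble <- data.frame(
  bluefish = c(3, 7, 5, 5, 3, 6),
  blowfish = c(7, 2, 8, 4, 4, 3),
  yellowfish = c(4, 7, 1, 6, 6, 9),
  location = c("Australia", "Indonesia", "Philippines",
               "Fiji", "Solomons", "Papua New Guinea")
)

b.cols <- fish.tibble %>%
  select(starts_with("b")) # columns whose names start with "b"

num.cols <- fish.tibble %>%
  select(where(is.numeric)) # columns that hold numbers

loc.first <- fish.tibble %>%
  select(location, everything()) # location first, then all remaining columns

names(b.cols)
names(num.cols)
names(loc.first)
```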
---
class: center, middle, inverse
# Mutate
# 🐛  🦋
---
### Using mutate to create new columns

- the *mutate()* function basically creates new columns

- most commonly, we'll use *mutate()* to create columns based on existing columns

- in the simplest scenario, we can create columns from scratch, equivalent to the *add_column()* function

```r
fish.tbl.mutate <- fish.tibble %>%
 mutate(greyfish = c(0,2,4,6,8,10), # add another numeric column called greyfish 
 type = c("continental", # and a categorical variable called type
 "continental", 
 "continental", 
 "oceanic", 
 "oceanic", 
 "continental"))
fish.tbl.mutate
```

```
##   bluefish blowfish yellowfish         location greyfish        type
## 1        3        7          4        Australia        0 continental
## 2        7        2          7        Indonesia        2 continental
## 3        5        8          1      Philippines        4 continental
## 4        5        4          6             Fiji        6     oceanic
## 5        3        4          6         Solomons        8     oceanic
## 6        6        3          9 Papua New Guinea       10 continental
```
- hint: you can use the *relocate()* function to tidy up your dataset
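
- a minimal sketch of *relocate()*, using the example data rebuilt by hand: by default the named columns move to the front, while `.after` (or `.before`) places them next to another column (object names are illustrative)

```r
library(dplyr)

fish.tbl <- data.frame(
  bluefish = c(3, 7, 5, 5, 3, 6),
  blowfish = c(7, 2, 8, 4, 4, 3),
  yellowfish = c(4, 7, 1, 6, 6, 9),
  location = c("Australia", "Indonesia", "Philippines",
               "Fiji", "Solomons", "Papua New Guinea")
) %>%
  mutate(greyfish = c(0, 2, 4, 6, 8, 10)) # new column is appended at the end

grey.moved <- fish.tbl %>%
  relocate(greyfish, .after = yellowfish) # place greyfish next to the other fish columns

loc.front <- fish.tbl %>%
  relocate(location) # default: move location to the front

names(grey.moved)
names(loc.front)
```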

---
### Using mutate on existing columns

- we can use *mutate()* for basic mathematical operations or combining columns

```r
fish.tbl.mutate2 <- fish.tbl.mutate %>%
 mutate(totalfish = bluefish+blowfish+yellowfish+greyfish) %>% # sum across fish species 
 relocate(location, type) # relocate for tidiness
fish.tbl.mutate2
```

```r
fish.tbl.mutate3 <- fish.tbl.mutate2 %>%
 mutate(loc_type = paste(location, type, sep = ".")) # combine the two character columns
fish.tbl.mutate3
```

```
##           location        type bluefish blowfish yellowfish greyfish totalfish
## 1        Australia continental        3        7          4        0        14
## 2        Indonesia continental        7        2          7        2        18
## 3      Philippines continental        5        8          1        4        18
## 4             Fiji     oceanic        5        4          6        6        21
## 5         Solomons     oceanic        3        4          6        8        21
## 6 Papua New Guinea continental        6        3          9       10        28
##                       loc_type
## 1        Australia.continental
## 2        Indonesia.continental
## 3      Philippines.continental
## 4                 Fiji.oceanic
## 5             Solomons.oceanic
## 6 Papua New Guinea.continental
```
- hint: check out the *unite()* function
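
- a minimal sketch of *unite()* from the **tidyr** package, which does the same job as `paste()` inside `mutate()`. The small `sites` table is made up for illustration; `remove = FALSE` keeps the original columns

```r
library(dplyr)
library(tidyr)

# small illustrative table with two character columns
sites <- data.frame(
  location = c("Australia", "Fiji"),
  type = c("continental", "oceanic")
)

united <- sites %>%
  unite(col = "loc_type", location, type, sep = ".", remove = FALSE) # combine both columns

united
```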
---
### Using mutate to replace and transform

- you can use mutate to replace character strings or transform numbers

```r
fish.tbl.mutate4 <- fish.tbl.mutate3 %>%
 mutate(type.recode = recode(type, continental = "coastal")) # use recode() within mutate to replace characters
fish.tbl.mutate4
```

```
##           location        type bluefish blowfish yellowfish greyfish totalfish
## 1        Australia continental        3        7          4        0        14
## 2        Indonesia continental        7        2          7        2        18
## 3      Philippines continental        5        8          1        4        18
## 4             Fiji     oceanic        5        4          6        6        21
## 5         Solomons     oceanic        3        4          6        8        21
## 6 Papua New Guinea continental        6        3          9       10        28
##                       loc_type type.recode
## 1        Australia.continental     coastal
## 2        Indonesia.continental     coastal
## 3      Philippines.continental     coastal
## 4                 Fiji.oceanic     oceanic
## 5             Solomons.oceanic     oceanic
## 6 Papua New Guinea.continental     coastal
```

```r
new.fish.tbl <- fish.tbl.mutate3 %>%
 mutate(log_totalfish = log(totalfish)) # create a column with the log of totalfish
new.fish.tbl
```

```
##           location        type bluefish blowfish yellowfish greyfish totalfish
## 1        Australia continental        3        7          4        0        14
## 2        Indonesia continental        7        2          7        2        18
## 3      Philippines continental        5        8          1        4        18
## 4             Fiji     oceanic        5        4          6        6        21
## 5         Solomons     oceanic        3        4          6        8        21
## 6 Papua New Guinea continental        6        3          9       10        28
##                       loc_type log_totalfish
## 1        Australia.continental      2.639057
## 2        Indonesia.continental      2.890372
## 3      Philippines.continental      2.890372
## 4                 Fiji.oceanic      3.044522
## 5             Solomons.oceanic      3.044522
## 6 Papua New Guinea.continental      3.332205
```
---
class: center, middle, inverse
# Summarize
# 📝
---
### Summarizing across rows

- the *summarize()* function turns many row values into one by performing some kind of mathematical operation

```r
sum.blowfish <- new.fish.tbl %>% 
 summarize(mean.blowfish = mean(blowfish), # get the mean, sd, min and max
 sd.blowfish = sd(blowfish), 
 min.blowfish = min(blowfish),
 max.blowfish = max(blowfish))
sum.blowfish
```

```
##   mean.blowfish sd.blowfish min.blowfish max.blowfish
## 1      4.666667     2.33809            2            8
```

```r
sum.blueblow <- new.fish.tbl %>%
 summarize(mean.blowfish = mean(blowfish), # means for two columns
 mean.bluefish = mean(bluefish))
sum.blueblow
```

```
##   mean.blowfish mean.bluefish
## 1      4.666667      4.833333
```

```r
range.total <- new.fish.tbl %>%
 summarize(range.total = range(totalfish), # range and quantiles for totalfish
 quant.total = quantile(totalfish, c(0.05, 0.95)))
range.total
```

```
##   range.total quant.total
## 1          14       15.00
## 2          28       26.25
```
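
- two related patterns, sketched on the example data rebuilt by hand: `across()` applies one summary to several columns at once, and `reframe()` is the function intended for summaries that return more than one row per group. The `reframe()` call assumes dplyr 1.1.0 or newer; object names are illustrative

```r
library(dplyr)

fish.tibble <- data.frame(
  bluefish = c(3, 7, 5, 5, 3, 6),
  blowfish = c(7, 2, 8, 4, 4, 3),
  yellowfish = c(4, 7, 1, 6, 6, 9),
  location = c("Australia", "Indonesia", "Philippines",
               "Fiji", "Solomons", "Papua New Guinea")
)

# mean of every column whose name ends with "fish"
all.means <- fish.tibble %>%
  summarize(across(ends_with("fish"), mean))

# range() returns two values, so reframe() is used instead of summarize()
total.range <- fish.tibble %>%
  mutate(totalfish = bluefish + blowfish + yellowfish) %>%
  reframe(range.total = range(totalfish)) # assumes dplyr >= 1.1.0

all.means
total.range
```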

---
class: middle, center, inverse
# Arrange
# 🔀
---
### Arranging rows

- the arrange() function takes the place of the sort function from base R

- NAs will always go to the bottom of the column

```r
fish.tbl.order <- new.fish.tbl %>%
 arrange(type) # arrange by type
fish.tbl.order
```

```
##           location        type bluefish blowfish yellowfish greyfish totalfish
## 1        Australia continental        3        7          4        0        14
## 2        Indonesia continental        7        2          7        2        18
## 3      Philippines continental        5        8          1        4        18
## 4 Papua New Guinea continental        6        3          9       10        28
## 5             Fiji     oceanic        5        4          6        6        21
## 6         Solomons     oceanic        3        4          6        8        21
##                       loc_type log_totalfish
## 1        Australia.continental      2.639057
## 2        Indonesia.continental      2.890372
## 3      Philippines.continental      2.890372
## 4 Papua New Guinea.continental      3.332205
## 5                 Fiji.oceanic      3.044522
## 6             Solomons.oceanic      3.044522
```

```r
fish.tbl.order.total <- new.fish.tbl %>%
 arrange(-totalfish) # arrange by totalfish, descending
fish.tbl.order.total
```

```
##           location        type bluefish blowfish yellowfish greyfish totalfish
## 1 Papua New Guinea continental        6        3          9       10        28
## 2             Fiji     oceanic        5        4          6        6        21
## 3         Solomons     oceanic        3        4          6        8        21
## 4        Indonesia continental        7        2          7        2        18
## 5      Philippines continental        5        8          1        4        18
## 6        Australia continental        3        7          4        0        14
##                       loc_type log_totalfish
## 1 Papua New Guinea.continental      3.332205
## 2                 Fiji.oceanic      3.044522
## 3             Solomons.oceanic      3.044522
## 4        Indonesia.continental      2.890372
## 5      Philippines.continental      2.890372
## 6        Australia.continental      2.639057
```

```r
fish.tbl.order.NA <- new.fish.tbl %>%
 mutate(na.fish = c(1, 2, 3, NA, 5, 6)) %>% # create column with NAs
 arrange(na.fish) %>% # arrange data by na.fish (ascending)
 relocate(na.fish)
fish.tbl.order.NA
```

```
##   na.fish         location        type bluefish blowfish yellowfish greyfish
## 1       1        Australia continental        3        7          4        0
## 2       2        Indonesia continental        7        2          7        2
## 3       3      Philippines continental        5        8          1        4
## 4       5         Solomons     oceanic        3        4          6        8
## 5       6 Papua New Guinea continental        6        3          9       10
## 6      NA             Fiji     oceanic        5        4          6        6
##   totalfish                     loc_type log_totalfish
## 1        14        Australia.continental      2.639057
## 2        18        Indonesia.continental      2.890372
## 3        18      Philippines.continental      2.890372
## 4        21             Solomons.oceanic      3.044522
## 5        28 Papua New Guinea.continental      3.332205
## 6        21                 Fiji.oceanic      3.044522
```
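
- a short sketch of two further options, using the example data rebuilt by hand: `desc()` for descending order (it also works on character columns) and a second column to break ties. NAs still end up at the bottom when sorting in descending order (object names are illustrative)

```r
library(dplyr)

fish.tibble <- data.frame(
  bluefish = c(3, 7, 5, 5, 3, 6),
  blowfish = c(7, 2, 8, 4, 4, 3),
  yellowfish = c(4, 7, 1, 6, 6, 9),
  location = c("Australia", "Indonesia", "Philippines",
               "Fiji", "Solomons", "Papua New Guinea")
)

# sort by bluefish (descending), then alphabetically by location to break ties
sorted <- fish.tibble %>%
  arrange(desc(bluefish), location)

# descending sort on a column that contains an NA
sorted.na <- fish.tibble %>%
  mutate(na.fish = c(1, 2, 3, NA, 5, 6)) %>%
  arrange(desc(na.fish))

sorted$location
sorted.na$na.fish
```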
---
class: inverse, center, top

# Exercise 2.1 🏋️‍♀️

### Read in your fishtibble.csv file and perform the following:

### a) Remove the values for the Philippines

### b) Retain only the first two columns

### c) Create a new column that contains the ratio of bluefish to blowfish

### d) Obtain the variance in bluefish, blowfish, and yellowfish

### e) Sort your dataset in the reverse alphabetical order of locations

---
class: center, top
# Solution 2.1a 🤓

## a) Remove the values for the Philippines

```r
fish.tibble <- read.csv(file = "data/fishtibble.csv")
a <- fish.tibble %>%
 filter(location != "Philippines")
a
```

```
##   bluefish blowfish yellowfish         location
## 1        3        7          4        Australia
## 2        7        2          7        Indonesia
## 3        5        4          6             Fiji
## 4        3        4          6         Solomons
## 5        6        3          9 Papua New Guinea
```
---
class: center, top
# Solution 2.1b 🤓

## b) Retain only the first two columns

```r
b <- fish.tibble %>%
 select(1:2)
b
```

```
##   bluefish blowfish
## 1        3        7
## 2        7        2
## 3        5        8
## 4        5        4
## 5        3        4
## 6        6        3
```
---
class: center, top
# Solution 2.1c 🤓

## c) Create a new column that contains the ratio of bluefish to blowfish

```r
c <- fish.tibble %>%
 mutate(ratio = bluefish/blowfish)
c
```

```
##   bluefish blowfish yellowfish         location     ratio
## 1        3        7          4        Australia 0.4285714
## 2        7        2          7        Indonesia 3.5000000
## 3        5        8          1      Philippines 0.6250000
## 4        5        4          6             Fiji 1.2500000
## 5        3        4          6         Solomons 0.7500000
## 6        6        3          9 Papua New Guinea 2.0000000
```
---
class: center, top
# Solution 2.1d 🤓

## d) Obtain the variance in bluefish, blowfish, and yellowfish

```r
d <- fish.tibble %>%
 summarize(var_blue = var(bluefish),
 var_blow = var(blowfish),
 var_yell = var(yellowfish))
d
```

```
##   var_blue var_blow var_yell
## 1 2.566667 5.466667      7.5
```
---
class: center, top
# Solution 2.1e 🤓

## e) Sort your dataset in the reverse alphabetical order of locations

```r
e <- fish.tibble %>%
 arrange(desc(location))
e
```

```
##   bluefish blowfish yellowfish         location
## 1        3        4          6         Solomons
## 2        5        8          1      Philippines
## 3        6        3          9 Papua New Guinea
## 4        7        2          7        Indonesia
## 5        5        4          6             Fiji
## 6        3        7          4        Australia
```
---
class: center
<img src="images/mariekondo.png" width="60%" />
---
class: center, middle

## Auxiliary functions 🔨

**1) group_by(): perform actions across rows with the same factor level**

**2) join(): combine datasets based on an overlapping column**

**3) gather(): compile rows from many columns into a single column**

**4) spread(): distribute rows from a single column into many columns**

**5) case_when(): apply advanced conditional logic to your mutate statements**

---
class: center, middle, inverse
# group_by()
# 👥 
---
### Grouping rows by factor levels

- the group_by() function creates an internal grouping structure

- for tibbles, the grouping is shown when the data are printed

```r
fish.tbl.grouped <- new.fish.tbl %>%
 group_by(type) # group by type
fish.tbl.grouped # dataset looks the same, but it will behave differently due to grouping
```

```
## # A tibble: 6 × 9
## # Groups:   type [2]
##   location        type  bluefish blowfish yellowfish greyfish totalfish loc_type
##   <chr>           <chr>    <int>    <int>      <int>    <dbl>     <dbl> <chr>   
## 1 Australia       cont…        3        7          4        0        14 Austral…
## 2 Indonesia       cont…        7        2          7        2        18 Indones…
## 3 Philippines     cont…        5        8          1        4        18 Philipp…
## 4 Fiji            ocea…        5        4          6        6        21 Fiji.oc…
## 5 Solomons        ocea…        3        4          6        8        21 Solomon…
## 6 Papua New Guin… cont…        6        3          9       10        28 Papua N…
## # ℹ 1 more variable: log_totalfish <dbl>
```

```r
fish.tbl.sum <- fish.tbl.grouped %>%
 summarize(mean.fish = mean(totalfish)) # add summarize() to see new behavior
fish.tbl.sum
```

```
## # A tibble: 2 × 2
##   type        mean.fish
##   <chr>           <dbl>
## 1 continental      19.5
## 2 oceanic          21  
```
---
### Advanced grouping and ungrouping

- we can group by multiple arguments

- grouping persists in the object and can cause unexpected results downstream, which we can resolve using ungroup()

```r
fish.tbl.sum2 <- new.fish.tbl %>%
 mutate(region = c("Oceania", "Asia", "Asia", "Oceania", "Asia", "Asia")) %>% # create another variable
 group_by(region, type) %>% # group by type and region
 summarize(mean.fish = mean(totalfish)) # add summarize() to see new behavior
```

```
## `summarise()` has grouped output by 'region'. You can override using the
## `.groups` argument.
```

```r
fish.tbl.sum2
```

```
## # A tibble: 4 × 3
## # Groups:   region [2]
##   region  type        mean.fish
##   <chr>   <chr>           <dbl>
## 1 Asia    continental      21.3
## 2 Asia    oceanic          21  
## 3 Oceania continental      14  
## 4 Oceania oceanic          21  
```

```r
fish.tbl.sum3 <- fish.tbl.grouped %>%
 ungroup() %>% # remove grouping structure
 summarize(mean.fish = mean(totalfish)) 
fish.tbl.sum3
```

```
## # A tibble: 1 × 1
##   mean.fish
##       <dbl>
## 1        20
```
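
- as an alternative to a separate ungroup() step, the `.groups` argument of summarize() can drop the grouping directly, and `n()` counts the rows in each group. A sketch with the example data rebuilt by hand (object names are illustrative)

```r
library(dplyr)

fish.typed <- data.frame(
  bluefish = c(3, 7, 5, 5, 3, 6),
  location = c("Australia", "Indonesia", "Philippines",
               "Fiji", "Solomons", "Papua New Guinea"),
  type = c("continental", "continental", "continental",
           "oceanic", "oceanic", "continental")
)

type.summary <- fish.typed %>%
  group_by(type) %>%
  summarize(n.sites = n(),              # number of rows per group
            mean.blue = mean(bluefish), # mean per group
            .groups = "drop")           # return an ungrouped result

type.summary
is_grouped_df(type.summary) # check whether any grouping is left
```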
---
class: center, middle, inverse
# Join
# 🤝 
---
### Joining datasets

- dplyr offers four main join functions, which differ in the rows they keep

1) left_join(): retains all rows of the left (first) dataset

2) right_join(): retains all rows of the right (second) dataset

3) inner_join(): retains only rows that match in both datasets

4) full_join(): retains all rows from both datasets

- to explore these functions, let's get some additional data

- the **wpp2019** package includes a dataset called "pop" with global population sizes by country

```r
library(wpp2019) # load the package
data(pop) 
str(pop)
```

```
## 'data.frame':	249 obs. of  17 variables:
##  $ country_code: int  900 947 1833 921 1832 1830 927 1835 1829 903 ...
##  $ name        : chr  "World" "Sub-Saharan Africa" "Northern Africa and Western Asia" "Central and Southern Asia" ...
##  $ 1950        : num  2536431 179007 100239 510788 842669 ...
##  $ 1955        : num  2773020 197490 113425 558666 932210 ...
##  $ 1960        : num  3034950 220138 129302 619068 1019895 ...
##  $ 1965        : num  3339584 247831 147822 691687 1127782 ...
##  $ 1970        : num  3700437 280908 168730 775437 1280853 ...
##  $ 1975        : num  4079480 321201 192351 870180 1432114 ...
##  $ 1980        : num  4458003 369614 220224 980359 1555768 ...
##  $ 1985        : num  4870922 425841 253469 1105791 1684698 ...
##  $ 1990        : num  5327231 490605 288060 1239984 1837799 ...
##  $ 1995        : num  5744213 560759 323178 1376200 1950220 ...
##  $ 2000        : num  6143494 639661 355882 1511915 2044789 ...
##  $ 2005        : num  6541907 729733 391986 1647074 2125348 ...
##  $ 2010        : num  6956824 836364 435367 1775361 2201807 ...
##  $ 2015        : num  7379797 958577 481520 1896327 2279490 ...
##  $ 2020        : num  7794799 1094366 525869 2014709 2346709 ...
```
---
### Joining in practice

- we want to add the 2020 population sizes to the countries in our fish.tibble

```r
pop.2020 <- pop %>% 
 select(name, "2020") %>% # select the name column and the 2020 column
 rename(location = "name", # use the rename() function to match the name in our fish.tibble
 population = "2020") # rename 2020 to 'population' - numbers as column names are a bad idea
head(pop.2020)
```

```
##                           location population
## 1                            World  7794798.7
## 2               Sub-Saharan Africa  1094365.6
## 3 Northern Africa and Western Asia   525869.3
## 4        Central and Southern Asia  2014708.5
## 5   Eastern and South-Eastern Asia  2346709.5
## 6  Latin America and the Caribbean   653962.3
```

```r
fish.tibble.left.join <- fish.tibble %>% 
 left_join(pop.2020, by = "location") # use left_join() to merge pop.2020 into the fish.tibble
fish.tibble.left.join # Solomons does not exist in the pop dataset, so it gives "NA" for population size
```

```
##   bluefish blowfish yellowfish         location population
## 1        3        7          4        Australia  25499.881
## 2        7        2          7        Indonesia 273523.621
## 3        5        8          1      Philippines 109581.085
## 4        5        4          6             Fiji    896.444
## 5        3        4          6         Solomons         NA
## 6        6        3          9 Papua New Guinea   8947.027
```

```r
fish.tibble.inner.join <- fish.tibble %>%
 inner_join(pop.2020) # use inner_join() to join pop and fish.tibble datasets, only joining locations that match
```

```
## Joining with `by = join_by(location)`
```

```r
fish.tibble.inner.join
```

```
##   bluefish blowfish yellowfish         location population
## 1        3        7          4        Australia  25499.881
## 2        7        2          7        Indonesia 273523.621
## 3        5        8          1      Philippines 109581.085
## 4        5        4          6             Fiji    896.444
## 5        6        3          9 Papua New Guinea   8947.027
```
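
- to compare the join types side by side, here is a sketch on two small tables that does not require the **wpp2019** package. The tables `fish` and `pops` are cut-down illustrations using values from the printouts above; anti_join() is an additional dplyr function that returns the rows without a match

```r
library(dplyr)

fish <- data.frame(
  location = c("Australia", "Fiji", "Solomons"),
  bluefish = c(3, 5, 3)
)
pops <- data.frame(
  location = c("Australia", "Fiji", "Indonesia"),
  population = c(25499.881, 896.444, 273523.621)
)

left <- left_join(fish, pops, by = "location")   # all rows of fish
right <- right_join(fish, pops, by = "location") # all rows of pops
inner <- inner_join(fish, pops, by = "location") # only matching rows
full <- full_join(fish, pops, by = "location")   # all rows of both tables
unmatched <- anti_join(fish, pops, by = "location") # rows of fish without a match

nrow(left)
nrow(right)
nrow(inner)
nrow(full)
unmatched
```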
---
class: center, middle, inverse
# Gather
# 🧺
---
### Gathering rows from multiple columns

- the gather() function turns data from wide format into long format

- this is extremely useful, as it allows us to use group_by() for our newly created variable

```r
fish.gathered <- fish.tibble.inner.join %>%
 gather(1:3, # specify the columns to gather
 key = "fish_species", value = "number") # provide names of new key and value columns 
head(fish.gathered)
```

```
##           location population fish_species number
## 1        Australia  25499.881     bluefish      3
## 2        Indonesia 273523.621     bluefish      7
## 3      Philippines 109581.085     bluefish      5
## 4             Fiji    896.444     bluefish      5
## 5 Papua New Guinea   8947.027     bluefish      6
## 6        Australia  25499.881     blowfish      7
```

```r
fish.means <- fish.gathered %>%
 group_by(fish_species) %>% # this is the newly created, gathered variable
 summarize(mean.fish = mean(number),
 sd.fish = sd(number))
fish.means
```

```
## # A tibble: 3 × 3
##   fish_species mean.fish sd.fish
##   <chr>            <dbl>   <dbl>
## 1 blowfish           4.8    2.59
## 2 bluefish           5.2    1.48
## 3 yellowfish         5.4    3.05
```
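
- in current versions of **tidyr**, gather() is superseded by pivot_longer(). A sketch of the equivalent call on the example data rebuilt by hand; note that the rows come out ordered by location rather than by species (object names are illustrative)

```r
library(dplyr)
library(tidyr)

fish.tibble <- data.frame(
  bluefish = c(3, 7, 5, 5, 3, 6),
  blowfish = c(7, 2, 8, 4, 4, 3),
  yellowfish = c(4, 7, 1, 6, 6, 9),
  location = c("Australia", "Indonesia", "Philippines",
               "Fiji", "Solomons", "Papua New Guinea")
)

fish.long <- fish.tibble %>%
  pivot_longer(cols = ends_with("fish"),   # columns to stack
               names_to = "fish_species",  # new column holding the old column names
               values_to = "number")       # new column holding the values

head(fish.long)
```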
---
class: center, middle, inverse
# Spread
# 🥯
---
### Spreading rows into columns

- the spread() function does the inverse of gather()

- problems can arise when there are missing observations

```r
fish.spread <- fish.gathered %>%
 spread(key = fish_species, value = number) # convert the data back into a wide format
fish.spread
```

```
##           location population blowfish bluefish yellowfish
## 1        Australia  25499.881        7        3          4
## 2             Fiji    896.444        4        5          6
## 3        Indonesia 273523.621        2        7          7
## 4 Papua New Guinea   8947.027        3        6          9
## 5      Philippines 109581.085        8        5          1
```

```r
fish.spread2 <- fish.gathered %>%
 filter(number != 3) %>% # let's remove all rows that have the value 3
 spread(key = fish_species, value = number, fill = 0) # fill them with 0s
fish.spread2
```

```
##           location population blowfish bluefish yellowfish
## 1        Australia  25499.881        7        0          4
## 2             Fiji    896.444        4        5          6
## 3        Indonesia 273523.621        2        7          7
## 4 Papua New Guinea   8947.027        0        6          9
## 5      Philippines 109581.085        8        5          1
```
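
- the counterpart of spread() in current versions of **tidyr** is pivot_wider(), where `values_fill` plays the role of `fill`. A sketch on a small long-format table made up for illustration, in which Fiji has no blowfish record

```r
library(dplyr)
library(tidyr)

# small long-format table with one missing combination (Fiji / blowfish)
fish.long <- data.frame(
  location = c("Australia", "Australia", "Fiji"),
  fish_species = c("bluefish", "blowfish", "bluefish"),
  number = c(3, 7, 5)
)

fish.wide <- fish.long %>%
  pivot_wider(names_from = fish_species, # new column names
              values_from = number,      # values that fill the new columns
              values_fill = 0)           # replace missing combinations with 0

fish.wide
```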
---
class: center, middle, inverse
# case_when
# 🧐
---
### Advanced logic within mutate()

- the case_when() function lets you apply logic within mutate()

- this is _extremely_ useful, but can take a while to get the hang of

```r
fish.spread.case <- fish.spread %>%
 mutate(pop.cat = case_when(population > 10000 ~ "high", # high or low 
 TRUE ~ "low"))

fish.spread.case
```

```
##           location population blowfish bluefish yellowfish pop.cat
## 1        Australia  25499.881        7        3          4    high
## 2             Fiji    896.444        4        5          6     low
## 3        Indonesia 273523.621        2        7          7    high
## 4 Papua New Guinea   8947.027        3        6          9     low
## 5      Philippines 109581.085        8        5          1    high
```
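
- case_when() checks its conditions from top to bottom and uses the first one that is TRUE, which makes it easy to build more than two categories. A sketch with three levels; the population values are copied from the printout above, and the cut-offs are arbitrary choices for illustration

```r
library(dplyr)

pops <- data.frame(
  location = c("Australia", "Fiji", "Indonesia", "Papua New Guinea", "Philippines"),
  population = c(25499.881, 896.444, 273523.621, 8947.027, 109581.085)
)

pops.cat <- pops %>%
  mutate(pop.cat = case_when(population > 100000 ~ "high",  # checked first
                             population > 10000 ~ "medium", # only reached if not "high"
                             TRUE ~ "low"))                 # everything else

pops.cat
```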
---
background-image: url("images/slineatus_2.jpg")
background-size: cover
class: left, top

### Create the following data frame:

```r
families <- data.frame("Families" = as.character(c("Acanthuridae", "Kyphosidae", "Labridae", "Siganidae")),
 "Common" = as.character(c("surgeonfishes", "chubs", "parrotfishes", "rabbitfishes")))
```
---
class: inverse, center
# Exercise 2.2 🏋️‍♀️

### a) Read in the 'coralreefherbivores.csv' dataset and obtain the mean bodydepth across families

### b) Integrate the common names for each family into the dataset

### c) Compile the values for sl, bodydepth, snoutlength, and eyediameter into a single column called "measurement", with a variable called "category" as the key

### d) Reverse the previous action

### e) Create a new column called "googly_eyed" where all species that have an eyediameter >=0.3 are tagged as "googly" and those with eyediameters <0.3 as "notgoogly"
---
class: center, top
# Solution 2.2a 🤓

### a) Obtain the mean bodydepth across different families

```r
herbs <- read.csv(file = "data/coralreefherbivores.csv")
a <- herbs %>%
 group_by(family) %>%
 summarize(mean.bd = mean(bodydepth))
head(a)
```

```
## # A tibble: 4 × 2
##   family       mean.bd
##   <chr>          <dbl>
## 1 Acanthuridae   0.487
## 2 Kyphosidae     0.479
## 3 Labridae       0.392
## 4 Siganidae      0.443
```
---
class: center, top
# Solution 2.2b 🤓

### b) Integrate the common names for each family into the dataset

```r
b <- families %>%
 rename(family = "Families") %>%
 inner_join(herbs)
```

```
## Joining with `by = join_by(family)`
```

```r
head(b)
```

```
##         family        Common      genus        species
## 1 Acanthuridae surgeonfishes Acanthurus       achilles
## 2 Acanthuridae surgeonfishes Acanthurus albipectoralis
## 3 Acanthuridae surgeonfishes Acanthurus   auranticavus
## 4 Acanthuridae surgeonfishes Acanthurus        blochii
## 5 Acanthuridae surgeonfishes Acanthurus     dussumieri
## 6 Acanthuridae surgeonfishes Acanthurus        fowleri
##                     gen.spe       sl bodydepth snoutlength eyediameter size
## 1       Acanthurus.achilles 163.6667 0.5543625   0.4877797   0.3507191    S
## 2 Acanthurus.albipectoralis 212.7300 0.4405350   0.4402623   0.2560593    M
## 3   Acanthurus.auranticavus 216.0000 0.4726556   0.5386490   0.2451253    M
## 4        Acanthurus.blochii  82.9000 0.5586486   0.4782217   0.3196155    M
## 5     Acanthurus.dussumieri 193.7033 0.5457248   0.5661867   0.2807218    L
## 6        Acanthurus.fowleri 266.0000 0.4669521   0.5950563   0.2217376    M
##      schooling
## 1     Solitary
## 2  SmallGroups
## 3 MediumGroups
## 4  SmallGroups
## 5     Solitary
## 6     Solitary
```
---
class: center, top
# Solution 2.2c 🤓

### c) Compile the values for sl, bodydepth, snoutlength, and eyediameter into a single column called "measurement", with a variable called "category" as the key

```r
c <- herbs %>%
 gather(5:8, key = "category", value = "measurement")
head(c)
```

```
##         family      genus        species                   gen.spe size
## 1 Acanthuridae Acanthurus       achilles       Acanthurus.achilles    S
## 2 Acanthuridae Acanthurus albipectoralis Acanthurus.albipectoralis    M
## 3 Acanthuridae Acanthurus   auranticavus   Acanthurus.auranticavus    M
## 4 Acanthuridae Acanthurus        blochii        Acanthurus.blochii    M
## 5 Acanthuridae Acanthurus     dussumieri     Acanthurus.dussumieri    L
## 6 Acanthuridae Acanthurus        fowleri        Acanthurus.fowleri    M
##      schooling category measurement
## 1     Solitary       sl    163.6667
## 2  SmallGroups       sl    212.7300
## 3 MediumGroups       sl    216.0000
## 4  SmallGroups       sl     82.9000
## 5     Solitary       sl    193.7033
## 6     Solitary       sl    266.0000
```
---
class: center, top
# Solution 2.2d 🤓

### d) Reverse the previous action

```r
d <- c %>%
 spread(key = "category", value = "measurement")
head(d)
```

```
##         family      genus        species                   gen.spe size
## 1 Acanthuridae Acanthurus       achilles       Acanthurus.achilles    S
## 2 Acanthuridae Acanthurus albipectoralis Acanthurus.albipectoralis    M
## 3 Acanthuridae Acanthurus   auranticavus   Acanthurus.auranticavus    M
## 4 Acanthuridae Acanthurus        blochii        Acanthurus.blochii    M
## 5 Acanthuridae Acanthurus     dussumieri     Acanthurus.dussumieri    L
## 6 Acanthuridae Acanthurus        fowleri        Acanthurus.fowleri    M
##      schooling bodydepth eyediameter       sl snoutlength
## 1     Solitary 0.5543625   0.3507191 163.6667   0.4877797
## 2  SmallGroups 0.4405350   0.2560593 212.7300   0.4402623
## 3 MediumGroups 0.4726556   0.2451253 216.0000   0.5386490
## 4  SmallGroups 0.5586486   0.3196155  82.9000   0.4782217
## 5     Solitary 0.5457248   0.2807218 193.7033   0.5661867
## 6     Solitary 0.4669521   0.2217376 266.0000   0.5950563
```
---
class: center, top
# Solution 2.2e 🤓

### e) Create a new column called "googly_eyed" based on eyediameter

```r
e <- herbs %>%
 mutate(googly_eyed = case_when(eyediameter >= 0.3 ~ "googly",
 TRUE ~ "notgoogly"))
head(e)
```

```
##         family      genus        species                   gen.spe       sl
## 1 Acanthuridae Acanthurus       achilles       Acanthurus.achilles 163.6667
## 2 Acanthuridae Acanthurus albipectoralis Acanthurus.albipectoralis 212.7300
## 3 Acanthuridae Acanthurus   auranticavus   Acanthurus.auranticavus 216.0000
## 4 Acanthuridae Acanthurus        blochii        Acanthurus.blochii  82.9000
## 5 Acanthuridae Acanthurus     dussumieri     Acanthurus.dussumieri 193.7033
## 6 Acanthuridae Acanthurus        fowleri        Acanthurus.fowleri 266.0000
##   bodydepth snoutlength eyediameter size    schooling googly_eyed
## 1 0.5543625   0.4877797   0.3507191    S     Solitary      googly
## 2 0.4405350   0.4402623   0.2560593    M  SmallGroups   notgoogly
## 3 0.4726556   0.5386490   0.2451253    M MediumGroups   notgoogly
## 4 0.5586486   0.4782217   0.3196155    M  SmallGroups      googly
## 5 0.5457248   0.5661867   0.2807218    L     Solitary   notgoogly
## 6 0.4669521   0.5950563   0.2217376    M     Solitary   notgoogly
```
---
background-image: url("images/ggplot_hive.jpg")
background-size: cover
class: center, top, inverse
---
class: center, middle
# The end