This session introduces the basic functions in dplyr and other packages within the tidyverse.
You can access the full slideshow used in the 2-Tidyverse narration here.
The dataset called ‘fishtibble.csv’ can be downloaded here.
The dataset called ‘coralreefherbivores.csv’ can be downloaded here.
The dataset called ‘fish_abundance.csv’ can be downloaded here.
Read in the coralreefherbivores.csv dataset (if you haven’t already done so).
The “coralreefherbivores.csv” dataset contains trait information on 93 species of herbivorous coral reef fishes from the Great Barrier Reef, Australia. The first four columns provide taxonomic information (family, genus, species, and a combined genus + species column called “gen.spe”). The next column, “sl”, gives the standard length of the fish. Three columns follow with morphological measurements: body depth (“bodydepth”), snout length (“snoutlength”), and eye diameter (“eyediameter”). The final two columns give the maximum body size (“size”, in categories from XS to XL) and the schooling behavior (“schooling”).
The “fish_abundance.csv” dataset contains reef fish abundance information, which was collected by the Reef Life Survey around Lizard Island, Australia. It includes information on the respective survey ID and its metadata (e.g. site, latitude, longitude, date, and depth). It also includes a row for each observed species on a given visual survey, including taxonomic information (family, genus, species, and “genspe”), as well as the number of individuals observed on that survey (‘total’).
library(tidyverse) # attaches dplyr, tidyr, and the %>% pipe used below
herbivores <- read.csv(file = "data/coralreefherbivores.csv")
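# mean and sd of standard length (sl) for three families
# %in% tests set membership; family == c(...) would recycle the vector row by row and silently drop matching rows
herbivores.filter <- herbivores %>%
  filter(family %in% c("Acanthuridae", "Labridae", "Siganidae")) %>%
  group_by(family) %>%
  summarize(mean.sl = mean(sl), sd.sl = sd(sl))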
head(herbivores.filter)
# A tibble: 3 × 3
family mean.sl sd.sl
<chr> <dbl> <dbl>
1 Acanthuridae 186. 89.7
2 Labridae 269. 73.5
3 Siganidae 168. 54.7
# combine size and schooling into one new column with paste(); the original columns are kept
herbivores.size_schooling <- herbivores %>%
  mutate(size_school = paste(size, schooling, sep = "."))
head(herbivores.size_schooling)
family genus species gen.spe
1 Acanthuridae Acanthurus achilles Acanthurus.achilles
2 Acanthuridae Acanthurus albipectoralis Acanthurus.albipectoralis
3 Acanthuridae Acanthurus auranticavus Acanthurus.auranticavus
4 Acanthuridae Acanthurus blochii Acanthurus.blochii
5 Acanthuridae Acanthurus dussumieri Acanthurus.dussumieri
6 Acanthuridae Acanthurus fowleri Acanthurus.fowleri
sl bodydepth snoutlength eyediameter size schooling
1 163.6667 0.5543625 0.4877797 0.3507191 S Solitary
2 212.7300 0.4405350 0.4402623 0.2560593 M SmallGroups
3 216.0000 0.4726556 0.5386490 0.2451253 M MediumGroups
4 82.9000 0.5586486 0.4782217 0.3196155 M SmallGroups
5 193.7033 0.5457248 0.5661867 0.2807218 L Solitary
6 266.0000 0.4669521 0.5950563 0.2217376 M Solitary
size_school
1 S.Solitary
2 M.SmallGroups
3 M.MediumGroups
4 M.SmallGroups
5 L.Solitary
6 M.Solitary
# alternative solution: unite() from tidyr merges the two columns and drops the originals
herbivores.size_schooling2 <- unite(herbivores, col = "size_and_schooling", c("size", "schooling"), sep = ".")
head(herbivores.size_schooling2)
family genus species gen.spe
1 Acanthuridae Acanthurus achilles Acanthurus.achilles
2 Acanthuridae Acanthurus albipectoralis Acanthurus.albipectoralis
3 Acanthuridae Acanthurus auranticavus Acanthurus.auranticavus
4 Acanthuridae Acanthurus blochii Acanthurus.blochii
5 Acanthuridae Acanthurus dussumieri Acanthurus.dussumieri
6 Acanthuridae Acanthurus fowleri Acanthurus.fowleri
sl bodydepth snoutlength eyediameter size_and_schooling
1 163.6667 0.5543625 0.4877797 0.3507191 S.Solitary
2 212.7300 0.4405350 0.4402623 0.2560593 M.SmallGroups
3 216.0000 0.4726556 0.5386490 0.2451253 M.MediumGroups
4 82.9000 0.5586486 0.4782217 0.3196155 M.SmallGroups
5 193.7033 0.5457248 0.5661867 0.2807218 L.Solitary
6 266.0000 0.4669521 0.5950563 0.2217376 M.Solitary
# mean eye diameter of Zebrasoma and Scarus
# use %in% here: genus == c("Zebrasoma", "Scarus") recycles the vector row by row and silently drops matching rows
zebrasoma.scarus <- herbivores %>%
  filter(genus %in% c("Zebrasoma", "Scarus")) %>%
  group_by(genus) %>%
  summarize(mean.eye = mean(eyediameter))
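The difference between `==` and `%in%` when filtering on several values is easiest to see on a small example. The sketch below uses a hypothetical toy table (`toy` is not one of the course datasets) in which `==` returns no rows at all, because the two-element vector is recycled along the column and never lines up with a matching row.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- tibble(
  genus = c("Scarus", "Zebrasoma", "Naso", "Zebrasoma", "Scarus", "Naso"),
  eyediameter = c(0.20, 0.30, 0.27, 0.31, 0.19, 0.28)
)

# `==` recycles c("Zebrasoma", "Scarus") along the column:
# row 1 is compared with "Zebrasoma", row 2 with "Scarus", row 3 with "Zebrasoma", ...
eq_rows <- toy %>% filter(genus == c("Zebrasoma", "Scarus"))

# `%in%` asks, for every row, whether the genus is a member of the set
in_rows <- toy %>% filter(genus %in% c("Zebrasoma", "Scarus"))

nrow(eq_rows) # 0 rows: no row happens to line up with the recycled vector
nrow(in_rows) # 4 rows: every Zebrasoma and Scarus row is kept
```

R only warns about recycling when the column length is not a multiple of the vector length, so the mistake can pass unnoticed.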
number.species <- herbivores %>% # solution 1: count the rows (one per species) in each size class
  select(size) %>%
  group_by(size) %>%
  count(size)
number.species
# A tibble: 5 × 2
# Groups: size [5]
size n
<chr> <int>
1 L 18
2 M 43
3 S 28
4 XL 5
5 XS 2
crh.d <- herbivores %>% # solution 2: count the unique species names in each size class
  group_by(size) %>%
  summarize(n_species = n_distinct(gen.spe))
crh.d
# A tibble: 5 × 2
size n_species
<chr> <int>
1 L 18
2 M 43
3 S 28
4 XL 5
5 XS 2
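Both solutions return the same numbers here, which suggests that each species occupies exactly one row of the trait table. The sketch below, on a hypothetical toy table with one repeated species, shows where the two approaches would diverge: `count()` tallies rows, while `n_distinct()` tallies unique values.

```r
library(dplyr)

# Hypothetical toy data: species "a" appears on two rows
toy <- tibble(
  size = c("S", "S", "S", "M"),
  gen.spe = c("a", "a", "b", "c")
)

# counts rows per size class
rows_per_size <- toy %>%
  count(size)

# counts unique species names per size class
species_per_size <- toy %>%
  group_by(size) %>%
  summarize(n_species = n_distinct(gen.spe))

rows_per_size    # S has 3 rows
species_per_size # S has 2 distinct species
```

For data with repeated records per species, such as survey data, `n_distinct()` is usually the one that answers "how many species".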
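# mean ratio of snout length to standard length, by genus
ratio <- herbivores %>%
  mutate(ratio.snout.sl = snoutlength / sl) %>% # computed row by row, so no grouping is needed at this step
  group_by(genus) %>%
  summarize(mean.ratio = mean(ratio.snout.sl)) %>%
  arrange(desc(mean.ratio)) # largest ratio first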
head(ratio)
# A tibble: 6 × 2
genus mean.ratio
<chr> <dbl>
1 Zebrasoma 0.00519
2 Paracanthurus 0.00509
3 Ctenochaetus 0.00492
4 Acanthurus 0.00320
5 Siganus 0.00239
6 Naso 0.00216
fish.abu <- read.csv(file = "data/fish_abundance.csv")
# how many surveys
surveys <- fish.abu %>%
  select(surveyid) %>%
  distinct() # distinct() keeps one row per unique survey ID; nrow() below counts them
nrow(surveys)
[1] 62
# replace Scaridae with Labridae
fish.abu.recode <- fish.abu %>%
  mutate(family = recode(family, "Scaridae" = "Labridae")) # use recode within mutate to change the family (old name = new name)
head(fish.abu.recode)
surveyid country site sitelat sitelong surveydate
1 4000720 Australia Watsons Bay north -14.66 145.45 6/27/10 14:00
2 4000720 Australia Watsons Bay north -14.66 145.45 6/27/10 14:00
3 4000720 Australia Watsons Bay north -14.66 145.45 6/27/10 14:00
4 4000720 Australia Watsons Bay north -14.66 145.45 6/27/10 14:00
5 4000720 Australia Watsons Bay north -14.66 145.45 6/27/10 14:00
6 4000720 Australia Watsons Bay north -14.66 145.45 6/27/10 14:00
depth family genus species block total
1 3 Acanthuridae Acanthurus grammoptilus 2 4
2 3 Acanthuridae Acanthurus nigricauda 2 3
3 3 Acanthuridae Acanthurus olivaceus 2 2
4 3 Acanthuridae Ctenochaetus binotatus 1 15
5 3 Acanthuridae Ctenochaetus binotatus 2 10
6 3 Acanthuridae Zebrasoma scopas 1 2
genspe
1 Acanthurus.grammoptilus
2 Acanthurus.nigricauda
3 Acanthurus.olivaceus
4 Ctenochaetus.binotatus
5 Ctenochaetus.binotatus
6 Zebrasoma.scopas
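The same replacement can also be written as a plain condition, which some readers find easier to follow than `recode()`. The sketch below uses a hypothetical three-row table and `if_else()`; it illustrates the idea and is not the course's own solution.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- tibble(family = c("Scaridae", "Labridae", "Acanthuridae"))

# where family is "Scaridae", write "Labridae"; otherwise keep the existing value
recoded <- toy %>%
  mutate(family = if_else(family == "Scaridae", "Labridae", family))

recoded$family
```

Because the comparison is against a single value, `==` is safe here.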
# keep only four families
fish.abu.filtered <- fish.abu.recode %>%
  filter(family %in% c("Acanthuridae", "Siganidae", "Labridae", "Kyphosidae")) # use %in% for several values; family == c(...) would recycle the vector and drop rows
head(fish.abu.filtered)
surveyid country site sitelat sitelong surveydate
1 4000720 Australia Watsons Bay north -14.66 145.45 6/27/10 14:00
2 4000720 Australia Watsons Bay north -14.66 145.45 6/27/10 14:00
3 4000720 Australia Watsons Bay north -14.66 145.45 6/27/10 14:00
4 4000720 Australia Watsons Bay north -14.66 145.45 6/27/10 14:00
5 4000720 Australia Watsons Bay north -14.66 145.45 6/27/10 14:00
6 4000720 Australia Watsons Bay north -14.66 145.45 6/27/10 14:00
depth family genus species block total
1 3 Acanthuridae Acanthurus grammoptilus 2 4
2 3 Acanthuridae Acanthurus nigricauda 2 3
3 3 Acanthuridae Acanthurus olivaceus 2 2
4 3 Acanthuridae Ctenochaetus binotatus 1 15
5 3 Acanthuridae Ctenochaetus binotatus 2 10
6 3 Acanthuridae Zebrasoma scopas 1 2
genspe
1 Acanthurus.grammoptilus
2 Acanthurus.nigricauda
3 Acanthurus.olivaceus
4 Ctenochaetus.binotatus
5 Ctenochaetus.binotatus
6 Zebrasoma.scopas
# count species in each genus
fish.species.counts <- fish.abu.filtered %>%
  group_by(genus) %>% # group by genus
  summarize(number.species = n_distinct(species)) # n_distinct() counts the unique species names in each group
head(fish.species.counts)
# A tibble: 6 × 2
genus number.species
<chr> <int>
1 Acanthurid 1
2 Acanthurus 9
3 Anampses 2
4 Bodianus 3
5 Calotomus 1
6 Cetoscarus 1
# mean abundances and number of species
fish.genus.abundance <- fish.abu.filtered %>%
  group_by(genus) %>% # group by genus, the key used later to join with the trait data
  summarize(mean.abu = mean(total), number.species = n_distinct(species)) # mean of total and number of unique species per genus
head(fish.genus.abundance)
# A tibble: 6 × 3
genus mean.abu number.species
<chr> <dbl> <int>
1 Acanthurid 1 1
2 Acanthurus 4.01 9
3 Anampses 3.33 2
4 Bodianus 1.32 3
5 Calotomus 1 1
6 Cetoscarus 1 1
# median eye diameter by schooling behavior
herbivore.eyes <- herbivores %>%
  group_by(schooling) %>% # group by schooling variable
  summarize(median.eye = median(eyediameter)) # one median per schooling category
herbivore.eyes
# A tibble: 5 × 2
schooling median.eye
<chr> <dbl>
1 LargeGroups 0.275
2 MediumGroups 0.230
3 Pairs 0.303
4 SmallGroups 0.261
5 Solitary 0.272
herbivore.pruned <- herbivores %>%
  select(-sl, -size, -schooling) # use select to remove columns
head(herbivore.pruned)
family genus species gen.spe
1 Acanthuridae Acanthurus achilles Acanthurus.achilles
2 Acanthuridae Acanthurus albipectoralis Acanthurus.albipectoralis
3 Acanthuridae Acanthurus auranticavus Acanthurus.auranticavus
4 Acanthuridae Acanthurus blochii Acanthurus.blochii
5 Acanthuridae Acanthurus dussumieri Acanthurus.dussumieri
6 Acanthuridae Acanthurus fowleri Acanthurus.fowleri
bodydepth snoutlength eyediameter
1 0.5543625 0.4877797 0.3507191
2 0.4405350 0.4402623 0.2560593
3 0.4726556 0.5386490 0.2451253
4 0.5586486 0.4782217 0.3196155
5 0.5457248 0.5661867 0.2807218
6 0.4669521 0.5950563 0.2217376
# mean body depth, snout length, and eye diameter
herbivore.means <- herbivore.pruned %>%
  gather(5:7, key = "metric", value = "value") %>% # gather the three morphometric columns (5 to 7) into long format
  group_by(genus, metric) %>% # one group per genus and trait
  summarize(mean.val = mean(value)) # get mean; the result stays grouped by genus
head(herbivore.means)
# A tibble: 6 × 3
# Groups: genus [2]
genus metric mean.val
<chr> <chr> <dbl>
1 Acanthurus bodydepth 0.508
2 Acanthurus eyediameter 0.298
3 Acanthurus snoutlength 0.498
4 Bolbometopon bodydepth 0.437
5 Bolbometopon eyediameter 0.233
6 Bolbometopon snoutlength 0.312
# turn traits back into separate columns
herbivore.spread <- herbivore.means %>%
  spread(key = "metric", value = "mean.val") # use spread function
herbivore.spread
# A tibble: 14 × 4
# Groups: genus [14]
genus bodydepth eyediameter snoutlength
<chr> <dbl> <dbl> <dbl>
1 Acanthurus 0.508 0.298 0.498
2 Bolbometopon 0.437 0.233 0.312
3 Calotomus 0.385 0.246 0.306
4 Cetoscarus 0.387 0.139 0.428
5 Chlorurus 0.400 0.187 0.366
6 Ctenochaetus 0.532 0.295 0.535
7 Hipposcarus 0.404 0.150 0.386
8 Kyphosus 0.479 0.181 0.164
9 Leptoscarus 0.334 0.209 0.297
10 Naso 0.386 0.274 0.547
11 Paracanthurus 0.490 0.295 0.533
12 Scarus 0.391 0.195 0.341
13 Siganus 0.443 0.329 0.352
14 Zebrasoma 0.575 0.305 0.657
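`gather()` and `spread()` still work, but tidyr now documents them as superseded by `pivot_longer()` and `pivot_wider()`. As a sketch of the same long-then-wide round trip with the newer functions, here is a hypothetical toy table that borrows the trait column names; the values are invented.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with the same trait columns as the pruned table
toy <- tibble(
  genus = c("A", "A", "B"),
  bodydepth = c(0.5, 0.7, 0.4),
  snoutlength = c(0.4, 0.6, 0.3),
  eyediameter = c(0.3, 0.1, 0.2)
)

# long format: one row per genus, trait, and observation
toy.means <- toy %>%
  pivot_longer(cols = c(bodydepth, snoutlength, eyediameter),
               names_to = "metric", values_to = "value") %>%
  group_by(genus, metric) %>%
  summarize(mean.val = mean(value), .groups = "drop") # drop grouping after summarizing

# wide format again: one column per trait
toy.wide <- toy.means %>%
  pivot_wider(names_from = metric, values_from = mean.val)

toy.wide
```

Naming the columns instead of using positions such as `5:7` keeps the code working if the column order changes.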
herbivores.joined <- herbivore.spread %>%
  left_join(fish.genus.abundance, by = "genus") %>% # left_join keeps every genus in the traits dataset; genera without abundance data get NA
  drop_na(mean.abu) # remove genera with no abundance data
herbivores.joined
# A tibble: 11 × 6
# Groups: genus [11]
genus bodydepth eyediameter snoutlength mean.abu number.species
<chr> <dbl> <dbl> <dbl> <dbl> <int>
1 Acanthur… 0.508 0.298 0.498 4.01 9
2 Calotomus 0.385 0.246 0.306 1 1
3 Cetoscar… 0.387 0.139 0.428 1 1
4 Chlorurus 0.400 0.187 0.366 5.39 2
5 Ctenocha… 0.532 0.295 0.535 5.58 3
6 Kyphosus 0.479 0.181 0.164 1.5 1
7 Naso 0.386 0.274 0.547 1.54 6
8 Paracant… 0.490 0.295 0.533 2 1
9 Scarus 0.391 0.195 0.341 3.42 15
10 Siganus 0.443 0.329 0.352 2.21 11
11 Zebrasoma 0.575 0.305 0.657 2.80 2
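Keeping only the genera that appear in both tables is also what `inner_join()` does. The sketch below compares the two routes on hypothetical toy tables (`traits` and `abundance` are invented for illustration). The equivalence assumes the abundance table itself holds no missing `mean.abu` values.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy tables: genus "B" has no abundance record, genus "D" has no traits
traits <- tibble(genus = c("A", "B", "C"), bodydepth = c(0.5, 0.4, 0.6))
abundance <- tibble(genus = c("A", "C", "D"), mean.abu = c(4, 1.5, 2))

# route 1: keep all trait rows, then drop the ones without a match
via_left <- traits %>%
  left_join(abundance, by = "genus") %>%
  drop_na(mean.abu)

# route 2: keep only genera present in both tables
via_inner <- traits %>%
  inner_join(abundance, by = "genus")

via_left$genus  # "A" "C"
via_inner$genus # "A" "C"
```

The `left_join()` route is useful when you first want to see which genera lack a match before removing them.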
herbivores.joined.2 <- herbivores.joined %>%
  mutate(highvslow = case_when(mean.abu >= 2 ~ "high", # mean abundance of 2 or more
                               TRUE ~ "low")) # everything else
herbivores.joined.2
# A tibble: 11 × 7
# Groups: genus [11]
genus bodydepth eyediameter snoutlength mean.abu number.species
<chr> <dbl> <dbl> <dbl> <dbl> <int>
1 Acanthur… 0.508 0.298 0.498 4.01 9
2 Calotomus 0.385 0.246 0.306 1 1
3 Cetoscar… 0.387 0.139 0.428 1 1
4 Chlorurus 0.400 0.187 0.366 5.39 2
5 Ctenocha… 0.532 0.295 0.535 5.58 3
6 Kyphosus 0.479 0.181 0.164 1.5 1
7 Naso 0.386 0.274 0.547 1.54 6
8 Paracant… 0.490 0.295 0.533 2 1
9 Scarus 0.391 0.195 0.341 3.42 15
10 Siganus 0.443 0.329 0.352 2.21 11
11 Zebrasoma 0.575 0.305 0.657 2.80 2
# ℹ 1 more variable: highvslow <chr>
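With only two outcomes, `case_when()` behaves like `if_else()`. It becomes more useful with three or more categories, because conditions are checked in order and the first match wins. The sketch below uses a hypothetical toy table, and the cut-off of 4 for a third level is an invented value.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- tibble(genus = c("A", "B", "C"), mean.abu = c(4.0, 2.0, 1.5))

toy.classified <- toy %>%
  mutate(
    # two outcomes: if_else() is enough
    highvslow = if_else(mean.abu >= 2, "high", "low"),
    # three outcomes: conditions are evaluated top to bottom
    three.levels = case_when(
      mean.abu >= 4 ~ "high",
      mean.abu >= 2 ~ "medium",
      TRUE ~ "low" # fallback for everything not matched above
    )
  )

toy.classified
```

Because the first matching condition wins, the stricter threshold has to come before the looser one.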
herbivores.highlow <- herbivores.joined.2 %>%
  gather(2:5, key = "metric", value = "value") %>% # gather columns 2 to 5 (three traits and mean.abu) so one summarize() covers all of them
  group_by(metric, highvslow) %>% # one group per metric and abundance class
  summarize(mean.val = mean(value), # mean and sd for each group
            sd.val = sd(value))
herbivores.highlow
# A tibble: 8 × 4
# Groups: metric [4]
metric highvslow mean.val sd.val
<chr> <chr> <dbl> <dbl>
1 bodydepth high 0.477 0.0688
2 bodydepth low 0.409 0.0466
3 eyediameter high 0.272 0.0567
4 eyediameter low 0.210 0.0613
5 mean.abu high 3.63 1.44
6 mean.abu low 1.26 0.299
7 snoutlength high 0.469 0.119
8 snoutlength low 0.361 0.164
Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. Source code is available at https://github.com/simonjbrandl/marinecommunityecology, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".