Data Analytics (ECMP 5005B)
Esam Mahdi
School of Mathematics and Statistics
Master of Engineering - Engineering Practice
Carleton University
Wednesday, September 6, 2023



By the end of this chapter, you should be able to do the following:
 
 
Source: Robert I. Kabacoff. R in Action: Data analysis and graphics with R and Tidyverse. 2nd ed., Manning, 2022.
Do not trust all of these packages!
A common problem when you load and attach several packages in R is that different packages may define functions with the same name, so the most recently attached package masks the others. For example, the function lag() is defined in both the stats and dplyr packages, and it performs a different task in each. Thus, you need to be careful when you call lag() while dplyr is attached; write stats::lag() or dplyr::lag() to state which one you mean.
set.seed(1)    #set reproducible results 
x <- rnorm(5)  #generate 5 observations from the standard normal distribution N(0,1)
stats::lag(x, 2) #shift the time base back by 2 (all observations are kept)
[1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078
attr(,"tsp")
[1] -1  3  1
dplyr::lag(x, 2) #shift the values forward by 2 positions (the first two become NA)
[1]         NA         NA -0.6264538  0.1836433 -0.8356286
Let’s start with the following R code:
# create two numeric vectors, each with 8 observations
wt <- c(60,70,63,55,48,49,58,58)
age = c(20,17,23,24,19,19,16,26) #note "<-" symbol can be replaced by "="
# get a random sample (without replacement) of 8 observations
set.seed(5) #set seed to reproduce the same random sample
z=sample(150:190, size = 8)
# replicate the string "Male" 3 times to get a character vector
Male=rep("Male", times = 3)
# replicate the string "Female" 5 times to get a character vector
Female=rep("Female", times = 5)
# combine the two categorical variables into one nominal variable
s = c(Male, Female)
# create an ordinal categorical variable 
income=c("Low","High","Low","Low","Middle","Middle","Middle","High")
# stores categorical values as vector of integers (factors)
sex=factor(s)
income=factor(income,order=TRUE,levels=c("Low","Middle","High"))
# create a data frame and name the variables
mydata=data.frame(id=1:8,weight=wt,age=age,z=z,sex=sex,Sex=s,income)
str(mydata) #display the structure of the data frame
'data.frame':   8 obs. of  7 variables:
 $ id    : int  1 2 3 4 5 6 7 8
 $ weight: num  60 70 63 55 48 49 58 58
 $ age   : num  20 17 23 24 19 19 16 26
 $ z     : int  151 164 160 170 179 156 168 152
 $ sex   : Factor w/ 2 levels "Female","Male": 2 2 2 1 1 1 1 1
 $ Sex   : chr  "Male" "Male" "Male" "Female" ...
 $ income: Ord.factor w/ 3 levels "Low"<"Middle"<..: 1 3 1 1 2 2 2 3
mat1 <- matrix(wt, nrow = 4, ncol = 2) #create a matrix of dimension 4x2 (wt has 8 values)
mat2 <- matrix(age, nrow = 2, ncol = 4) #create a matrix of dimension 2x4 (age has 8 values)
mat3 = cbind(wt,age,sex,income) #combine vectors by columns. Exercise: Type the code: rbind(wt,age,sex,income) and explain the outcome!
mylist <- list(wt,age,sex,income)   #create a list of 4 vectors
summary(mydata) #summary statistics for each variable
       id           weight           age              z             sex   
 Min.   :1.00   Min.   :48.00   Min.   :16.00   Min.   :151.0   Female:5  
 1st Qu.:2.75   1st Qu.:53.50   1st Qu.:18.50   1st Qu.:155.0   Male  :3  
 Median :4.50   Median :58.00   Median :19.50   Median :162.0             
 Mean   :4.50   Mean   :57.62   Mean   :20.50   Mean   :162.5             
 3rd Qu.:6.25   3rd Qu.:60.75   3rd Qu.:23.25   3rd Qu.:168.5             
 Max.   :8.00   Max.   :70.00   Max.   :26.00   Max.   :179.0             
     Sex               income 
 Length:8           Low   :3  
 Class :character   Middle:3  
 Mode  :character   High  :2  
                              
                              
  id weight age   z    sex    Sex income status0
1  1     60  20 151   Male   Male    Low  grade1
2  2     70  17 164   Male   Male   High  grade2
3  3     63  23 160   Male   Male    Low  grade3
4  4     55  24 170 Female Female    Low  grade4
5  5     48  19 179 Female Female Middle  grade5
6  6     49  19 156 Female Female Middle  grade6
7  7     58  16 168 Female Female Middle  grade7
8  8     58  26 152 Female Female   High  grade8
  id weight age   z    sex    Sex income status0 status1
1  1     60  20 151   Male   Male    Low  grade1 grade 1
2  2     70  17 164   Male   Male   High  grade2 grade 2
3  3     63  23 160   Male   Male    Low  grade3 grade 3
4  4     55  24 170 Female Female    Low  grade4 grade 4
5  5     48  19 179 Female Female Middle  grade5 grade 5
6  6     49  19 156 Female Female Middle  grade6 grade 6
7  7     58  16 168 Female Female Middle  grade7 grade 7
8  8     58  26 152 Female Female   High  grade8 grade 8
# Recoding variables: recode age 20 as a missing value
mydata$age[mydata$age == 20] <- NA 
mydata[1:4,] #display the first 4 rows
  id weight age   z    sex    Sex income status0 status1
1  1     60  NA 151   Male   Male    Low  grade1 grade 1
2  2     70  17 164   Male   Male   High  grade2 grade 2
3  3     63  23 160   Male   Male    Low  grade3 grade 3
4  4     55  24 170 Female Female    Low  grade4 grade 4
  id weight age    sex    Sex income status0
1  1     60  NA   Male   Male    Low  grade1
2  2     70  17   Male   Male   High  grade2
3  3     63  23   Male   Male    Low  grade3
4  4     55  24 Female Female    Low  grade4
5  5     48  19 Female Female Middle  grade5
6  6     49  19 Female Female Middle  grade6
7  7     58  16 Female Female Middle  grade7
8  8     58  26 Female Female   High  grade8
par(mfrow = c(2, 2)) #create a 2 x 2 plotting matrix
plot(wt,age); plot(mydata$weight, mydata$age) #type ?plot to get help about the function plot()
plot(wt,age, xlab = "Weight", ylab = "Age", col = "red")
plot(density(rnorm(500)),col="blue") #plot a density distribution of 500 random data from Gaussian
After setting up the R environment with RStudio, you can import data stored in different file formats.
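As a minimal sketch of the base R readers, the example below parses a small data set supplied inline through the `text` argument rather than from a file. The column names and values are made up for illustration.

```r
# Hypothetical comma-separated data supplied inline through `text`
csv_text <- "Date,Price\n1/22/2020,8664\n1/23/2020,8404"
dat_csv <- read.csv(text = csv_text, header = TRUE)

# The same data separated by white space, read with read.table()
txt_text <- "Date Price\n1/22/2020 8664\n1/23/2020 8404"
dat_txt <- read.table(text = txt_text, header = TRUE)

# Inspect the column types that R inferred
str(dat_csv)
```

With a real file, the first argument would be the file path instead of `text`.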
> read.table
  function (file, header = FALSE, sep = "", quote = "\"'", dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"), row.names, col.names, as.is = !stringsAsFactors, tryLogical = TRUE, na.strings = "NA", colClasses = NA, nrows = -1, skip = 0, check.names = TRUE, fill = !blank.lines.skip, strip.white = FALSE, blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, flush = FALSE, stringsAsFactors = FALSE, fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)
> read.csv
  function (file, header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)
> read.csv2
  function (file, header = TRUE, sep = ";", quote = "\"", dec = ",", fill = TRUE, comment.char = "", ...)
Bitcoin daily price (in US dollars) from January 22, 2020 to September 1, 2021 (during COVID-19).
# change the working directory to a different location on your computer
dat1 <- read.table("data/BTC.txt", header = TRUE, fill = TRUE) 
dat2 <- read.csv("data/BTC.csv", header = TRUE)
str(dat1)
'data.frame':   589 obs. of  5 variables:
 $ Date : chr  "1/22/2020" "1/23/2020" "1/24/2020" "1/25/2020" ...
 $ Price: num  8664 8404 8447 8354 8622 ...
 $ Open : num  8734 8669 8404 8447 8351 ...
 $ High : num  8800 8669 8522 8447 8622 ...
 $ Low  : num  8581 8297 8248 8280 8304 ...
Use the skim() function from the skimr package to get useful summary statistics.

Exercise:
The readr package provides functions to read rectangular data with extension .csv, .txt or .tsv.
library(readr) # load the package
# Read from a path that specifies the location of the data on your computer
name_data <- read_csv("file_data - Sheet1.csv") # import data from a comma-delimited file
# Read from a remote path (e.g., the mtcars data set from the readr GitHub repository)
name_data <- read_csv("https://github.com/tidyverse/readr/raw/main/inst/extdata/mtcars.csv") 
name_data <- read_tsv("file_data.txt") # import data from a tab-delimited file
name_data <- read_tsv("file_data.tsv") # read_tsv() has no sheet argument; sheets apply only to Excel workbooks
The readxl package can import tabular data from Excel workbooks. Both xls and xlsx formats are supported.
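A minimal sketch of reading one sheet and one cell range with readxl follows. It uses the example workbook that ships with readxl, so the sheet name and range are those of that bundled file, not of any data set from this course.

```r
library(readxl)

# readxl ships example workbooks; readxl_example() returns the path to one
path <- readxl_example("datasets.xlsx")
excel_sheets(path) # list the sheet names in the workbook

# Read one sheet by name, restricted to a rectangular cell range
cars <- read_excel(path, sheet = "mtcars", range = "A1:C6")
cars
```

The `sheet` argument also accepts a position (for example `sheet = 2`), and `range` uses ordinary Excel notation.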
The haven package can import data with .sav, .dta, and .sas7bdat extensions.
# First, make sure that you have the package "haven" installed on your computer;
# if it is not installed, you need to install it
install.packages("haven") 
library(haven) # Load the package
name_data <- read_sav("file_data.sav")      # import data from SPSS
name_data <- read_dta("file_data.dta")      # import data from Stata
name_data <- read_sas("file_data.sas7bdat") # import data from SAS
Import the BTC data set using the functions read_tsv() and read_csv() from the package readr.
library(readr)
dat3 <- read_tsv("data/BTC.txt") 
dat4 <- read_csv("data/BTC.csv")
head(dat3) #same results using head(dat4)
# A tibble: 6 × 5
  Date      Price  Open  High   Low
  <chr>     <dbl> <dbl> <dbl> <dbl>
1 1/22/2020 8664. 8734. 8800. 8581.
2 1/23/2020 8404. 8669. 8669. 8297.
3 1/24/2020 8447. 8404. 8522. 8248.
4 1/25/2020 8354. 8447. 8447. 8280 
5 1/26/2020 8622. 8351. 8622  8304.
6 1/27/2020 8912  8622  9002. 8585.
Note that head() prints differently from before because the object is a tibble. Tibbles are rectangular data frames, slightly tweaked to work better with the tidyverse collection of packages that we will discuss later!
The Excel workbook BTC2 has two sheets named BTC and BTC2. The BTC2 data are stored in the cell range G7:K38. The first cell, A1, provides a quick description of these data. The data have some missing values.
# A tibble: 6 × 5
  Date                Price  Open  High    Low
  <dttm>              <dbl> <dbl> <dbl>  <dbl>
1 2021-01-01 00:00:00 29346 28933 29498 28932 
2 2021-01-02 00:00:00 32185 29346 33168 29192 
3 2021-01-03 00:00:00 32971 32183 34253 32110 
4 2021-01-04 00:00:00    NA    NA    NA    NA 
5 2021-01-05 00:00:00 33996 32020 33996 30979.
6 2021-01-06 00:00:00 36755 33986 36755 33901 
Many popular packages, such as readr, tidyr, dplyr, and purrr, save data frames as tibbles. When you work with data imported as a tibble, be aware of the following properties:
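As an illustration only, and not a reproduction of the slide's own list, the sketch below shows two well-known differences between a tibble and a base data frame. The small data set is hypothetical.

```r
library(tibble)

# Hypothetical two-row data set
df <- data.frame(price = c(8664, 8404), open = c(8734, 8669))
tb <- as_tibble(df)

# Selecting one column with [ ] keeps a tibble but drops a data frame to a vector
class(df[, "price"]) # "numeric"
class(tb[, "price"]) # "tbl_df" "tbl" "data.frame"

# Data frames partially match column names with $, tibbles do not
df$pri                            # returns the price column
is.null(suppressWarnings(tb$pri)) # TRUE: no column is named exactly "pri"
```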
| Function | Use | Syntax | 
|---|---|---|
| mutate() | Transform or recode variables | dataframe <- mutate(dataframe, new_variable = expression) | 
| select() | Select variables/columns | dataframe <- select(dataframe, selected_variables) | 
| filter() | Select observations/rows | dataframe <- filter(dataframe, expression) | 
| rename() | Rename variables/columns | dataframe <- rename(dataframe, new_variable_name = old_variable_name) | 
| recode() | Recode variable values | variable <- recode(variable, old_value = new_value) | 
| arrange() | Order rows by variable values | dataframe <- arrange(dataframe, sort_variables) | 
| group_by() | Group by one or more variables | dataframe <- group_by(dataframe, variables_to_group_by) | 
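The verbs in the table can be chained with the pipe. The sketch below applies each of them to a small hypothetical data frame; the data frame `people`, its columns, and the derived `bmi` variable are made up for illustration.

```r
library(dplyr)

# Hypothetical data, made up for illustration
people <- data.frame(
  id     = 1:6,
  weight = c(60, 70, 63, 55, 48, 49),
  height = c(151, 164, 160, 170, 179, 156),
  sex    = c("M", "M", "M", "F", "F", "F")
)

result <- people %>%
  mutate(bmi = weight / (height / 100)^2) %>%             # add a new variable
  rename(weight_kg = weight) %>%                          # new name = old name
  mutate(sex = recode(sex, M = "Male", F = "Female")) %>% # old value = new value
  filter(weight_kg > 48) %>%                              # keep rows meeting a condition
  select(id, sex, weight_kg, bmi) %>%                     # keep the chosen columns
  arrange(desc(weight_kg))                                # sort rows, heaviest first

# group_by() is usually followed by summarise()
by_sex <- result %>%
  group_by(sex) %>%
  summarise(n = n(), mean_weight = mean(weight_kg))
by_sex
```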
library(dplyr)
dat5 <- mutate(dat5,BeforeClose=dplyr::lag(Price), returns=log(Price)-log(BeforeClose)) #use the BTC2 dataset
dat5[1:2,]
# A tibble: 2 × 7
  Date                Price  Open  High   Low BeforeClose returns
  <dttm>              <dbl> <dbl> <dbl> <dbl>       <dbl>   <dbl>
1 2021-01-01 00:00:00 29346 28933 29498 28932          NA NA     
2 2021-01-02 00:00:00 32185 29346 33168 29192       29346  0.0923
dat5 <- dat5 %>% 
  mutate(Date = lubridate::as_date(Date), #convert the date-time column to a date using as_date() from the "lubridate" package; mdy() would return NA here because Date is already parsed
         BeforeClose=dplyr::lag(Price),
         returns=log(Price)-log(BeforeClose))
dat5[1:2,] #note that the first return is missing (NA). To remove rows with NA, append: %>% tidyr::drop_na()
# A tibble: 2 × 7
  Date       Price  Open  High   Low BeforeClose returns
  <date>     <dbl> <dbl> <dbl> <dbl>       <dbl>   <dbl>
1 2021-01-01 29346 28933 29498 28932          NA NA     
2 2021-01-02 32185 29346 33168 29192       29346  0.0923
  id weight age    sex income status0
1  1     60  NA   Male    Low  grade1
2  2     70  17   Male   High  grade2
3  3     63  23   Male    Low  grade3
4  4     55  24 Female    Low  grade4
5  5     48  19 Female Middle  grade5
6  6     49  19 Female Middle  grade6
7  7     58  16 Female Middle  grade7
8  8     58  26 Female   High  grade8
# Select females aged 19, or anyone weighing more than 59 (& is evaluated before |)
mydata %>% filter(sex == "Female" &
                    age == 19 | weight > 59) 
  id weight age height    sex    Sex income status0 status1
1  1     60  NA    151   Male   Male    Low  grade1 grade 1
2  2     70  17    164   Male   Male   High  grade2 grade 2
3  3     63  23    160   Male   Male    Low  grade3 grade 3
4  5     48  19    179 Female Female Middle  grade5 grade 5
5  6     49  19    156 Female Female Middle  grade6 grade 6
Note that the first age is missing (“NA”). This value is associated with low income. Thus, the average age for those who have low income is missing (“NA”).
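The situation described in the note can be reproduced with a small sketch. The data frame `toy` is hypothetical; it only mimics the pattern of one missing age in the low-income group.

```r
library(dplyr)

# Hypothetical ages and income groups; the first age is missing
toy <- data.frame(
  age    = c(NA, 17, 23, 24, 19, 19),
  income = c("Low", "High", "Low", "Low", "Middle", "Middle")
)

# Average age within each income group
avg_age <- toy %>%
  group_by(income) %>%
  summarise(mean_age = mean(age))
avg_age # the "Low" group has mean_age NA because mean() propagates NA
```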
Exercise: How do you solve this issue?
mutate_data <- mydata %>% 
  select(id, age, height, weight,gender,income,status0) %>%
  mutate(height_foot = 0.033 * height) %>% 
  rename(status = status0) %>%
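  filter(income == c("Low","Middle")) %>% # caution: == recycles the vector element by element; income %in% c("Low","Middle") would keep every "Low" or "Middle" row
  arrange(age, income) # "income" is an ordinal variable
mutate_data
  id age height weight gender income status height_foot
1  6  19    156     49 Female Middle grade6       5.148
2  3  23    160     63   Male    Low grade3       5.280
Exercise: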
Each distribution function is named [dpqr]abbreviation(), that is, one of the letters d, p, q, or r followed by the abbreviated name of the distribution. The letter refers to the aspect of the distribution returned:
d = Density
p = Distribution function
q = Quantile function
r = Random generation
| Distribution | Abbreviation | Distribution | Abbreviation | Distribution | Abbreviation |
|---|---|---|---|---|---|
| Beta | beta | Binomial | binom | Cauchy | cauchy |
| Chi-squared | chisq | Exponential | exp | F | f |
| Gamma | gamma | Geometric | geom | Hypergeometric | hyper |
| Lognormal | lnorm | Logistic | logis | Multinomial | multinom |
| Negative binomial | nbinom | Normal | norm | Poisson | pois |
| Wilcoxon signed rank | signrank | T | t | Uniform | unif |
| Weibull | weibull | Wilcoxon rank sum | wilcox | | |
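The naming scheme in the table can be sketched with the normal distribution, whose abbreviation is norm. The particular arguments below are chosen only for illustration.

```r
# Normal distribution: the four functions are dnorm(), pnorm(), qnorm(), rnorm()
d <- dnorm(0)       # density of N(0,1) at 0
p <- pnorm(1.96)    # P(Z <= 1.96)
q <- qnorm(0.975)   # quantile with 97.5% of the area to its left
set.seed(1)         # make the random draws reproducible
r <- rnorm(5, mean = 50, sd = 10)

# The p- and q- functions are inverses of each other
all.equal(qnorm(pnorm(1.96)), 1.96)

# The same prefixes work for other rows of the table, e.g. the binomial
b <- dbinom(3, size = 10, prob = 0.5) # P(X = 3) for X ~ Binomial(10, 0.5)
```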

z = 2.1?
Use pnorm(2.1) to get 0.9821356.
Use qnorm(0.95, mean = 100, sd = 20) to get 132.8971.
Use rnorm(300, mean = 80, sd = 10) to get the simulated series.
The Posit Cheatsheets website suggests some favorite data science packages to use!

> install.packages("tidyverse")
> library(tidyverse)
── Attaching core tidyverse packages ─────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ───────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors
Warning messages:
1: package ‘tidyverse’ was built under R version 4.3.1 
2: package ‘readr’ was built under R version 4.3.1 
| Function | Layers | Options | 
|---|---|---|
| geom_point() | Scatterplot | color, alpha, shape, size | 
| geom_line() | Line graph | color, alpha, linetype, size | 
| geom_jitter() | Jittered points | color, size, alpha, shape | 
| geom_bar() | Bar chart | color, fill, alpha | 
| geom_boxplot() | Box plot | color, fill, alpha, notch, width | 
| geom_histogram() | Histogram | color, fill, alpha, linetype, binwidth | 
| geom_smooth() | Fitted line | method, formula, color, fill, linetype, size | 
| geom_density() | Density plot | color, fill, alpha, linetype | 
| geom_hline() | Horizontal lines | color, alpha, linetype, size | 
| geom_vline() | Vertical lines | color, alpha, linetype, size | 
| geom_rug() | Rug plot | color, sides | 
| geom_violin() | Violin plot | color, fill, alpha, linetype | 
| geom_text() | Text annotations | see the help for this function | 
 Source: https://nbisweden.github.io/RaukR-2019/ggplot/presentation/ggplot_presentation.html#1.
 See also https://clauswilke.com/dataviz/directory-of-visualizations.html
Let’s use our first graph to answer the following questions about the mpg data frame available from the package ggplot2:
# A tibble: 6 × 11
  manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
  <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
The first argument of ggplot() is the data set that you need to use in the plot. Calling ggplot() with only the data produces an empty graph (the default theme used by ggplot2 is theme_gray()).

Now you can add one or more layers to ggplot(). The function geom_point() adds a layer of points (scatterplot) to your plot.
The plot shows a negative relationship between the car engine size (in liters) and the car’s fuel efficiency on the highway (in miles per gallon). The bigger the size of the engine, the less efficient it is in consuming fuel.
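A minimal sketch of the scatterplot described above, using the mpg data set that ships with ggplot2:

```r
library(ggplot2)

# Start from an empty plot, then add a layer of points:
# engine displacement on the x-axis, highway mileage on the y-axis
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
p
```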

The aes() (stands for aesthetics) function is used to map variables to the visual characteristics of a plot.


Mapping class to the size aesthetic displays clearer information about the outliers in these data.
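As a sketch of aesthetic mappings, the code below maps the categorical variable class first to color and then to size. The object names `p_color` and `p_size` are chosen here for illustration.

```r
library(ggplot2)

# Map the categorical variable class to the color of each point
p_color <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

# Map class to point size instead; ggplot2 warns that size is not
# advised for a discrete variable, but it still draws the plot
p_size <- ggplot(mpg, aes(x = displ, y = hwy, size = class)) +
  geom_point()

p_color
```

Because the mapping sits inside aes(), ggplot2 chooses the colors or sizes and builds the legend automatically.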

The plot is not easy to read. We can fine-tune the appearance of the graph using themes.

Here, I will use the theme theme_bw() (for black and white).
Exercise: Why are the points not blue?

Note: Instead of using the character name of the color “blue”, you can use the “#0000FF” hex code of this color.
Exercise: The points are blue now! Why?

Scale functions (which start with scale_) allow you to modify the default scaling provided by ggplot2.
| Function | Description | 
|---|---|
| scale_x_continuous() | Scales the x-axis for quantitative variables. Options include breaks for specifying tick marks, labels for specifying tick mark labels, and limits to control the range of the values displayed | 
| scale_y_continuous() | Same as above for y-axis | 
| scale_x_discrete() | Same as above for x-axis representing categorical variable | 
| scale_y_discrete() | Same as above for y-axis representing categorical variable | 
| scale_color_manual() | Specifies the colors (with option values) used to represent the levels of a categorical variable | 
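The scale functions in the table can be combined in one plot. The sketch below is one possible choice of breaks, limits, and colors for the mpg data; none of these values come from the slides.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # tick marks every litre and a fixed x range
  scale_x_continuous(breaks = 2:7, limits = c(1, 7.5)) +
  # explicit y-axis tick marks
  scale_y_continuous(breaks = seq(10, 45, by = 5)) +
  # one color per level of drv ("4", "f", "r")
  scale_color_manual(values = c("4" = "blue", "f" = "darkgreen", "r" = "red"))
p
```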
facet_wrap() and facet_grid() are used to partition a plot into a matrix of panels (side-by-side graphs), particularly useful for categorical variables.
| Function | Description | 
|---|---|
| facet_wrap(~var, nrow = r) | Partition plots for each level of variable (var) arranged into r rows | 
| facet_wrap(~var, ncol = c) | Partition plots for each level of variable (var) arranged into c columns | 
| facet_grid(row_var~col_var) | Partition plots for combination of rows variable (row_var) and columns variable (col_var) | 
| facet_grid(rows = row_var) | Partition plots for each level of rows variable (row_var), arranged as a single column | 
| facet_grid(cols = col_var) | Partition plots for each level of columns variable (col_var), arranged as a single row | 

Note: The default argument scales = "fixed" is used if x and y scales are fixed across all panels; scales = "free_x" if x scale is free and y scale is fixed; scales = "free_y" if y scale is free and x scale is fixed; and scales = "free" if x and y scales vary across panels.
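A short sketch of both faceting functions on the mpg data follows. The choice of faceting variables is an assumption made for illustration.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# One panel per level of class, arranged in two rows
p_wrap <- base + facet_wrap(~class, nrow = 2)

# Rows by drv, columns by cyl; each row of panels gets its own y scale
p_grid <- base + facet_grid(drv ~ cyl, scales = "free_y")

p_wrap
```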

+ theme(legend.position = "right") # the default
+ theme(legend.position = "left")
+ theme(legend.position = "top")
+ theme(legend.position = "bottom")
suv <- mpg %>% filter(class == "suv")
p <- ggplot(suv, aes(displ, hwy, color = drv)) +
  geom_point(size = 4) + theme_bw()
p + labs(title = "Fuel economy data",
       subtitle = "Suv cars",
       x = "Engine displacement, in litres",
       y = "Highway miles per gallon",
       color = "Type of drive train") +
  scale_color_manual(labels = c("4wd", "Rear wheel drive"), 
                     values = c("blue", "red")) +
  theme(legend.position="bottom", 
        legend.key.size = unit(1.4, "cm"),
        legend.key.height=unit(0.5, "cm"),
        legend.key = element_rect(fill = "gray90", color = "red"),
        text=element_text(family="serif")) 

You can even create your own theme. The source code of the theme theme_bluewhite() is available at https://www.datanovia.com/en/blog/ggplot-themes-gallery/
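As a sketch of how a custom theme can be written, the function below starts from theme_bw() and overrides a few elements. The name `theme_sketch()` and its settings are hypothetical; this is not the source of theme_bluewhite().

```r
library(ggplot2)

# Hypothetical theme: start from theme_bw() and override a few elements
theme_sketch <- function(base_size = 12) {
  theme_bw(base_size = base_size) +
    theme(
      panel.grid.minor = element_blank(),                 # drop minor grid lines
      panel.background = element_rect(fill = "aliceblue"), # light blue panel
      plot.title       = element_text(face = "bold"),
      legend.position  = "bottom"
    )
}

p <- ggplot(mpg, aes(displ, hwy, color = drv)) +
  geom_point() +
  labs(title = "Fuel economy data") +
  theme_sketch()
p
```

Wrapping the settings in a function makes the same look reusable across plots.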

A box plot helps visualize whether the distribution of a data set is symmetric or skewed, for example due to unusual observations (outliers). The graph displays the five-number summary (minimum, first quartile, median, third quartile, and maximum).
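A minimal sketch using the mpg data from ggplot2: fivenum() returns Tukey's five-number summary, and geom_boxplot() draws one box per group. The grouping by class is an assumption made for illustration.

```r
library(ggplot2)

# Five-number summary of highway mileage:
# minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(mpg$hwy)
five

# One box per vehicle class; points beyond the whiskers are drawn individually
p <- ggplot(mpg, aes(x = class, y = hwy)) +
  geom_boxplot(fill = "gray80") +
  theme_bw()
p
```

Note that ggplot2 extends the whiskers to the most extreme observation within 1.5 times the interquartile range of the box, so the whisker ends coincide with the minimum and maximum only when there are no outliers.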
# Load "tidyquant" and "tidyverse" packages 
library(tidyquant)
library(tidyverse)
# Get daily stock prices of Apple from the web in a tibble format
Apple <- tq_get("AAPL",from="2010-01-04",
                to="2018-12-31",get="stock.prices")
# mutate returns series named as "ret"
Ap <- Apple %>% 
  mutate(Date = ymd(date), 
         Beforeclose = dplyr::lag(close),
         ret = log(close) - log(Beforeclose)) %>%
  drop_na(ret) #remove "NA"
# Plot log-returns series
P1 <- ggplot(Ap)+
  geom_line(aes(x=Date,y=ret),color="gray30")+
  labs(y="Log Returns", x="") +
  scale_x_date(date_labels="%Y %b", 
               date_breaks="12 months") +
  theme_bw()
# Plot histogram
P2 <- ggplot(Ap)+ 
  geom_histogram(aes(ret),binwidth=0.004,
                 col="gray30",fill="gray80")+
  annotate("text",x=c(-0.1,-0.1),y=c(70,60),
           label=c("Skewness:-0.1738",
                   "Ex.kurtosis:3.5783"),
           color=c("gray30","gray30"))+
           labs(y="", x="Log Returns") +
  theme_bw()
# Load "gridExtra" package
library(gridExtra)
# Place the two plots on one page
grid.arrange(P2, P1, nrow=1, 
   top="Apple, Inc. stock price from 
    January 04, 2010 to December 31, 2018")
?mtcars
?volcano
To plot the surface, use the following command:
library(plotly) #load the package that provides plot_ly() and add_surface()
plot_ly(z=~volcano) %>% add_surface()
library(ggiraph)
p <- ggplot(iris,aes(x=Sepal.Length,
                     y=Petal.Length, colour=Species))+
  geom_point_interactive(aes(tooltip=
                             paste0("<b>Petal Length:</b>",
                                Petal.Length,"\n<b>Sepal Length:</b>",
                                Sepal.Length,"\n<b>Species:</b>",
                                Species)),size=1)+ 
  theme_bw()
tooltip_css <- "background-color:#f8f9f9;
                padding:10px;
                border-style:solid;
                border-width:2px;
                border-color:#125687;
                border-radius:5px;"
ggiraph(code=print(p),
        hover_css="cursor:pointer;
                   stroke:black;
                   fill-opacity:0.3",
        zoom_max=5,
        tooltip_extra_css=tooltip_css,
        tooltip_opacity=0.9,
        height_svg=4,width_svg=4,
        width=1)
The R package highcharter is a wrapper around the JavaScript library Highcharts.
library(highcharter)
p <- iris %>%
  hchart("scatter",
         hcaes(x="Sepal.Length",
               y="Sepal.Width",group="Species")) %>%
  hc_xAxis(title=list(text="Sepal Length"),
           crosshair=TRUE) %>%
  hc_yAxis(title=list(text="Sepal Width"),
           crosshair=TRUE) %>%
  hc_chart(zoomType="xy",inverted=FALSE) %>%
  hc_legend(verticalAlign="top",align="right") %>% 
  hc_size(height=500,width=500)
htmltools::tagList(list(p))
Consider the gapminder data set on life expectancy, GDP per capita, and population by country.
library(gganimate)
library(gapminder)
p <- ggplot(gapminder,
       aes(x=gdpPercap, 
           y=lifeExp, 
           size=pop,
           color=country)) + 
  geom_point(show.legend=F,
             alpha=0.7) + 
  scale_color_viridis_d() + 
  scale_size(range=c(2, 12)) + 
  scale_x_log10()+ 
  theme_bw() + 
  labs(x="GDP per capita",
      y="Life expectancy")
p + 
   transition_time(year) + 
   labs(title="Year: {frame_time}")
Consider the same gapminder data set as in the previous slide. Here, we add facet_wrap(~continent) to compare the continents.
p <- ggplot(gapminder,
            aes(x=gdpPercap, 
                y=lifeExp, 
                size=pop,
                color=country)) + 
  geom_point(show.legend=F,
             alpha=0.7) + 
  scale_color_viridis_d() + 
  scale_size(range=c(2, 12)) + 
  scale_x_log10()+ 
  theme_bw() + 
  labs(x="GDP per capita",
       y="Life expectancy") + 
  facet_wrap(~continent)
p + 
  transition_time(year) + 
  labs(title="Year: {frame_time}")
The package networkD3 allows the use of interactive network graphs from the D3.js JavaScript library.
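A minimal sketch with networkD3, assuming the package is installed. The edge list below is hypothetical.

```r
library(networkD3)

# Hypothetical edge list: each row is a link from src to target
edges <- data.frame(
  src    = c("A", "A", "B", "C"),
  target = c("B", "C", "D", "D")
)

# simpleNetwork() draws an interactive force-directed graph from an edge list
net <- simpleNetwork(edges, Source = "src", Target = "target", zoom = TRUE)
net
```

The result is an HTML widget, so it renders in the RStudio viewer or in an R Markdown HTML document.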
The package leaflet provides R bindings for the JavaScript mapping library Leaflet (leafletjs).
The R package crosstalk allows crosstalk-enabled plotting libraries to be linked. Through a shared key variable, data points can be manipulated simultaneously on two independent plots.
invisible(lapply(c("crosstalk","htmltools","leaflet","plotly"), library, character.only = TRUE)) #leaflet() and plot_ly() below need their packages attached
shared_quakes <- SharedData$new(quakes[sample(nrow(quakes), 100),])
lf <- leaflet(shared_quakes,height=300) %>%
        addTiles(urlTemplate='http://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png') %>% addMarkers()
py <- plot_ly(shared_quakes,x=~depth,y=~mag,size=~stations,height=300) %>% add_markers()
div(div(lf,style="float:left;width:45%"),div(py,style="float:right;width:45%"))
R Markdown is a powerful tool for writing good-looking reports by combining R code chunks, analysis, and reporting in the same document.
This document was prepared with R Markdown.
© Esam Mahdi (2023)