Chapter 20 - Vectors
20.3.5 Exercises
1. Describe the difference between is.finite(x) and !is.infinite(x).
is.finite(x) evaluates to TRUE only if the value is not NA, NaN, or +/-Inf. In contrast, !is.infinite(x) evaluates to TRUE for everything except +/-Inf, so it is TRUE for NA and NaN as well as for finite doubles and integers.
is.finite(0)
## [1] TRUE
is.finite(NA)
## [1] FALSE
!is.infinite(NA)
## [1] TRUE
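As a rough side-by-side check, the sketch below applies both tests to a single vector that holds each kind of special value. The names vals, finite_flags, and not_infinite_flags are illustrative only.

```r
# One ordinary number followed by every special value
vals <- c(0, NA, NaN, Inf, -Inf)

finite_flags <- is.finite(vals)            # TRUE only for ordinary numbers
not_infinite_flags <- !is.infinite(vals)   # FALSE only for Inf and -Inf

# Positions where the two tests disagree: the NA and the NaN
disagree <- which(finite_flags != not_infinite_flags)
disagree
```

The two tests disagree only on the missing values, which is the whole difference between them.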
2. Read the source code for dplyr::near() (Hint: to see the source code, drop the ()). How does it work?
The source code is below:
# function (x, y, tol = .Machine$double.eps^0.5)
# {
# abs(x - y) < tol
# }
# <bytecode: 0x10a60a198>
# <environment: namespace:dplyr>
Based on the code, the function subtracts y (the value you want to compare against) from x, takes the absolute value of the difference, and checks whether it is below a threshold (tol). If so, it returns TRUE. If not, it returns FALSE. The default threshold is the square root of the machine epsilon for doubles, and you can change it through the tol parameter.
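To see why a tolerance is useful, here is a minimal sketch that copies the function body printed above into a stand-in called my_near (a hypothetical name, used so the example runs without dplyr) and compares it with an exact equality test.

```r
# Hypothetical stand-in that mirrors the body of dplyr::near() shown above
my_near <- function(x, y, tol = .Machine$double.eps^0.5) {
  abs(x - y) < tol
}

# Floating point error makes the exact comparison fail
exact_match <- sqrt(2)^2 == 2
# The tolerance-based comparison treats the two values as equal
close_match <- my_near(sqrt(2)^2, 2)

c(exact = exact_match, near = close_match)
```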
3. A logical vector can take 3 possible values. How many possible values can an integer vector take? How many possible values can a double take? Use google to do some research.
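The number of possible values is tied to the bit representation of each type. R stores integers in 32 bits, which gives 2^32 bit patterns; one of them is reserved for NA, so an integer vector can hold 2^32 - 1 distinct non-missing values (from -(2^31 - 1) to 2^31 - 1) plus NA. R stores doubles in 64 bits, so 2^64 is an upper bound on the number of values; the count of distinct numbers is smaller, because a large block of bit patterns all encode NaN (NA is one of them).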
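These limits can be inspected from within R through the built-in .Machine list. The sketch below is only a quick check of the integer range and of the precision of doubles; the variable names are illustrative.

```r
# Largest representable integer: 2^31 - 1
max_int <- .Machine$integer.max

# Going one past the maximum overflows to NA (R also emits a warning)
overflowed <- suppressWarnings(max_int + 1L)

# Doubles carry 53 significant bits, so whole numbers above 2^53
# can no longer all be told apart
mantissa_bits <- .Machine$double.digits
collides <- (2^53 + 1) == 2^53

list(max_int = max_int, overflowed = overflowed,
     mantissa_bits = mantissa_bits, collides = collides)
```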
4. Brainstorm at least four functions that allow you to convert a double to an integer. How do they differ? Be precise.
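At least four functions do this, and they differ in how they treat the fractional part. floor() rounds down, toward negative infinity, and ceiling() rounds up, toward positive infinity. trunc() and as.integer() both drop the fractional part, which rounds toward zero. round() goes to the nearest whole number and, in the case of a tie value (doubles ending in .5), rounds toward the even digit. Of these, only as.integer() returns a vector of type integer; the others return whole-valued doubles.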
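A small comparison on tie values makes the differences concrete. This is a sketch using base R only; the test vector is chosen so that every element ends in .5 and is exactly representable as a double.

```r
x <- c(-1.5, -0.5, 0.5, 1.5, 2.5)

conversions <- list(
  floor      = floor(x),       # toward negative infinity
  ceiling    = ceiling(x),     # toward positive infinity
  trunc      = trunc(x),       # toward zero
  round      = round(x),       # nearest, ties go to the even digit
  as.integer = as.integer(x)   # toward zero, and changes the type
)

# Only as.integer() produces an integer vector
result_types <- vapply(conversions, typeof, character(1))
result_types
```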
5. What functions from the readr package allow you to turn a string into logical, integer, and double vector?
Respectively, the functions parse_logical(), parse_integer(), parse_double() will turn a string into a logical, integer, or double.
library(readr)
parse_logical(c("TRUE", "FALSE"))
## [1] TRUE FALSE
parse_integer(c("100", "200"))
## [1] 100 200
parse_double(c("100.3", "200"))
## [1] 100.3 200.0
20.4.6 Exercises
1. What does mean(is.na(x)) tell you about a vector x? What about sum(!is.finite(x))?
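mean(is.na(x)) tells you what proportion of the values in the vector x are missing (is.na() is TRUE for both NA and NaN). sum(!is.finite(x)) tells you how many values in the vector are not finite, which counts NA, NaN, Inf, and -Inf. The vector in the example below contains only NAs, so there the count equals the number of NAs.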
x <- c(NA, 1, 2, NA, 5:10, NA, NA, NA)
is.na(x)
## [1] TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [12] TRUE TRUE
mean(is.na(x))
## [1] 0.3846154
is.finite(x)
## [1] FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
## [12] FALSE FALSE
!is.finite(x)
## [1] TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [12] TRUE TRUE
sum(is.finite(x))
## [1] 8
sum(!is.finite(x))
## [1] 5
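The example above contains only NA values, so it cannot show how the two summaries treat NaN and infinities. A short additional sketch, with an illustrative vector y that mixes all four special values:

```r
y <- c(1, NA, NaN, Inf, -Inf, 5)

# is.na() is TRUE for NA and NaN, so 2 of the 6 values count as missing
prop_missing <- mean(is.na(y))

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so 4 values are non-finite
n_not_finite <- sum(!is.finite(y))

c(prop_missing = prop_missing, n_not_finite = n_not_finite)
```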
2. Carefully read the documentation of is.vector(). What does it actually test for? Why does is.atomic() not agree with the definition of atomic vectors above?
is.vector() tests whether an object is a vector of the specified mode (any mode by default, which includes lists) and has no attributes other than names. For example, the named vector ‘x’ below will return TRUE. See examples of use below. The definition above states that atomic vectors are homogeneous: each value of the vector is of the same type. is.atomic() only checks the underlying type, so it returns TRUE for any object built on an atomic type regardless of its attributes, for example a matrix or a factor, neither of which is.vector() accepts. In versions of R before 4.4.0, is.atomic(NULL) also returned TRUE, even though NULL is not a vector.
x <- c(a = 1, b = 2)
is.vector(x)
## [1] TRUE
is.atomic(x)
## [1] TRUE
x <- c(a = 1, b = "hello")
is.vector(x, mode = "integer")
## [1] FALSE
is.vector(x, mode = "character")
## [1] TRUE
is.atomic(x)
## [1] TRUE
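The role of attributes is easier to see on objects that carry more than names. The following sketch (base R only, object names illustrative) runs both tests on a matrix, a factor, and a list.

```r
m <- matrix(1:4, nrow = 2)   # integer vector with a dim attribute
f <- factor(c("a", "b"))     # integer vector with levels and class attributes
l <- list(1, "a")            # a list is a vector, but not an atomic one

checks <- data.frame(
  object    = c("matrix", "factor", "list"),
  is_vector = c(is.vector(m), is.vector(f), is.vector(l)),
  is_atomic = c(is.atomic(m), is.atomic(f), is.atomic(l))
)
checks
```

The matrix and the factor fail is.vector() because of their extra attributes but pass is.atomic(); the list does the opposite.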
3. Compare and contrast setNames() with purrr::set_names().
purrr::set_names() is a more flexible version of stats::setNames() that has more features. In the example below, setNames fails to work when the “names” are not explicitly provided as one vector of the same length as the vector to be named. purrr::set_names() still works when the names are provided separately.
# using stats::setNames()
setNames(1:4, c("a", "b", "c", "d"))
## a b c d
## 1 2 3 4
#setNames(1:4, "a", "b", "c", "d") # Error in setNames(1:4, "a", "b", "c", "d") :unused arguments ("b", "c", "d")
# using purrr::set_names()
library(purrr)
set_names(1:4, c("a", "b", "c", "d"))
## a b c d
## 1 2 3 4
set_names(1:4, "a", "b", "c", "d")
## a b c d
## 1 2 3 4
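Two further conveniences of purrr::set_names() are described in its documentation: when no names are given, the vector is named with its own values, and a function can be passed to transform existing names. A brief sketch, assuming purrr is installed:

```r
library(purrr)

# With no names supplied, the values themselves become the names
self_named <- set_names(c("x", "y"))

# A function is applied to the existing names
upper_named <- set_names(c(a = 1, b = 2), toupper)

names(self_named)
names(upper_named)
```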
4. Create functions that take a vector as input and returns:
- The last value. Should you use [ or [[?
# we should use [[ rather than [, because we want exactly one element
example <- letters[1:10]
return_last <- function (x) {
return (x[[length(x)]])
}
example
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
return_last(example)
## [1] "j"
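For a plain character vector the two operators print the same result, so the choice only shows up with names or with lists. The sketch below uses illustrative objects to show what each operator returns for the last position.

```r
named <- c(first = 1, last = 2)
with_single <- named[length(named)]     # keeps the name "last"
with_double <- named[[length(named)]]   # bare value, name dropped

items <- list(1, "b")
list_single <- items[length(items)]     # still a list of length one
list_double <- items[[length(items)]]   # the element itself

str(list(with_single, with_double, list_single, list_double))
```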
- The elements at even numbered positions.
example <- letters[1:10]
return_even <- function (x) {
even_indices <- seq_along(x) %% 2 == 0
return (x[even_indices])
}
example
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
return_even(example)
## [1] "b" "d" "f" "h" "j"
- Every element except the last value.
example <- letters[1:10]
remove_last <- function (x) {
return(x[-length(x)])
}
example
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
remove_last(example)
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i"
- Only even numbers (and no missing values).
example <- c(1:5, NA, 6:12, NA, 13:20)
# named differently from return_even() above, which selects by position
return_even_numbers <- function (x) {
return (x[x %% 2 == 0 & !is.na(x)])
}
example
## [1] 1 2 3 4 5 NA 6 7 8 9 10 11 12 NA 13 14 15 16 17 18 19 20
return_even_numbers(example)
## [1] 2 4 6 8 10 12 14 16 18 20
5. Why is x[-which(x > 0)] not the same as x[x <= 0]?
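which(x > 0) returns the indices of the values in x that are greater than zero, and x[-which(x > 0)] drops the elements at those indices. x <= 0 returns a logical vector marking the values in x that are less than or equal to zero, and x[x <= 0] keeps the elements where it is TRUE. The two agree for ordinary numbers, as in the example below, but differ in two cases. First, which() ignores NA, so x[-which(x > 0)] leaves NA and NaN elements as they are, while x[x <= 0] returns NA for both because the comparison itself is NA. Second, if no value is greater than zero, which(x > 0) is empty, and x[-integer(0)] returns an empty vector instead of all of x.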
x <- c(-5:5)
which(x > 0)
## [1] 7 8 9 10 11
x <= 0
## [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
x[-which(x > 0)]
## [1] -5 -4 -3 -2 -1 0
x[x <= 0]
## [1] -5 -4 -3 -2 -1 0
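With x <- c(-5:5) both expressions give the same answer, so here is a short sketch of the two cases where they diverge: a vector containing NaN, and a vector with no positive values. The vector names are illustrative.

```r
# Case 1: NaN is preserved by which(), but turned into NA by logical subsetting
with_nan <- c(-1, 0, 1, NA, NaN)
via_which <- with_nan[-which(with_nan > 0)]
via_logical <- with_nan[with_nan <= 0]

# Case 2: no positive values, so which() returns integer(0)
no_positive <- c(-2, -1)
via_which_empty <- no_positive[-which(no_positive > 0)]   # empty vector
via_logical_full <- no_positive[no_positive <= 0]         # both values kept

is.nan(via_which)
is.nan(via_logical)
length(via_which_empty)
length(via_logical_full)
```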
6. What happens when you subset with a positive integer that’s bigger than the length of the vector? What happens when you subset with a name that doesn’t exist?
Subsetting with a positive integer that’s bigger than the length of the vector returns NA. Subsetting with [ and a name that doesn’t exist also returns NA, and the name of the result is <NA>.
x <- c(-5:5)
# length(x) is 11
x[12]
## [1] NA
# add names to x, then try a name that doesn't exist
names(x) <- letters[seq_along(x)]
x
## a b c d e f g h i j k
## -5 -4 -3 -2 -1 0 1 2 3 4 5
x["l"]
## <NA>
## NA
# x[l] (without quotes) fails for a different reason: R looks for a variable called l
# Error: object 'l' not found
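The behaviour depends on the operator: with [ an out-of-range position or an unknown name gives NA, while [[ signals an error. A minimal sketch with an illustrative two-element vector, using tryCatch() to capture the error messages instead of stopping:

```r
v <- c(a = 1, b = 2)

# Single bracket: both lookups quietly return NA
na_by_position <- v[3]
na_by_name <- v["z"]

# Double bracket: both lookups raise an error, captured here as text
err_by_position <- tryCatch(v[[3]], error = function(e) conditionMessage(e))
err_by_name <- tryCatch(v[["z"]], error = function(e) conditionMessage(e))

err_by_position
err_by_name
```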
20.5.4 Exercises
1. Draw the following lists as nested sets:
- list(a, b, list(c, d), list(e, f))
The structure is as follows: [ a, b, [c,d], [e,f] ]
- list(list(list(list(list(list(a))))))
The structure is as follows (the value “a” is nested within 6 lists): [ [ [ [ [ [ a ] ] ] ] ] ]
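The nesting can also be checked in R. The sketch below builds both lists, using strings in place of the bare symbols a to f, and counts the layers with list_depth(), a hypothetical helper written for this example rather than a function from any package.

```r
# Hypothetical helper: number of list layers around the deepest element
list_depth <- function(x) {
  if (!is.list(x)) return(0L)      # a non-list value adds no layer
  if (length(x) == 0) return(1L)   # an empty list is one layer deep
  1L + max(vapply(x, list_depth, integer(1)))
}

first <- list("a", "b", list("c", "d"), list("e", "f"))
second <- list(list(list(list(list(list("a"))))))

depth_first <- list_depth(first)
depth_second <- list_depth(second)
c(first = depth_first, second = depth_second)
```

Calling str(first) or str(second) prints the same structure as an indented tree.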
2. What happens if you subset a tibble as if you’re subsetting a list? What are the key differences between a list and a tibble?
Subsetting a tibble with [ and the names of the columns always returns a new tibble, even when only one column is selected; to pull out a single column as a vector of the data type stored in it, use [[ or $. A list behaves the same way: [ returns a new list, even for a single named constituent, while [[ and $ return the constituent itself. Note that the example below uses iris, which is a base data frame rather than a tibble, and a data frame does simplify iris[, "Sepal.Length"] to a vector. A key difference is that a tibble has a fixed dimension and each column must be of the same length, whereas a list can contain vectors of differing lengths. A tibble can also be manipulated using dplyr commands and functions that apply to data frames, which provides more functionality/flexibility for data analysis.
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
typeof(iris)
## [1] "list"
typeof(iris[,c("Sepal.Length", "Sepal.Width")])
## [1] "list"
typeof(iris$Sepal.Length)
## [1] "double"
typeof(iris[,c("Sepal.Length")])
## [1] "double"
mylist <- list(nums = c(1:5),
myletters = letters[1:15])
mylist
## $nums
## [1] 1 2 3 4 5
##
## $myletters
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o"
typeof(mylist)
## [1] "list"
typeof(mylist$nums)
## [1] "integer"
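Since iris is a base data frame, the calls above do not show how a tibble itself behaves. The sketch below converts iris with tibble::as_tibble() (assuming the tibble package is installed) and compares the single-column case.

```r
library(tibble)

iris_tbl <- as_tibble(iris)

# [ on a tibble keeps the tibble structure, even for one column
one_col_bracket <- iris_tbl[, "Sepal.Length"]
# [[ (or $) extracts the column as a plain vector
one_col_double <- iris_tbl[["Sepal.Length"]]
# the same [ call on the base data frame drops to a vector
one_col_df <- iris[, "Sepal.Length"]

c(tibble_bracket = typeof(one_col_bracket),
  tibble_double  = typeof(one_col_double),
  data_frame     = typeof(one_col_df))
```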
20.7.4 Exercises
1. What does hms::hms(3600) return? How does it print? What primitive type is the augmented vector built on top of? What attributes does it use?
It returns an hms object, which prints as 01:00:00 (hours:minutes:seconds). It is built on top of double. It uses the attributes “class”, which has values “hms” and “difftime”, and “units”, which has the value “secs”.
hms::hms(3600)
## 01:00:00
typeof(hms::hms(3600))
## [1] "double"
attributes(hms::hms(3600))
## $class
## [1] "hms" "difftime"
##
## $units
## [1] "secs"
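The class attribute above shows that hms inherits from difftime, which is part of base R. As a rough analogy that does not need the hms package, the sketch below builds a difftime of 3600 seconds and inspects it the same way.

```r
one_hour <- as.difftime(3600, units = "secs")

base_type <- typeof(one_hour)          # the underlying primitive type
attrs <- attributes(one_hour)          # class and units
raw_value <- as.numeric(one_hour)      # the stored number of seconds

base_type
attrs
raw_value
```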
2. Try and make a tibble that has columns with different lengths. What happens?
Trying to make a tibble with differing column lengths results in an error.
# tibble (a = c(1:5),
# b = letters[1:3])
# Error: Tibble columns must have consistent lengths, only values of length one are recycled: * Length 3: Column `b` * Length 5: Column `a`
However, there is an exception to this, in which the values of length one are repeated until the column length matches the other columns. An example is below:
library(tibble)
tibble(a = c(1:5),
b = letters[1])
## # A tibble: 5 x 2
## a b
## <int> <chr>
## 1 1 a
## 2 2 a
## 3 3 a
## 4 4 a
## 5 5 a
3. Based on the definition above, is it ok to have a list as a column of a tibble?
Yes, it is OK to have a list as a column of a tibble, as long as the length of the list matches the length of the other columns in the tibble. An example of a tibble with a list for one of its columns and how to select a value from that column is below:
mytib <- tibble (a = c(1:3),
b = letters[1:3],
mylist = list(x = c(1:5),
y = c(10:20),
z = c(2:3)))
mytib$mylist[[1]]
## [1] 1 2 3 4 5
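Only the length of the list itself has to match the other columns; the elements inside it can have any length. A small sketch, assuming the tibble package is installed and using an illustrative tibble list_tib:

```r
library(tibble)

list_tib <- tibble(
  a = 1:3,
  mylist = list(1:5, 10:20, 2:3)   # three elements, one per row
)

n_rows <- nrow(list_tib)
column_type <- typeof(list_tib$mylist)     # the column is stored as a list
element_lengths <- lengths(list_tib$mylist)  # each cell has its own length

element_lengths
```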