This tutorial explains the differences between tibble() and data.frame(), with an example for each.
1. Printing a tibble shows only the first 10 rows (and only the columns that fit on screen), whereas printing a data frame shows the complete data.
Note - tibble() comes from the tibble package. dplyr re-exports it, so loading dplyr also makes tibble() available.
# Load library
library(dplyr)

df <- data.frame(x = 1:50, y = seq(1, 100, 2))
tb <- tibble(x = 1:50, y = seq(1, 100, 2))

# Print
print(df)
print(tb)
As shown in the output below, the tibble displays only the first 10 rows.
# A tibble: 50 × 2
       x     y
   <int> <dbl>
 1     1     1
 2     2     3
 3     3     5
 4     4     7
 5     5     9
 6     6    11
 7     7    13
 8     8    15
 9     9    17
10    10    19
# ℹ 40 more rows
# ℹ Use `print(n = ...)` to see more rows
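The footer hint can be followed directly when more rows are needed. The sketch below assumes the tibble package is installed and rebuilds the same tb; df_from_tb is a name introduced here only for illustration.

```r
library(tibble)

tb <- tibble(x = 1:50, y = seq(1, 100, 2))

# Print the first 20 rows instead of the default 10
print(tb, n = 20)

# Print every row
print(tb, n = Inf)

# Convert to a base data frame to get base printing behaviour
df_from_tb <- as.data.frame(tb)
nrow(df_from_tb)
```

Passing n to print() changes only that one call, so the default of 10 rows still applies to later prints.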
2. A data frame returns the values of a column when you access it with $ and a partial column name. A tibble does not match partial names: it returns NULL with the warning "Unknown or uninitialised column".
df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))
In the example below, we are using "id" instead of "ids" to access the column.
tb$id
# NULL
# Warning message:
# Unknown or uninitialised column: `id`.
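For comparison, here is a small sketch of the data frame side of the same lookup, assuming the tibble package is installed. The variable names partial_df, exact_df and partial_tb are introduced here only for illustration.

```r
library(tibble)

df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))

# Data frame: `$` partially matches "id" to the "ids" column
partial_df <- df$id
identical(partial_df, df$ids)

# `[[` matches names exactly by default, so the data frame returns NULL here
exact_df <- df[["id"]]
is.null(exact_df)

# Tibble: `$` does not partially match; it returns NULL and warns
partial_tb <- suppressWarnings(tb$id)
is.null(partial_tb)
```

Using [[ ]] with the full column name is one way to avoid accidental partial matches on a data frame.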
3. A tibble remains a tibble when you select a single column from it with single-bracket indexing, whereas a data frame is simplified to a vector when you select a single column the same way.
df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))

tb2 <- tb[, "score"]
df2 <- df[, "score"]
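The code below checks whether each result is still a tibble or a data frame, using the is_tibble() and is.data.frame() functions. is_tibble() is called with the tibble:: prefix here in case loading dplyr alone does not make it available.
tibble::is_tibble(tb2)
# TRUE
is.data.frame(df2)
# FALSE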
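Both behaviours can be overridden when needed. The following sketch, which assumes the tibble package is installed, keeps a one-column data frame with drop = FALSE and pulls a plain vector out of a tibble with [[ ]]. The names df_keep and tb_vec are illustrative only.

```r
library(tibble)

df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))

# Keep the data frame structure when selecting one column
df_keep <- df[, "score", drop = FALSE]
is.data.frame(df_keep)

# Extract a plain vector from a tibble with `[[` (or `$`)
tb_vec <- tb[["score"]]
is.numeric(tb_vec)
is_tibble(tb_vec)
```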
4. When you add a new column to a tibble, its length must match the number of rows in the tibble (only a single value is recycled), so the assignment below fails. A data frame, by contrast, accepts a shorter column and recycles it, provided the number of rows is a multiple of the new column's length.
tb$newcol <- c(5,6)
As shown in the code below, the values are repeated in the data frame when the length of the new column does not match the number of rows.
df$newcol <- c(5,6) # [1] 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 6 5 # [38] 6 5 6 5 6 5 6 5 6 5 6 5 6
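The recycling rules can be checked side by side. This is a minimal sketch, assuming the tibble package is installed; tb_result, df_result, flag and bad are names made up for the example. It uses tryCatch() so the script keeps running when an assignment is rejected.

```r
library(tibble)

df <- data.frame(ids = 1:50, score = seq(1, 100, 2))
tb <- tibble(ids = 1:50, score = seq(1, 100, 2))

# Tibble: a length-2 vector is rejected, so catch the error
tb_result <- tryCatch(
  {
    tb$newcol <- c(5, 6)
    "assigned"
  },
  error = function(e) "error"
)
tb_result

# Tibble: a single value is recycled to every row
tb$flag <- 0
length(tb$flag)

# Data frame: 50 is a multiple of 2, so the values are recycled
df$newcol <- c(5, 6)

# Data frame: 50 is not a multiple of 3, so this assignment fails as well
df_result <- tryCatch(
  {
    df$bad <- c(1, 2, 3)
    "assigned"
  },
  error = function(e) "error"
)
df_result
```

If a repeating pattern is really what you want in a tibble, building the full-length vector explicitly, for example with rep(c(5, 6), 25), keeps the intent visible.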