This tutorial demonstrates several ways to remove rows with missing (NA) values in R, with examples. NA stands for "Not Available" and represents a missing value.
Below is a list of 7 different methods to remove rows with NA values in R. The first four remove rows that contain at least one NA; the last three remove only rows in which every value is NA.
Function | Code |
---|---|
na.omit() | newdf <- na.omit(df) |
complete.cases() | newdf <- df[complete.cases(df), ] |
rowSums() | newdf <- df[rowSums(is.na(df)) == 0, ] |
drop_na() | library(tidyr); newdf <- df %>% drop_na() |
subset() & rowSums() | newdf <- subset(df, rowSums(is.na(df)) != ncol(df)) |
filter() & rowSums() | library(dplyr); newdf <- filter(df, rowSums(is.na(df)) != ncol(df)) |
rowSums() & ncol() | newdf <- df[rowSums(is.na(df)) != ncol(df), ] |
Here we create a data frame named df for demonstration purposes. It has 6 observations and 4 columns: name, sex, score and address.
df <- data.frame(
  name = c('deeps', 'sandy', 'david', NA, 'preet', NA),
  sex = c('Male', 'Male', NA, NA, 'Female', NA),
  score = c(50, 100, 45, 100, 90, NA),
  address = c('London', 'Bangalore', NA, NA, NA, NA)
)
Data are shown in the table below.
name | sex | score | address |
---|---|---|---|
deeps | Male | 50 | London |
sandy | Male | 100 | Bangalore |
david | NA | 45 | NA |
NA | NA | 100 | NA |
preet | Female | 90 | NA |
NA | NA | NA | NA |
na.omit() Function
Here we use the na.omit() function to remove rows that contain any NA values. It checks each row and removes every row that contains one or more NA values, returning a subset of the original data frame without those rows.
newdf <- na.omit(df)
name sex score address
1 deeps Male 50 London
2 sandy Male 100 Bangalore
We have created a new data frame called newdf by removing the rows that contain any NA (missing) values from the original data frame df.
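As a small addition to the example above, the sketch below shows one way to check which rows na.omit() dropped. It relies on the "na.action" attribute that na.omit() attaches to its result; the variable name dropped is only illustrative.

```r
# Same demonstration data frame as in this tutorial
df <- data.frame(
  name = c('deeps', 'sandy', 'david', NA, 'preet', NA),
  sex = c('Male', 'Male', NA, NA, 'Female', NA),
  score = c(50, 100, 45, 100, 90, NA),
  address = c('London', 'Bangalore', NA, NA, NA, NA)
)

newdf <- na.omit(df)

# na.omit() records the positions of the removed rows in an attribute
dropped <- attr(newdf, "na.action")

# Number of rows removed
nrow(df) - nrow(newdf)

# Positions of the removed rows in the original data frame
as.integer(dropped)
```

This can be handy for reporting how many observations were lost before continuing with an analysis.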
complete.cases() Function
In this example, we will see how to use the complete.cases() function to remove rows that contain any NA values.
In R, the complete.cases() function returns a logical vector that is TRUE for each row of a data frame that is complete (has no missing values). df[complete.cases(df), ] selects only the complete rows of the original data frame df, effectively removing rows with any missing values.
newdf <- df[complete.cases(df), ]
name sex score address
1 deeps Male 50 London
2 sandy Male 100 Bangalore
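Because complete.cases() works on whatever data frame it is given, the same idea can be restricted to a few columns. The sketch below is one possible variation, assuming only name and score must be present; the vector key_cols is a hypothetical name introduced for illustration.

```r
# Same demonstration data frame as in this tutorial
df <- data.frame(
  name = c('deeps', 'sandy', 'david', NA, 'preet', NA),
  sex = c('Male', 'Male', NA, NA, 'Female', NA),
  score = c(50, 100, 45, 100, 90, NA),
  address = c('London', 'Bangalore', NA, NA, NA, NA)
)

# Columns that must not be missing (an assumption for this example)
key_cols <- c("name", "score")

# Keep rows that are complete in the selected columns only;
# NAs in sex or address do not cause a row to be dropped
newdf <- df[complete.cases(df[, key_cols]), ]
newdf
```

Here the rows for deeps, sandy, david and preet are kept, since each has both a name and a score.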
rowSums() Function
By combining the rowSums() and is.na() functions, we can remove rows that have at least one NA value.
newdf <- df[rowSums(is.na(df)) == 0, ]
name sex score address
1 deeps Male 50 London
2 sandy Male 100 Bangalore
Let's understand how the code works:
is.na(df) returns TRUE if the corresponding element in df is NA, and FALSE otherwise.
rowSums(is.na(df)) calculates the sum of TRUE values in each row. This gives us a numeric vector with the number of missing values (NAs) in each row of df.
rowSums(is.na(df)) == 0 compares each element of the numeric vector with zero. This results in a logical vector where TRUE indicates that the corresponding row has no missing values (NAs).
df[rowSums(is.na(df)) == 0, ] selects only the rows without any missing values.
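The steps above can be traced by storing each intermediate result. The sketch below does this for the demonstration data; the names na_matrix, na_per_row and keep are illustrative, and the final line is an optional variation (not part of the original method) that tolerates at most one NA per row.

```r
# Same demonstration data frame as in this tutorial
df <- data.frame(
  name = c('deeps', 'sandy', 'david', NA, 'preet', NA),
  sex = c('Male', 'Male', NA, NA, 'Female', NA),
  score = c(50, 100, 45, 100, 90, NA),
  address = c('London', 'Bangalore', NA, NA, NA, NA)
)

# Step 1: logical matrix, TRUE where a value is missing
na_matrix <- is.na(df)

# Step 2: number of missing values in each row
na_per_row <- rowSums(na_matrix)
na_per_row

# Step 3: TRUE for rows with no missing values
keep <- na_per_row == 0

# Step 4: select the complete rows
newdf <- df[keep, ]

# Variation: keep rows with at most one missing value
mostly_complete <- df[na_per_row <= 1, ]
```

For this data, na_per_row holds the counts 0, 0, 2, 3, 1 and 4, so only the first two rows pass the == 0 test.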
drop_na() Function
To remove rows with any missing values (NAs) from a data frame using the tidyr package, you can use the drop_na() function.
If the tidyr package is not already installed, you can install it with install.packages("tidyr").
library(tidyr)
newdf <- df %>% drop_na()
name sex score address
1 deeps Male 50 London
2 sandy Male 100 Bangalore
In the code above, the %>% operator pipes the data frame df into the drop_na() function, which removes every row that contains an NA. The result is assigned to the new data frame newdf.
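drop_na() also accepts column names, in which case only those columns are checked for missing values. The sketch below assumes tidyr is installed and that only name and score need to be present; that column choice is an assumption made for illustration.

```r
library(tidyr)

# Same demonstration data frame as in this tutorial
df <- data.frame(
  name = c('deeps', 'sandy', 'david', NA, 'preet', NA),
  sex = c('Male', 'Male', NA, NA, 'Female', NA),
  score = c(50, 100, 45, 100, 90, NA),
  address = c('London', 'Bangalore', NA, NA, NA, NA)
)

# Drop rows only when name or score is missing;
# NAs in the other columns are left in place
newdf <- df %>% drop_na(name, score)
newdf
```

With this data, four rows remain (deeps, sandy, david and preet), some of which still contain NA in sex or address.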
subset() & rowSums() Functions
In this example, we will see how to remove rows from a data frame where all values in a row are missing (NA).
This example differs from the previous ones: it deletes only rows in which every value is missing, rather than rows with at least one missing value.
newdf <- subset(df, rowSums(is.na(df)) != ncol(df))
In the code above, the subset() function filters the data frame df based on a condition. The condition rowSums(is.na(df)) != ncol(df) checks, for each row, whether the number of missing values differs from the total number of columns. Rows that meet this condition, i.e., rows with at least one non-missing value, are kept in the new data frame newdf, while rows in which every value is missing are removed.
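To confirm what this condition keeps, the short check below runs the same subset() call on the demonstration data and inspects the result. It is only a verification sketch.

```r
# Same demonstration data frame as in this tutorial
df <- data.frame(
  name = c('deeps', 'sandy', 'david', NA, 'preet', NA),
  sex = c('Male', 'Male', NA, NA, 'Female', NA),
  score = c(50, 100, 45, 100, 90, NA),
  address = c('London', 'Bangalore', NA, NA, NA, NA)
)

newdf <- subset(df, rowSums(is.na(df)) != ncol(df))

# Only the last row, where all four values are NA, is removed
nrow(newdf)

# Partially missing rows are still present
anyNA(newdf)
```

Five of the six rows remain, and anyNA(newdf) is TRUE because rows such as david's still contain missing values.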
filter() & rowSums() Functions
In this example, we will see how to remove rows from a data frame where all values in a row are missing (NA) using the filter(), rowSums() and is.na() functions.
Make sure the dplyr package is installed. If not, you can install it with install.packages("dplyr").
library(dplyr)
newdf <- filter(df, rowSums(is.na(df)) != ncol(df))
name sex score address
1 deeps Male 50 London
2 sandy Male 100 Bangalore
3 david NA 45 NA
4 NA NA 100 NA
5 preet Female 90 NA
This method is very similar to the previous one. The only difference is that we use the filter() function from the dplyr package instead of the subset() function from base R; the logic stays the same.
rowSums() & ncol() Functions
To remove rows with only NAs (missing values) in a data frame using the rowSums(), is.na() and ncol() functions, you can use the following code:
newdf <- df[rowSums(is.na(df)) != ncol(df), ]
name sex score address
1 deeps Male 50 London
2 sandy Male 100 Bangalore
3 david NA 45 NA
4 NA NA 100 NA
5 preet Female 90 NA
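The two kinds of removal covered in this tutorial (rows with any NA versus rows with only NAs) can be wrapped in a single helper. The function remove_na_rows() below is a hypothetical convenience wrapper written for illustration; it is not part of base R or any package.

```r
# Same demonstration data frame as in this tutorial
df <- data.frame(
  name = c('deeps', 'sandy', 'david', NA, 'preet', NA),
  sex = c('Male', 'Male', NA, NA, 'Female', NA),
  score = c(50, 100, 45, 100, 90, NA),
  address = c('London', 'Bangalore', NA, NA, NA, NA)
)

# Hypothetical helper: how = "any" drops rows with at least one NA,
# how = "all" drops only rows in which every value is NA
remove_na_rows <- function(data, how = c("any", "all")) {
  how <- match.arg(how)
  na_count <- rowSums(is.na(data))
  keep <- if (how == "any") na_count == 0 else na_count != ncol(data)
  data[keep, , drop = FALSE]
}

any_removed <- remove_na_rows(df, "any")
all_removed <- remove_na_rows(df, "all")

nrow(any_removed)
nrow(all_removed)
```

On the demonstration data, the "any" mode leaves 2 rows and the "all" mode leaves 5, matching the outputs shown in the sections above.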