Pandas drop rows vs filter with examples
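In the Python pandas library, the drop function removes rows or columns from a DataFrame by label, while the filter function selects rows or columns whose labels match a name, substring, or regular expression. Subsetting rows based on a condition on their values is done with boolean indexing rather than with filter.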
Pandas is a powerful library in Python for data manipulation and analysis. It provides functions and methods to perform a wide range of operations on DataFrames, including dropping rows or columns and filtering rows based on a condition.
In this blog, we will explore the difference between the drop and filter functions in pandas, and provide examples of how to use them.
Dropping Rows or Columns with the drop Function
The drop function in pandas can be used to remove rows or columns from a DataFrame. It takes a single label or a list of labels (either row labels or column labels) as the first argument, and the axis parameter specifies whether to drop rows (axis=0, the default) or columns (axis=1). It returns a new DataFrame and leaves the original unchanged unless inplace=True is passed.
Here is an example of using the drop function to remove rows from a DataFrame:
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})
# Print the DataFrame
print(df)
#    A  B   C
# 0  1  5   9
# 1  2  6  10
# 2  3  7  11
# 3  4  8  12
# Drop rows with index 1 and 3
df = df.drop([1, 3])
# Print the updated DataFrame
print(df)
#    A  B   C
# 0  1  5   9
# 2  3  7  11
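Two variations on the row example may be useful. The sketch below reuses the same sample data and shows the index keyword as an alternative to positional labels, the errors='ignore' option for labels that may be missing, and dropping rows selected by a condition by passing their index labels. The variable names by_keyword, tolerant, and small_a are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

# Equivalent to df.drop([1, 3]): name the axis with the index keyword
by_keyword = df.drop(index=[1, 3])

# A missing label raises KeyError by default; errors='ignore' skips it
tolerant = df.drop(index=[1, 99], errors='ignore')

# Drop rows that match a condition by passing their index labels
small_a = df.drop(df.index[df['A'] < 3])

print(by_keyword.index.tolist())  # [0, 2]
print(tolerant.index.tolist())    # [0, 2, 3]
print(small_a.index.tolist())     # [2, 3]
```

Dropping by condition this way is the mirror image of keeping rows with a boolean mask: the rows where the condition is True are removed rather than kept.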
We can also use the drop function to remove columns from a DataFrame. In this case, we need to set the axis parameter to 1.
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})
# Print the DataFrame
print(df)
#    A  B   C
# 0  1  5   9
# 1  2  6  10
# 2  3  7  11
# 3  4  8  12
# Drop column B
df = df.drop(['B'], axis=1)
# Print the updated DataFrame
print(df)
#    A   C
# 0  1   9
# 1  2  10
# 2  3  11
# 3  4  12
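The columns keyword is a common alternative to axis=1 and can make the intent easier to read. The following sketch, using the same sample data, drops one or several columns this way and checks that drop returns a new DataFrame rather than modifying the original. The names without_b and only_a are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

# Same result as df.drop(['B'], axis=1)
without_b = df.drop(columns='B')

# Several columns can be dropped in one call
only_a = df.drop(columns=['B', 'C'])

print(without_b.columns.tolist())  # ['A', 'C']
print(only_a.columns.tolist())     # ['A']
print(df.columns.tolist())         # ['A', 'B', 'C'] (original unchanged)
```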
Filtering Rows with Boolean Indexing and the filter Function
Despite its name, the filter function in pandas does not subset rows based on a condition on their values. It selects rows or columns by label, using its items, like, or regex parameters, so passing it a boolean mask does not return the matching rows. To keep the rows that satisfy a condition, index the DataFrame with a boolean mask instead, which specifies which rows to keep (True) and which rows to drop (False).
Here is an example of using a boolean mask to subset rows based on a condition:
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})
# Print the DataFrame
print(df)
#    A  B   C
# 0  1  5   9
# 1  2  6  10
# 2  3  7  11
# 3  4  8  12
# Keep rows where column A is greater than 2 (boolean indexing, not df.filter)
df = df[df['A'] > 2]
# Print the updated DataFrame
print(df)
#    A  B   C
# 2  3  7  11
# 3  4  8  12
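For comparison, here is a minimal sketch of what DataFrame.filter itself does: it selects by label, not by value, through its items, like, and regex parameters. The column names and string row labels below are made up for illustration. The last selection uses query, another built-in way to pick rows by a condition on their values.

```python
import pandas as pd

# Hypothetical data with descriptive labels so label matching is visible
df = pd.DataFrame(
    {'price_usd': [10, 20, 30], 'price_eur': [9, 18, 27], 'qty': [1, 2, 3]},
    index=['apple', 'banana', 'avocado'],
)

# Columns by exact name (result follows the order given in items)
by_name = df.filter(items=['qty', 'price_usd'])

# Columns whose name contains a substring
by_substring = df.filter(like='price')

# Columns whose name matches a regular expression
by_regex = df.filter(regex='_eur$')

# Row labels instead of column labels: set axis=0
rows_by_label = df.filter(regex='^a', axis=0)

# Rows by value: use query (or a boolean mask), not filter
by_value = df.query('qty > 1')

print(by_name.columns.tolist())       # ['qty', 'price_usd']
print(by_substring.columns.tolist())  # ['price_usd', 'price_eur']
print(by_regex.columns.tolist())      # ['price_eur']
print(rows_by_label.index.tolist())   # ['apple', 'avocado']
print(by_value.index.tolist())        # ['banana', 'avocado']
```

In short, drop and filter both work on labels (one removes them, the other keeps them), while conditions on the data go through a boolean mask or query.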