Exploring Pandas in Python: Filter and Pivot Operations with Sample Data

5 min readNov 15, 2023

Pandas is a powerful data manipulation library in Python, and the filter and pivot functions are useful tools for working with DataFrame objects. Let's go through each of them using sample data.

import pandas as pd

# Sample Data
data = {
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

This creates a simple DataFrame with columns ‘Date’, ‘Category’, and ‘Value’. Now, let’s explore filter and pivot.

Output:

Original DataFrame:
         Date Category  Value
0  2023-01-01        A     10
1  2023-01-01        B     20
2  2023-01-02        A     30
3  2023-01-02        B     40

Filter

The filter function is used to subset the DataFrame based on column labels. You can use it to select specific columns.

# Filtering columns
filtered_df = df.filter(items=['Date', 'Value'])
print("\nFiltered DataFrame:")
print(filtered_df)

This will result in a DataFrame containing only the ‘Date’ and ‘Value’ columns.

Filtered DataFrame:
         Date  Value
0  2023-01-01     10
1  2023-01-01     20
2  2023-01-02     30
3  2023-01-02     40

Example 1: Filtering Rows Based on a Condition

You can use filter to select rows that meet a specific condition. For example, let's filter rows where the 'Value' is greater than 20.

# Filtering rows based on a condition
filtered_rows = df[df['Value'] > 20]
print("\nFiltered Rows:")
print(filtered_rows)

Filtered Rows:
         Date Category  Value
2  2023-01-02        A     30
3  2023-01-02        B     40

Example 2: Using `like` Parameter

The like parameter allows you to select columns based on partial string matching. For example, let's select columns that contain the string 'Dat'.

# Using the like parameter
selected_columns = df.filter(like='Dat')
print("\nSelected Columns:")
print(selected_columns)

Selected Columns:
         Date
0  2023-01-01
1  2023-01-01
2  2023-01-02
3  2023-01-02

Example 3: Using `regex` Parameter

The regex parameter enables you to use regular expressions to match column names. Let's select columns that start with 'C' or 'V'.

# Using the regex parameter
selected_columns_regex = df.filter(regex='^C|^V')
print("\nSelected Columns with Regex:")
print(selected_columns_regex)

Selected Columns with Regex:
  Category  Value
0        A     10
1        B     20
2        A     30
3        B     40

Example 4: Filtering Columns Based on a List

You can use filter to select columns based on a list of column names.

# Filtering columns based on a list
selected_columns_list = df.filter(items=['Date', 'Value'])
print("\nSelected Columns with List:")
print(selected_columns_list)

Selected Columns with List:
         Date  Value
0  2023-01-01     10
1  2023-01-01     20
2  2023-01-02     30
3  2023-01-02     40

Example 5: Using `items` Parameter

The items parameter can be used to select columns based on a list of substrings

# Using the items parameter
selected_columns_items = df.filter(items=['Date', 'Val'])
print("\nSelected Columns with Items:")
print(selected_columns_items)

Selected Columns with Items:
         Date
0  2023-01-01
1  2023-01-01
2  2023-01-02
3  2023-01-02

These examples showcase different ways to use the filter function in Pandas for both column and row selection based on various conditions and criteria.

Pivot

The pivot function is used to reshape the DataFrame by pivoting the values in one column into new columns. Let's pivot the data based on the 'Category' column.

# Pivoting the DataFrame
pivot_df = df.pivot(index='Date', columns='Category', values='Value')
print("\nPivoted DataFrame:")
print(pivot_df)

This will create a new DataFrame where unique values in the ‘Category’ column become new columns, and the ‘Value’ column provides the corresponding values.

Pivoted DataFrame:
Category     A   B
Date              
2023-01-01  10  20
2023-01-02  30  40

Pivot Table

If you have duplicate entries for a combination of index and columns in the pivot operation, you can use the pivot_table function with an aggregation function to handle the duplicates. Let's add another entry for '2023-01-01' and 'Category' 'A' to demonstrate this.

# Adding a duplicate entry
df = df.append({'Date': '2023-01-01', 'Category': 'A', 'Value': 15}, ignore_index=True)

# Using pivot_table to handle duplicates
pivot_table_df = df.pivot_table(index='Date', columns='Category', values='Value', aggfunc='sum')
print("\nPivot Table DataFrame:")
print(pivot_table_df)

This will create a pivot table where the values are aggregated using the sum function for duplicate entries.

Pivot Table DataFrame:
Category     A   B
Date              
2023-01-01  25  20
2023-01-02  30  40

Example 1: Multi-level Indexing

You can create a multi-level index by passing a list of columns to the index parameter.

# Using pivot with multi-level indexing
multi_level_pivot = df.pivot(index=['Date', 'Category'], columns='Value')
print("\nMulti-level Index Pivot:")
print(multi_level_pivot)

This creates a DataFrame with a multi-level index based on ‘Date’ and ‘Category’.

Multi-level Index Pivot:
Empty DataFrame
Columns: []
Index: [(2023-01-01, A), (2023-01-01, B), (2023-01-02, A), (2023-01-02, B)]

Example 2: Handling Missing Values with `fillna`

You can use the fillna function to replace missing values after pivoting.

# Adding a duplicate entry
df = df.append({'Date': '2023-01-01', 'Category': 'A', 'Value': 15}, ignore_index=True)

# Using groupby and pivot_table to handle duplicates
pivot_fillna = df.groupby(['Date', 'Category'])['Value'].sum().unstack(fill_value=0)
print("\nPivot with Missing Values Filled:")
print(pivot_fillna)

Pivot Table with Sum for Duplicate Entries:
Category     A   B
Date              
2023-01-01  25  20
2023-01-02  30  40

Example 3: Aggregating with `aggfunc`

The aggfunc parameter allows you to specify an aggregation function for duplicate entries. Let's use the aggfunc parameter to calculate the average value for duplicate entries.

# Using pivot_table to calculate the average for duplicate entries
pivot_avg = df.pivot_table(index='Date', columns='Category', values='Value', aggfunc='mean')
print("\nPivot Table with Average for Duplicate Entries:")
print(pivot_avg)

Pivot Table with Average for Duplicate Entries:
Category       A     B
Date                  
2023-01-01  12.5  20.0
2023-01-02  30.0  40.0

Example 4: Resetting Index after Pivot

To reset the index after pivoting, you need to use pivot_table with an aggregation function for duplicate entries.

# Using pivot_table to handle duplicate entries and resetting index
pivot_table_df = df.pivot_table(index='Date', columns='Category', values='Value', aggfunc='sum')
pivot_reset_index = pivot_table_df.reset_index()
print("\nPivot Table with Reset Index:")
print(pivot_reset_index)

Pivot Table with Reset Index:
Category        Date   A   B
0         2023-01-01  10  20
1         2023-01-02  30  40

Example 5: Pivoting with Multiple Value Columns

You can pivot on multiple value columns by specifying a list for the values parameter.

# Sample Data with an additional 'Value2' column
data = {
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40],
    'Value2': [100, 200, 300, 400]
}

df = pd.DataFrame(data)

# Pivoting with multiple value columns
multi_value_pivot = df.pivot(index='Date', columns='Category', values=['Value', 'Value2'])
print("\nPivot with Multiple Value Columns:")
print(multi_value_pivot)

Pivot with Multiple Value Columns:
           Value     Value2     
Category       A   B      A    B
Date                            
2023-01-01    10  20    100  200
2023-01-02    30  40    300  400

Exploring Pandas in Python: Filter and Pivot Operations with Sample Data

Filter

Example 1: Filtering Rows Based on a Condition

Example 2: Using `like` Parameter

Example 3: Using `regex` Parameter

Example 4: Filtering Columns Based on a List

Example 5: Using `items` Parameter

Pivot

Pivot Table

Example 1: Multi-level Indexing

Example 2: Handling Missing Values with `fillna`

Example 3: Aggregating with `aggfunc`

Example 4: Resetting Index after Pivot

Example 5: Pivoting with Multiple Value Columns

Written by Pawan Kumar Ganjhu

No responses yet

Exploring Pandas in Python: Filter and Pivot Operations with Sample Data

Filter

Example 1: Filtering Rows Based on a Condition

Example 2: Using like Parameter

Example 3: Using regex Parameter

Example 4: Filtering Columns Based on a List

Example 5: Using items Parameter

Pivot

Pivot Table

Example 1: Multi-level Indexing

Example 2: Handling Missing Values with fillna

Example 3: Aggregating with aggfunc

Example 4: Resetting Index after Pivot

Example 5: Pivoting with Multiple Value Columns

Written by Pawan Kumar Ganjhu

No responses yet

Example 2: Using `like` Parameter

Example 3: Using `regex` Parameter

Example 5: Using `items` Parameter

Example 2: Handling Missing Values with `fillna`

Example 3: Aggregating with `aggfunc`