Polars In Python

Pawan Kumar Ganjhu
2 min read · Nov 17, 2023

Polars vs Pandas

Polars is a fast DataFrame library for Python. It is similar to Pandas but designed for better performance on large datasets. Its core is written in Rust and it uses the Apache Arrow columnar memory format, which makes it efficient for analytical workloads such as filtering, aggregations, and transformations.

Here’s a brief overview of using Polars in Python:

Installation:

You can install Polars using pip:

pip install polars

Basic Usage:

Importing Polars:

import polars as pl

Creating a DataFrame

# Creating a DataFrame from a Python dictionary
data = {'column1': [1, 2, 3], 'column2': ['a', 'b', 'c']}
df = pl.DataFrame(data)
df
shape: (3, 2)
column1  column2
i64      str
1        "a"
2        "b"
3        "c"

Reading Data:

# Reading from a CSV file
df = pl.read_csv('your_file.csv')
df.head()
# Reading from a Parquet file
# df = pl.read_parquet('your_file.parquet')

Selecting Columns:

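# Selecting a single column (returns a Series)
col = df['column_name']

# Selecting multiple columns (returns a DataFrame)
cols = df[['column1', 'column2']]
# Equivalent, using the expression API:
# cols = df.select(['column1', 'column2'])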

Filtering:

# Filtering rows based on a condition, written as a Polars expression
filtered_df = df.filter(pl.col('column1') > 2)

Sorting:

# Sorting the DataFrame by a column
sorted_df = df.sort('column1')

Aggregations:

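# Performing aggregations: count the values of column2 within each column1 group
# (the method is named group_by since Polars 0.19; older releases call it groupby)
agg_df = df.group_by('column1').agg(pl.col('column2').count().alias('count'))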

Adding a New Column:

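# Adding a new column based on existing columns
# (assumes column1 and column2 are both numeric; adding an integer column to a string column raises an error)
df = df.with_columns((pl.col('column1') + pl.col('column2')).alias('new_column'))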

Exporting Data:

# Exporting DataFrame to a CSV file
df.write_csv('output.csv')

# Exporting DataFrame to a Parquet file
df.write_parquet('output.parquet')

These basic examples should be enough to get you started with Polars. The library offers many more features for data manipulation and analysis, so explore the official documentation for the advanced ones: Polars Documentation.

Polars vs. Pandas:

Polars and Pandas are both libraries in Python for data manipulation and analysis, particularly with tabular data. While they share some similarities, they have distinct differences in terms of design, performance, and functionality. Here's a comparison between Polars and Pandas:
