Exploring Flask REST APIs for Data Engineering: Concepts, Code, and Advantages

Pawan Kumar Ganjhu
10 min read · Jun 6, 2023


Flask is a popular Python web framework that can be used to create RESTful APIs for data engineering tasks. Flask provides a simple and flexible way to build APIs that can handle HTTP requests and responses. Here are some use cases and examples of using Flask for data engineering tasks:

  1. Data Retrieval: Flask can be used to build APIs that retrieve data from databases, files, or external APIs. Here’s an example that retrieves data from a PostgreSQL database:
from flask import Flask, jsonify
import psycopg2
import psycopg2.extras

app = Flask(__name__)

@app.route('/data', methods=['GET'])
def get_data():
    conn = psycopg2.connect(host='localhost', dbname='mydb', user='myuser', password='mypassword')
    # RealDictCursor returns each row as a dictionary keyed by column name,
    # so jsonify() produces JSON objects rather than bare arrays
    cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    cursor.execute('SELECT * FROM mytable')
    data = cursor.fetchall()
    cursor.close()
    conn.close()
    return jsonify(data)

if __name__ == '__main__':
    app.run()

When you access http://localhost:5000/data in your browser or send a GET request to that URL, it will retrieve data from the PostgreSQL database and return it as a JSON response.

2. Data Transformation: Flask can be used to build APIs that perform data transformations or calculations. Here’s an example that calculates the sum of two numbers:

from flask import Flask, request

app = Flask(__name__)

@app.route('/sum', methods=['POST'])
def calculate_sum():
    data = request.get_json()
    num1 = data['num1']
    num2 = data['num2']
    result = num1 + num2
    return str(result)

if __name__ == '__main__':
    app.run()

You can send a POST request to http://localhost:5000/sum with a JSON payload like {"num1": 3, "num2": 5}, and it will return the sum of the numbers as a string.
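You can also exercise the endpoint without starting a server, using Flask's built-in test client. The sketch below reuses the same route as the example above:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route('/sum', methods=['POST'])
def calculate_sum():
    data = request.get_json()
    return str(data['num1'] + data['num2'])

# The test client issues requests directly against the app,
# with no running server required
client = app.test_client()
response = client.post('/sum', json={'num1': 3, 'num2': 5})
print(response.get_data(as_text=True))  # -> 8
```

This pattern is also the standard way to write unit tests for Flask endpoints.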

3. Data Ingestion: Flask can be used to build APIs that accept data uploads or ingest data from external sources. Here’s an example that accepts a file upload and saves it to the server:

import os
from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
os.makedirs('uploads', exist_ok=True)

@app.route('/upload', methods=['POST'])
def upload_file():
    file = request.files['file']
    # secure_filename() strips path components, preventing path traversal
    filename = secure_filename(file.filename)
    file.save(os.path.join('uploads', filename))
    return 'File uploaded successfully!'

if __name__ == '__main__':
    app.run()

You can send a POST request to http://localhost:5000/upload with a file upload, and it will save the uploaded file to the "uploads" directory.
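A multipart upload can likewise be simulated with the test client. This sketch writes to a temporary directory (a stand-in for the "uploads" folder) so it runs anywhere:

```python
import io
import os
import tempfile
from flask import Flask, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
UPLOAD_DIR = tempfile.mkdtemp()  # stand-in for the "uploads" directory

@app.route('/upload', methods=['POST'])
def upload_file():
    file = request.files['file']
    filename = secure_filename(file.filename)
    file.save(os.path.join(UPLOAD_DIR, filename))
    return 'File uploaded successfully!'

# Simulate a multipart/form-data file upload
client = app.test_client()
response = client.post(
    '/upload',
    data={'file': (io.BytesIO(b'col1,col2\n1,2\n'), 'sample.csv')},
    content_type='multipart/form-data',
)
print(response.get_data(as_text=True))  # -> File uploaded successfully!
```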

Sample outputs:

  1. For the “Data Retrieval” example, accessing http://localhost:5000/data would return a JSON response containing the retrieved data from the database. The response might look like:
[
  {
    "id": 1,
    "name": "John",
    "age": 25
  },
  {
    "id": 2,
    "name": "Jane",
    "age": 30
  }
]

2. For the “Data Transformation” example, sending a POST request to http://localhost:5000/sum with a JSON payload like {"num1": 3, "num2": 5} would return the sum of the numbers as a string. The response would be:

8

3. For the “Data Ingestion” example, sending a POST request to http://localhost:5000/upload with a file upload would return the message "File uploaded successfully!" upon successful file upload.

These are just a few examples of how Flask can be used in data engineering tasks. Flask provides a flexible and powerful framework for building RESTful APIs, allowing you to handle various data operations efficiently.

Methods in Flask

Flask provides several methods that are commonly used in data engineering tasks when building RESTful APIs. Here is an explanation of some important Flask methods and how they can be used in data engineering:

  1. @app.route(): This decorator is used to define the URL routes for your API endpoints. It allows you to specify the URL path and the HTTP methods (GET, POST, PUT, DELETE, etc.) that your API endpoint should respond to. You can use different routes to handle different data engineering tasks, such as data retrieval, transformation, or ingestion.
from flask import Flask

app = Flask(__name__)

@app.route('/data', methods=['GET'])
def get_data():
    # Data retrieval logic here
    return 'Data retrieved successfully!'

@app.route('/transform', methods=['POST'])
def perform_transformation():
    # Data transformation logic here
    return 'Transformation complete!'

if __name__ == '__main__':
    app.run()

In the example above, the /data route handles a GET request to retrieve data, and the /transform route handles a POST request to perform data transformation.

2. request: The request object allows you to access data from the incoming HTTP request. It provides methods to retrieve data from request parameters, headers, or request body (JSON, form data, or file uploads). This is useful when handling data ingestion or performing calculations based on user input.

from flask import Flask, request

app = Flask(__name__)

@app.route('/sum', methods=['POST'])
def calculate_sum():
    data = request.get_json()
    num1 = data['num1']
    num2 = data['num2']
    result = num1 + num2
    return str(result)

if __name__ == '__main__':
    app.run()

In the above example, the request object is used to retrieve JSON data from the request body (request.get_json()) and perform a calculation based on the received data.
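Besides the JSON body, the same request object exposes query-string parameters through request.args. A small sketch (the route and field names are illustrative, not from the examples above):

```python
from flask import Flask, request

app = Flask(__name__)

@app.route('/filter', methods=['GET'])
def filter_data():
    # Query-string parameters arrive in request.args;
    # .get() supports a default value and type coercion
    min_age = request.args.get('min_age', default=0, type=int)
    people = [{'name': 'John', 'age': 25}, {'name': 'Jane', 'age': 30}]
    matches = [p for p in people if p['age'] >= min_age]
    # Returning a dict lets Flask serialize it to JSON automatically
    return {'count': len(matches)}

client = app.test_client()
response = client.get('/filter?min_age=28')
print(response.get_json())  # -> {'count': 1}
```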

3. jsonify(): The jsonify() function is used to convert Python objects into JSON responses. It is particularly useful when returning JSON data as a response from your API, such as when retrieving data from a database.

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/data', methods=['GET'])
def get_data():
    data = [
        {'id': 1, 'name': 'John', 'age': 25},
        {'id': 2, 'name': 'Jane', 'age': 30}
    ]
    return jsonify(data)

if __name__ == '__main__':
    app.run()

In the above example, the jsonify() function is used to convert the data list into a JSON response, which will be returned when accessing the /data route.

4. file.save(): This method is used to save uploaded files to the server's file system. It is useful for handling data ingestion scenarios where users can upload files.

from flask import Flask, request

app = Flask(__name__)

@app.route('/upload', methods=['POST'])
def upload_file():
    file = request.files['file']
    file.save('uploads/' + file.filename)
    return 'File uploaded successfully!'

if __name__ == '__main__':
    app.run()

In the above example, the file.save() method is used to save the uploaded file to the "uploads" directory on the server.

These are some of the important Flask methods that can be used in data engineering tasks when building RESTful APIs. They provide the necessary functionality to handle data retrieval, transformation, ingestion, and response formatting.

Flask HTTP Methods

In data engineering, the HTTP methods GET, PUT, POST, DELETE, and others play a significant role when designing RESTful APIs for data operations. Here’s an explanation of these HTTP methods in the context of data engineering:

  1. GET: The GET method is used to retrieve data from a server. In data engineering, GET requests are commonly used to fetch data from databases, files, or external APIs. For example, you can use a GET request to retrieve records from a database table, fetch a file’s content, or query data from an external API.
  2. POST: The POST method is used to submit data to be processed by a server. In data engineering, POST requests are often used for data ingestion or data transformation operations. You can send POST requests to upload files, submit form data, or send JSON payloads for processing and storage. POST requests are typically used to create new data entries or trigger data processing tasks.
  3. PUT: The PUT method is used to update existing data on a server. In data engineering, PUT requests are useful for modifying existing records or updating data in databases or storage systems. You can send PUT requests with a payload containing the updated data, and the server will update the corresponding record accordingly.
  4. DELETE: The DELETE method is used to remove data from a server. In data engineering, DELETE requests are employed to delete records or data entries from databases, storage systems, or any other resources. By sending a DELETE request with the appropriate identifier or criteria, you can instruct the server to delete the corresponding data.
  5. PATCH: The PATCH method is used to partially update data on a server. It is similar to the PUT method but allows you to update only specific fields or properties of an existing resource. In data engineering, PATCH requests can be useful when you want to modify specific attributes of a record without sending the entire updated object.

These HTTP methods, along with others like OPTIONS, HEAD, and TRACE, provide different ways to interact with data in RESTful APIs. As a data engineer, understanding these methods enables you to design APIs that handle data retrieval, modification, ingestion, transformation, and deletion effectively.

  • Here’s an example code snippet showcasing the usage of different HTTP methods (GET, POST, PUT, DELETE) in a Flask RESTful API:
from flask import Flask, jsonify, request

app = Flask(__name__)

# Sample data
data = [
    {'id': 1, 'name': 'John', 'age': 25},
    {'id': 2, 'name': 'Jane', 'age': 30}
]

# GET method - Retrieve all data
@app.route('/data', methods=['GET'])
def get_data():
    return jsonify(data)

# POST method - Create new data entry
@app.route('/data', methods=['POST'])
def create_data():
    new_entry = request.get_json()
    data.append(new_entry)
    return jsonify(new_entry), 201

# GET method - Retrieve specific data entry
@app.route('/data/<int:id>', methods=['GET'])
def get_data_by_id(id):
    for entry in data:
        if entry['id'] == id:
            return jsonify(entry)
    return jsonify({'error': 'Data entry not found'}), 404

# PUT method - Update specific data entry
@app.route('/data/<int:id>', methods=['PUT'])
def update_data(id):
    for entry in data:
        if entry['id'] == id:
            updated_entry = request.get_json()
            entry.update(updated_entry)
            return jsonify(entry)
    return jsonify({'error': 'Data entry not found'}), 404

# DELETE method - Delete specific data entry
@app.route('/data/<int:id>', methods=['DELETE'])
def delete_data(id):
    for entry in data:
        if entry['id'] == id:
            data.remove(entry)
            return jsonify({'message': 'Data entry deleted'})
    return jsonify({'error': 'Data entry not found'}), 404

if __name__ == '__main__':
    app.run()

This example demonstrates a basic Flask API that manages a collection of data entries. The API supports the following operations:

  • GET /data: Retrieves all data entries.
  • POST /data: Creates a new data entry.
  • GET /data/<id>: Retrieves a specific data entry by ID.
  • PUT /data/<id>: Updates a specific data entry by ID.
  • DELETE /data/<id>: Deletes a specific data entry by ID.

Note that this is a simplified example, and in a real-world scenario, you would likely interact with databases or other data sources instead of manipulating data directly in memory.
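The PATCH method described earlier is not part of the snippet above. A partial-update route could be sketched like this, under the same in-memory data assumption:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
data = [{'id': 1, 'name': 'John', 'age': 25}]

# PATCH method - Partially update a specific data entry
@app.route('/data/<int:id>', methods=['PATCH'])
def patch_data(id):
    for entry in data:
        if entry['id'] == id:
            # Only the fields present in the payload are changed
            entry.update(request.get_json())
            return jsonify(entry)
    return jsonify({'error': 'Data entry not found'}), 404

client = app.test_client()
response = client.patch('/data/1', json={'age': 26})
print(response.get_json())  # the entry with only 'age' updated
```

Unlike PUT, the client sends just the changed fields; the other attributes of the record are left untouched.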

Advantages of using API in data engineering projects

Using APIs (Application Programming Interfaces) in data engineering projects offers several advantages:

  1. Data Integration: APIs enable seamless integration and communication between different systems and applications. Data engineering projects often involve working with multiple data sources, databases, services, or external APIs. By leveraging APIs, you can establish connections and exchange data between these systems efficiently.
  2. Standardization: APIs provide a standardized way to access and manipulate data. They define clear interfaces, data formats, and protocols for communication. This standardization allows different components of a data engineering project to work together smoothly, ensuring consistency and reducing potential errors or compatibility issues.
  3. Scalability and Modularity: APIs promote scalability and modularity in data engineering projects. By encapsulating functionalities into APIs, you can build independent components that can be easily scaled, reused, and maintained. This modular approach facilitates project management, enables parallel development, and enhances the overall flexibility and agility of the system.
  4. Automation and Efficiency: APIs enable automation of data operations, reducing manual effort and improving efficiency. By exposing APIs for repetitive or complex data tasks, you can automate data ingestion, transformation, processing, and retrieval. This helps streamline workflows, eliminate manual errors, and improve overall productivity in data engineering projects.
  5. Collaboration and Reusability: APIs facilitate collaboration among teams and promote reusability of code and functionalities. Data engineering projects often involve multiple stakeholders, such as data scientists, analysts, and developers. APIs provide a clear contract and interface for data access, enabling different teams to work concurrently and independently. Additionally, APIs can be reused across projects, saving development time and effort.
  6. Security and Access Control: APIs allow you to implement security measures and access controls to protect sensitive data. By utilizing authentication, authorization, and encryption mechanisms in APIs, you can ensure that only authorized users or systems can access and manipulate the data. This helps maintain data privacy, compliance with regulations, and mitigates security risks.
  7. Ecosystem and Integration: APIs enable integration with third-party services, libraries, or frameworks. In data engineering, you may need to leverage external services for data enrichment, analytics, machine learning, or visualization. APIs facilitate the integration of these services into your data pipelines, providing access to advanced functionalities and expanding the capabilities of your projects.

Overall, APIs play a crucial role in data engineering projects, enabling seamless integration, automation, scalability, and collaboration. They empower data engineers to build robust and efficient systems while leveraging the broader ecosystem of tools and services available in the data engineering landscape.

An Alternative Approach to Flask REST APIs

While Flask is a popular framework for building RESTful APIs, there are alternative approaches available that offer different features and advantages. One such alternative is GraphQL.

GraphQL is a query language for APIs that provides a more flexible and efficient way to request and manipulate data. Here are some key aspects that make GraphQL an alternative and potentially better approach for certain use cases:

  1. Flexible Data Fetching: With GraphQL, clients can specify exactly what data they need, avoiding over-fetching or under-fetching of data. Clients can request multiple resources and fields in a single request, reducing the number of round trips to the server.
  2. Strongly Typed: GraphQL has a strong type system that allows clients and servers to have a clear understanding of the data being exchanged. This enables better validation, documentation, and tooling support.
  3. Efficient Data Transfer: GraphQL uses a single endpoint and allows clients to retrieve only the data they require, minimizing the amount of data transferred over the network. This can lead to improved performance and reduced bandwidth usage.
  4. Schema Evolution Without Versioning: GraphQL encourages evolving a single schema rather than versioning the API. New fields can be added freely and old ones deprecated, and because clients receive only the fields they explicitly request, these changes rarely break existing clients.
  5. Developer Experience: GraphQL offers a more intuitive and self-documenting API compared to traditional REST APIs. It provides a schema that describes the available data and operations, enabling better client-server collaboration and reducing the need for additional documentation.
  6. Ecosystem and Tooling: GraphQL has a growing ecosystem with various tools, libraries, and frameworks that support its implementation. These tools provide features like schema generation, data validation, caching, and client SDKs, enhancing developer productivity.
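As an illustration of point 1, a single GraphQL query names exactly the fields the client wants (the schema and field names here are hypothetical):

```
query {
  dataEntry(id: 1) {
    name
    age
  }
}
```

A REST equivalent would typically return the entire entry from GET /data/1, whether or not the client needs every field.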

However, it’s important to note that the choice between REST and GraphQL depends on the specific requirements and constraints of your project. While GraphQL offers advantages in certain scenarios, RESTful APIs remain a reliable and widely adopted approach for building web services. Consider factors such as data complexity, performance needs, client requirements, and the existing ecosystem when evaluating the best approach for your data engineering project.

In conclusion, utilizing APIs in data engineering projects provides numerous advantages. APIs enable seamless integration and communication between systems and data sources, promoting standardization and reducing compatibility issues. They facilitate scalability and modularity, allowing for flexible and reusable components. Automation and efficiency are achieved through API-driven workflows, eliminating manual effort and reducing errors. APIs foster collaboration among teams, enabling concurrent development and code reusability. Security measures and access controls can be implemented to protect sensitive data. APIs also enable integration with third-party services, expanding the capabilities of data engineering projects. Overall, APIs are fundamental tools that enhance productivity, streamline workflows, and empower data engineers to build robust and efficient data systems.
