Creating a new apf step¶

apf was created to simplify the development of an stream processing pipeline.

To illustrate how the creation of a pipeline step was intended we have the following diagram.

This tutorial will guide developers to create an example step from the installation of the framework until building and running the docker image locally.

1. Installing apf¶

To install apf run

pip install apf_base

This will install the package and a command line script.

apf [--help] command

2. Creating base step¶

apf comes with a code generation tool to create a base for a new step.

To create this base run

apf new-step example_step

This command will create the following file tree

example_test/
├── example_test/
│   ├── __init__.py
│   └── step.py
├── scripts/
│   ├── run_multiprocess.py
│   └── run_step.py
├── tests/
├── Dockerfile
├── requirements.txt
└── settings.py

The step will be a python package called example_test, inside the package there is a step.py with the step logic.

3. Coding the step¶

In example_test/step.py we will code the step logic, it can be as simple as printing the message or a more complex logic. For each new message the execute() method is called with a python dict with the message itself.

#example_test/step.py
def execute(self,message):
  ################################
  #   Here comes the Step Logic  #
  ################################

  pass

For this example we will just log the message changing the execution code to

#example_test/step.py
def execute(self,message):
  # Logging the message
  self.logger.info(message)

Here self.logger is the default logger (logging.Logger) from apf.core.GenericStep.

Then we can go to scripts/run_step.py. This scripts runs the step, here we can override the consumers, producers and other plugins used in the step.

The basic run_step.py comes with the following

#scripts/run_step.py
step = ExampleTest(config=STEP_CONFIG,level=level)
step.start()

But you can pass callables to override the consumer, producer and metrics_sender that are otherwise defined by settings file.

An alternative step initialization could look like this:

#scripts/run_step.py
step = ExampleTest(
            consumer=KafkaConsumer,
            producer=KafkaProducer,
            metrics_sender=KafkaMetricsProducer,
            config=STEP_CONFIG,
            level=level
)
step.start()

This can be useful for tests as well, since you can pass a mock class and do not need to rely on settings, that have more boilerplate.

4. Configuring the step¶

After coding the step and modifying the script, the step must be configured.

There are 2 files needed to configure a step.

1- settings.py:

This file contains all the configuration passed to the consumers, producers and plugins. Having it separately from the main script make it easier to change configurations from run to run.

For good practice having environmental variables as parameters is better than hard-coding them to the settings file, and comes very handy when deploying the same dockerized step with different configurations.

The basic settings.py comes with the following
#settings.py
CONSUMER_CONFIG = {}  #Consumer configuration
STEP_CONFIG = {
  "N_PROCESS" # Number of prcesses on multi-process script.
}                     #Step Configuration
We will test our step with a CSVConsumer
#settings.py
CONSUMER_CONFIG = {
  "CLASS": "apf.consumers.CSVConsumer",
  "FILE_PATH": "https://raw.githubusercontent.com/alercebroker/APF/develop/docs/source/_static/example/detections.csv",
  "OTHER_ARGS": {
      "index_col": "oid"
  }
}

2- requirements.txt

The default requirements file for any python package, for good practice having the package with and specific version is better than using the latest one.

In this example we are using only the GenericConsumer(), there is no need to specify parameters for this consumer.

The basic requirements.txt comes with the current apf version as a required package
#requirements.txt
apf==<version>
By default the apf package is already on the requirements file, so for this tutorial we will skip this step.

5. Running the step locally¶

The step can me executed as a single process with

python scripts/run_step.py

To run the step dockerized, first we need to build the step

docker build -t example_step .
docker run --rm --name example_step example_step

Note

Try using another Consumer configure it and run it locally to check it works. For example a CSVConsumer or a JSONConsumer