Nifi Processor Application Stack - V 4.1
Introduction
The Nifi Processor application stack is used to build custom Nifi processors in Java, package them into the Nifi archive format known as a "NAR" file (similar in structure to a Java archive, or "JAR", file), and deploy those processors into Nifi clusters.
The deployment flow for this application stack shares common requirements with other application stacks deployed to the CloudOne environment. These shared requirements include:
- Code quality scans (Coverity)
- Staging of the application logic in the appropriate test environment or environments
- Pause of deployments until completion of Unit Testing and User Acceptance Testing
- Gated approvals before allowing deployments into pre-production and production environments
- Audit trail and history of deployments within the CloudOne CI/CD Pipeline
To meet the above requirements, the Nifi Processor application stack supports the build and deployment of Nifi NAR files in a manner that integrates with CloudOne requirements and processes. Deployment begins with a Continuous Integration stage, which builds the processor and stages the resulting NAR file in the appropriate Artifactory repository, ready for deployment. Subsequent Continuous Deployment stages then deploy the processor to the appropriate target Nifi clusters.
The Nifi Processor Build Product
In contrast to many other appstacks in CloudOne that produce a runnable entity to be deployed as a container inside of a Kubernetes platform, the Nifi Processor stack produces an entity that cannot be run on its own. A Nifi cluster is an ETL (Extract-Transform-Load) platform for receiving data, processing that data in some manner and then delivering the data. The Nifi platform contains many "processors" to carry out any one of these three ETL functions.
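Because a NAR file shares the JAR layout described above, it can be created and inspected with the standard Java archive APIs. The following self-contained sketch builds a tiny archive that mimics a NAR's structure and reads its manifest back; the manifest attribute names (`Nar-Id`, `Nar-Group`, `Nar-Version`) and the `META-INF/bundled-dependencies/` entry follow NiFi's NAR conventions, but the values shown are illustrative only:

```java
import java.io.*;
import java.util.jar.*;
import java.util.zip.ZipEntry;

public class NarInspect {
    public static void main(String[] args) throws IOException {
        // Build a manifest with NAR-style identification attributes
        // (values here are placeholders, not from a real project).
        Manifest mf = new Manifest();
        Attributes attrs = mf.getMainAttributes();
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.putValue("Nar-Id", "nifiprocessor-nar");
        attrs.putValue("Nar-Group", "com.netapp.nifiprocessor");
        attrs.putValue("Nar-Version", "1.0.0");

        // Write a minimal archive: NARs bundle the processor JAR and its
        // dependencies under META-INF/bundled-dependencies/.
        File tmp = File.createTempFile("example", ".nar");
        tmp.deleteOnExit();
        try (JarOutputStream jar =
                 new JarOutputStream(new FileOutputStream(tmp), mf)) {
            jar.putNextEntry(new ZipEntry("META-INF/bundled-dependencies/"));
            jar.closeEntry();
        }

        // Read it back exactly as a JAR would be read.
        try (JarFile nar = new JarFile(tmp)) {
            Attributes main = nar.getManifest().getMainAttributes();
            System.out.println(main.getValue("Nar-Id"));
            System.out.println(main.getValue("Nar-Version"));
        }
    }
}
```

In a real build, the `nifi-*-nar` module's Maven packaging produces this structure automatically; the point here is only that standard JAR tooling can open a NAR for inspection.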
When a functional need cannot be satisfied by one of the existing processors, custom logic can be developed and added to the Nifi platform. This logic is written in Java and then packaged in the form of a Nifi Processor, represented by this application stack. A Nifi Processor is deployed into a Nifi cluster, but even after this deployment, the processor is not yet a running entity. That Nifi Processor must then be added into an ETL flow and its relationships within that flow must be configured as well. Once this is done, the Nifi Processor will be executed by the ETL flow when that flow is run.
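As a rough illustration of what the custom logic looks like, the sketch below shows the shape of a minimal processor class. This is not compilable on its own (it requires the `org.apache.nifi:nifi-api` dependency that the Maven archetype adds), and the class, annotation values, and relationship name are examples, not prescribed by CloudOne:

```java
// Sketch only: requires the nifi-api dependency; names are illustrative.
@Tags({"example"})
@CapabilityDescription("Example custom Nifi processor")
public class MyProcessor extends AbstractProcessor {

    public static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("FlowFiles processed without error")
            .build();

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @Override
    public void onTrigger(ProcessContext context, ProcessSession session) {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return; // nothing queued on the incoming connection yet
        }
        // Custom ETL logic goes here; then route to a relationship.
        session.transfer(flowFile, REL_SUCCESS);
    }
}
```

Until the processor is placed into a flow and its relationships (such as `success` above) are connected, `onTrigger` is never invoked, which is why deployment alone does not produce a running entity.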
Getting Started in the Azure DevOps environment
Refer to the following link to learn about getting started in the Azure DevOps environment: Getting Started in Azure DevOps Environment
Source Code Repository Structure
The structure of the source code repository for the Nifi processor stack will contain a source directory tree with the application source code, structured as follows:
- source - the structure of the source directory should be generated by Maven based on a Nifi processor "archetype". Within this structure are three pom.xml files: one in the source directory itself and one in each of its two sub-directories. One sub-directory contains the actual Java source code for the processor; the other contains code to "wrap" the Java processor (as a JAR file) into a Nifi-proprietary NAR file. The pom.xml file in the source directory builds the Java JAR file first and then runs the wrapper build to construct the final NAR file.
See below for details on how to generate this source directory structure.
- azure-pipelines.yml - This file, also at the top of the repository, contains a reference to the appropriate version of the CI/CD pipeline logic, some variables unique to the processor (e.g. the Nifi processor version), and YAML data structures providing key information about the environments into which to deploy the processor and the sequence of events to complete the deployment (e.g. dependencies).
Additional items in this repository will generally not be modified and should not be changed to avoid risk of breaking the pipeline workflows.
How to Generate Source Code Structure for Nifi Processors
In order to create the skeleton of a project, clone a local copy of the mostly empty repository generated by the CloudOne onboarding process for the new Nifi processor stack and then follow these steps at the top of the cloned Git repository:
- If not yet there, create the source sub-directory
- Change to the source directory
- Issue the Maven command: mvn archetype:generate
- Respond to the prompt for groupId by providing a Java package path (e.g. com.netapp.nifiprocessor)
- Respond to the prompt for artifactId with the name for the project (should match the Git repository name, e.g. nifiprocessor)
- Respond to the prompt for version with an appropriate semver format version number (e.g. 1.0.0)
- Respond to the prompt for artifactBaseName with the same project name used for the artifactId
- For package, accept the default which is generated based on the previous inputs
- Confirm all the inputs to proceed
- When complete, the repository is ready to be committed back into Git
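As a convenience, the interactive prompts above can instead be answered on the command line. The sketch below assumes the standard Apache NiFi processor archetype coordinates (archetypeGroupId `org.apache.nifi`, archetypeArtifactId `nifi-processor-bundle-archetype`); the archetype version to use, and any additional prompts (such as a Nifi version), may differ in your environment, and the groupId/artifactId values are the examples from the steps above:

```
cd source
mvn archetype:generate \
  -DarchetypeGroupId=org.apache.nifi \
  -DarchetypeArtifactId=nifi-processor-bundle-archetype \
  -DgroupId=com.netapp.nifiprocessor \
  -DartifactId=nifiprocessor \
  -Dversion=1.0.0 \
  -DartifactBaseName=nifiprocessor
```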
Once the mvn archetype:generate run is completed, a new sub-directory will be created under source named for the project name specified above for the artifactId (identical to the artifactBaseName value as well).
Under the new source/project-name directory will be two sub-directories:
- nifi-project-name-processor - contains the Java source code for the new Nifi processor, where the new logic should be applied
- nifi-project-name-nar - contains the logic provided by Maven to package the Java code into a Nifi archive file, or NAR file, for delivery to a Nifi cluster (or clusters)
The process of building the NAR file will be to build from the pom.xml file located in the newly generated source/project-name directory, which will call, in turn, the pom.xml files of its two sub-directories. When a successful build is complete, the resulting NAR file will be located under the nifi-project-name-nar/target directory (and a corresponding JAR file will be located under nifi-project-name-processor/target).
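Assuming the example project name nifiprocessor from the steps above, the generated layout looks roughly like this (directory names follow the pattern described above and may vary slightly by archetype version):

```
source/
└── nifiprocessor/
    ├── pom.xml                        # parent build: JAR first, then NAR
    ├── nifi-nifiprocessor-processor/
    │   ├── pom.xml
    │   └── src/main/java/             # Java source for the processor
    └── nifi-nifiprocessor-nar/
        ├── pom.xml
        └── target/                    # the built NAR lands here
```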
Application Configuration into the CloudOne CI/CD Pipeline
In addition to the source code for the Nifi Processor, the source code repository also contains configuration information that influences the behavior and flow of the CI/CD pipeline. The azure-pipelines.yml file contains YAML sections to configure the pipeline, as described in this section.
The beginning of the azure-pipelines.yml file contains references to a shared code library containing the main pipeline logic, including a reference to the pipeline version (which must be version 3.7 or higher to support the Nifi Processor stack). After declaring the shared code template to be included, a large parameters: YAML section contains the main items to be configured for a given project.
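As a hypothetical sketch of that shared-library reference, the top of azure-pipelines.yml might look like the fragment below; the repository name and ref shown are placeholders (use the values provided by CloudOne onboarding), and only the ">= 3.7" version requirement comes from this document:

```yaml
# Placeholder sketch -- repository name and ref come from CloudOne onboarding.
resources:
  repositories:
    - repository: templates            # alias used by the extends: section
      type: git
      name: CloudOne/pipeline-templates  # placeholder project/repo name
      ref: refs/tags/3.7                 # must be 3.7 or higher for Nifi
```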
The parameters YAML objects include:
YAML Object | Description |
---|---|
appVersion | The semver-formatted version number of the processor |
jdkVersion | The version of Java to use to build the code (either jdk8 or jdk11) |
service | A YAML object to describe the application to be built - currently contains one component, name, to be set to the name of the Nifi Processor (i.e. project name) |
messagespaces | A YAML object describing the Nifi clusters to which the Nifi Processor will be deployed |
The messagespaces YAML object is defined as follows for each target environment (a landscape in CloudOne terminology):
YAML Object | Description |
---|---|
Object Name | The first level beneath the messagespaces object is a series of deployment sections named for stages of deployment. These sections are stage and prod, representing, respectively, the deployments that precede user acceptance testing and the deployments to production that follow approval steps. Each of these sections contains a YAML list of clusters as deployment targets |
spacename | A name that will be displayed in the CI/CD pipeline to represent the target deployment environment |
ref | An internal name for the same environment used to define more complicated dependency relationships in the CI/CD pipeline flow |
hosts | A list of IaaS virtual machines/servers that comprise the Nifi cluster to which the Nifi Processor will be deployed. It is important that ALL cluster members for a Nifi cluster be listed here as the Nifi Processor must be deployed to every server within a given cluster. |
The following is an example of this parameters section including the messagespaces object, illustrating the descriptions above:
```yaml
extends:
  template: pipeline/nifi/init.yml@templates
  parameters:
    appVersion: 1.0.0
    jdkVersion: 'jdk8'
    service:
      name: nifiprocessor
    messagespaces:
      stage:
        - spacename: stg
          ref: stg1
          hosts:
            - vmwdxpapp07-stg.corp.netapp.com
            - vmwdxpapp08-stg.corp.netapp.com
            - vmwdxpapp09-stg.corp.netapp.com
      prod:
        - spacename: prd
          ref: prd1
          hosts:
            - vmwdxpapp01-tst.corp.netapp.com
            - vmwdxpapp02-tst.corp.netapp.com
            - vmwdxpapp03-tst.corp.netapp.com
```
Once the Nifi Processor is ready to be deployed into the CloudOne Nifi clusters and the new application version has been set, create a Git pull request for the updates to the code. The creation of this pull request will trigger the Continuous Integration pipeline to start.
Continuous Integration and Continuous Delivery Pipelines
Please note that the document “CloudOne Concepts and Procedures” contains more details about the specific flow of the application through the CI/CD pipelines.
Once a pull request has been created against the Git repository, the CI/CD pipeline starts automatically in order to build and prepare the Nifi Processor for deployment into the first Nifi cluster. Although the pipeline is triggered automatically, there are times when it must be run (or re-run) manually, and the pipeline should be examined for the results of its preparation of the processor and the subsequent deployments. Details for examining the CI/CD pipeline and analyzing its results can be found here: Continuous Integration and Continuous Delivery Pipelines
Troubleshooting
If the Nifi Processor fails to deploy, the information about why the deployment failed (or was not even initiated) will be found in the logs of the CI/CD pipeline and can be tracked down using the methods described above in the “Continuous Integration and Continuous Delivery Pipelines” section.
However, additional information may be required, either to better troubleshoot a failed deployment or to investigate the runtime behavior of the Nifi Processor that has been successfully deployed. In those cases, much of the information can be found in the log files of the Nifi cluster itself, in particular, the nifi-app.log file, found by following these steps:
- Log into one or more of the Nifi cluster member servers using either the appviewer or apprunner account (the appviewer account is preferred)
- Navigate to the /devexp/var/nifi/servicename directory under which the log file sub-directory will be found, where servicename is the name of the Nifi stack provided at the time of provisioning the stack (or use the cdvar command alias as a short-cut)
- Examine, using standard Linux shell commands, the log/nifi-app.log file to troubleshoot
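The steps above can be sketched as a shell session; the hostname is taken from the stage example earlier in this document, and "servicename" stands for the stack name provided at provisioning time:

```
ssh appviewer@vmwdxpapp07-stg.corp.netapp.com   # any cluster member
cd /devexp/var/nifi/servicename                 # or use the cdvar alias
tail -f log/nifi-app.log                        # follow the log live
grep -i error log/nifi-app.log                  # search for failures
```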
For more details about accessing the Nifi cluster servers, please refer to the Access to Nifi Servers and Storage section of the Nifi stack documentation at: Nifi Application Stack.