Pervasive DataCloud Application Programmer’s Interface (API) User’s Guide


Audience

This guide provides programmers, developers, and technical managers with the knowledge needed to use the DataCloud API.

What is the Pervasive DataCloud?

The Pervasive DataCloud® is an elastic cloud-based platform that allows for the design, deployment and management of web-based integrations/products. The platform utilizes Pervasive data management products for service design and the Pervasive process engine for execution.

The DataCloud may include, but is not limited to, the following products:

  • Pervasive Data Integrator™ - an application that provides a vehicle for data mapping and transformation.
  • Pervasive DataRush Profiler™ - an application that helps you profile your data based on quality metrics that you can manipulate.
  • Pervasive DataRush™ - an application that provides high-performance, parallel data-intensive processing and analytics, and also runs scripts or executables.

Governance graphic

The deployment and management of the data services are handled via the platform's web services-based Application Programmer's Interface (API) and the DataCloud Management Console (DMC). The platform is focused on data integration and analytic services; however, it can be leveraged for any data-centric service.

Why use a DataCloud?

  • Speed Vision – Build a Service in Hours Not Days / Weeks
  • Automated Scale – Expand Service Reach Instantly (1 to 1 million)
  • Less IT Burden – No Server / Software / User Updates
  • Flexible Architecture – Change / Mix / Add Services / Infrastructure
  • Accessibility – Access / Manage / Report Anywhere
  • Measurable – Cost / Use Metrics on all Resources


What is the Pervasive DataCloud Entity-Based Architecture?

The Pervasive DataCloud design is based on the concept of entities. The Pervasive DataCloud entities represent the “who,” “where,” “what,” “how,” and the “when” of the data service execution. There are five basic entities: User, Destination, Product, Provisioning, and Execution.

The following list provides both the GUI name and the entity name for each entity:

  • User (entity: User): the "who." The ultimate owner of a data service, normally the developer.
  • Destination (entity: Destination): the "where." The hardware profile of where the service executes; it can vary from request to request.
  • Integration (entity: Product): the "what." The actual service definition.
  • Configuration (entity: Provisioning and Parameters): the "how." The variable settings for a service request; multiple requests will contain different values and consequently will execute differently.
  • Execution (entity: Execution): the "when." An execution contains the timing as well as references to all the aforementioned entities.

 

Each entity has a set of predefined fields or parameters. The fields are defined by the DataCloud. The values of all fields are available at the API level for read access; individual fields may or may not be available for write access.


Each entity may have one or more parameters attached. A parameter is a name/value pair defined at the API level. Parameters are available for read/write access.
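
As a concrete illustration, the following is a minimal sketch of reading and writing a parameter (a name/value pair) on an entity through a web services call. The base URL, endpoint paths, payload shape, and basic-auth credentials are assumptions for illustration only, not the documented DataCloud API.

    # Hypothetical sketch only: the URL, paths, and payload shape are assumptions,
    # not the documented Pervasive DataCloud API.
    import requests

    BASE = "https://datacloud.example.com/api"   # assumed base URL
    AUTH = ("api_user", "api_password")          # assumed basic-auth credentials

    # Write (or update) a parameter - a simple name/value pair - on a provisioning.
    requests.put(BASE + "/provisionings/1234/parameters/ExternalCustomerId",
                 json={"value": "CUST-0042"}, auth=AUTH)

    # Read all parameters attached to the same provisioning.
    params = requests.get(BASE + "/provisionings/1234/parameters", auth=AUTH).json()
    print(params)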

Creating Unique Service Requests

The configuration tasks can be performed from the DMC graphical interface or programmatically via the API.


Parameters are associated with the deployed integration/product. For example, data source credentials or query parameters are variable parameters a service may require. This means that DataCloud services require some level of configuration, which is referred to as provisioning.
 

A single integration/product may be associated with many different sets of configurations, which allows each instance of the service to run as a new and unique service based on the contents of those parameters. One of the parameters is normally linked to the specific service requestor. Although not required, this method allows the owner of the service to track who has requested a service and how often, as well as retrieve results on their behalf. 
 

Once an integration/product has been designed and deployed, the provisioning can be configured and executed. Execution can be accomplished from the DMC or the API. The DMC is typically used during the initial design and testing phases of an integration/product project, and the API is used once the service is placed into production for general consumption.


Results and log files related to the integration/product are also retrieved via the DMC or API. Subsequent services that leverage these results can then be executed.
 


Understanding the use of the entities


Understanding the use of the entities is essential to leveraging the API. The common sequence is the developer first uses the API to create a destination and an empty product. The developer next designs and creates the integration/product. The developer then creates a provisioning, by adding the custom parameters that will hold the execution specific information. Finally, the developer creates an execution and requests that the platform execute the service.

The following diagram provides an overview of how the entities work together in the process.

Entity Diagram
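
To make this sequence concrete, here is a minimal sketch of the same flow expressed as web service calls. The endpoint paths, payload field names, and authentication shown are illustrative assumptions only; consult the API reference for the actual resource names.

    # Illustrative sketch of the entity workflow; endpoint paths and payloads are
    # assumptions, not the documented Pervasive DataCloud API.
    import requests

    BASE = "https://datacloud.example.com/api"   # assumed base URL
    AUTH = ("developer", "password")             # assumed credentials

    # 1. Create a destination (the "where").
    dest = requests.post(BASE + "/destinations",
                         json={"SQSQueueName": "my-service-queue",
                               "InstanceType": "m1.small",
                               "MinInstance": 1, "MaxInstance": 5},
                         auth=AUTH).json()

    # 2. Create an empty product (the "what"), keyed by SKU, pointing at the destination.
    prod = requests.post(BASE + "/products",
                         json={"Id": "MY-PRODUCT-SKU", "Name": "Customer Sync",
                               "Destination": dest["Id"], "Process": "main.process"},
                         auth=AUTH).json()

    # 3. Create a provisioning (the "how") holding the execution-specific parameters.
    prov = requests.post(BASE + "/provisionings",
                         json={"Product": prod["Id"],
                               "parameters": {"SourceUrl": "https://example.com/feed",
                                              "ExternalCustomerId": "CUST-0042"}},
                         auth=AUTH).json()

    # 4. Create an execution (the "when") to request that the platform run the service.
    execution = requests.post(BASE + "/executions",
                              json={"Provisioning": prov["Id"]}, auth=AUTH).json()
    print("Execution requested:", execution["Id"])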

 
Entities Explained

The following sections explain each entity.

User


A user entity (the “who” in regard to integration/products) is a registered user of the Pervasive DataCloud. Users are known to Pervasive Software and must own a product based on the Pervasive DataCloud.


Note: Not all Users have full access to the Pervasive DataCloud API. To have full access a User must own the Pervasive DataCloud Development Package that includes the DMC, access to the Process Designer and access to the API.


Users leveraging the API on behalf of their customers may want to link their customers to a specific service at execution time (using external IDs); this should be handled with provisioning entity parameters. The provisioning entity is discussed later.

 

  • ID: The unique ID of the customer.
  • Name: The name of the customer.

 

Destination

A destination can be considered the "where" in a data process. More specifically, a destination entity represents a description of runtime profile information. The Pervasive DataCloud is powered by Amazon Web Services (AWS), so developers have the ability to define the hardware on which execution occurs. The destination specifies the worker image (AMI), the work queue (SQS) name, and how auto-scaling should respond to messages in the queue.


Note: Destination and product entities must exist prior to creation of a process.

  

  • SQSQueueName (String, R/W): The name of the queue to which process messages are sent. The process engine pulls request information from this queue and runs the process. This field can use a friendly name value.
  • AvailabilityZone (String, R/W): The availability zone in which the instance runs, chosen from the available US (us-east) and European (eu) zones. Choosing a zone within the U.S. is advisable; European zones are charged differently.
  • AMIID (String, R/W): The Amazon Machine Image (AMI) ID. One customized AMI is assigned to each destination and is used to bring up new Amazon instances based on the other attributes listed here. The AMI is bundled with customized utilities and the integration engine. If a user requires special customization of these items in the instance, Pervasive can add them and re-bundle the AMI.
  • InstanceType (String, R/W): By default all instances are configured as m1.small. Depending on their load and capacity requirements, users can choose from different types. See the Instance Types table.
  • KeyName (String, R): A keypair is required to launch the instance. Currently all keypairs are generated by Pervasive and provided to users.
  • QueueUpperThreshold (int, R/W): Works together with the MaxInstance attribute. Once the queue size exceeds this number and the current instance count is less than MaxInstance, Pervasive DataCloud brings up a new instance for the specific destination (auto scaling) until it reaches the MaxInstance count. This number is always greater than QueueLowerThreshold.
  • QueueLowerThreshold (int, R/W): Works together with the MinInstance attribute. Once the queue size falls below this number, the system reduces the number of instances running for this destination one at a time until it reaches the MinInstance count. This number is always lower than QueueUpperThreshold.
  • MinInstance (int, R/W): The minimum number of instances kept available and ready to execute process requests. This number is always less than MaxInstance.
  • MaxInstance (int, R/W): The maximum number of instances available to execute process requests. This number is always greater than MinInstance.
  • SecurityGroup (String, R): The security group through which Pervasive DataCloud controls the opening and closing of ports, access from specific IPs, firewall rules, and so on. Currently controlled by Pervasive DataCloud administrators.
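
The sketch below shows how a destination carrying these fields might be created through the API. The endpoint path, the example values, and the exact JSON field casing are assumptions for illustration; only the field names come from the list above.

    # Illustrative only: endpoint path, values, and payload shape are assumptions.
    import requests

    destination = {
        "SQSQueueName": "acme-sync-queue",   # friendly queue name
        "AvailabilityZone": "us-east-1a",    # hypothetical zone value
        "InstanceType": "m1.small",          # default instance size
        "QueueUpperThreshold": 10,           # scale up above this queue depth
        "QueueLowerThreshold": 2,            # scale down below this queue depth
        "MinInstance": 1,
        "MaxInstance": 5,
    }
    resp = requests.post("https://datacloud.example.com/api/destinations",
                         json=destination, auth=("developer", "password"))
    print(resp.status_code, resp.json())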

 

Product


While the destination defines the "where," the product defines the "what" at execution time. In other words, the product entity represents the integration/product definition that may be executed. A product is executed at a destination and holds information about the service. When a product is created, a SKU must be provided. The SKU is then used to reference a specific product from anywhere within the API. The process location is defined in the product along with the name and associated destination.

 

  • Active (Boolean, R/W): Setting Active to false disables the product. Products cannot be deleted once they have been executed at least once, but they can always be deactivated.
  • Concurrent (Boolean, R/W): Setting Concurrent to false means that only a single provisioning can be executed on Pervasive DataCloud at a time; this is useful for most products that write data into an endpoint. If Concurrent is true, Pervasive DataCloud will submit each new execution for this product to the destination without checking whether the customer has a running execution with the same provisioning.
  • UserProduct (Boolean, R/W): Any product created through the API is a customer product, and this field will be true.
  • Description (String, R/W): A brief description of the product.
  • Destination (String, R/W): The ID of the destination to which the product sends executions.
  • Id (String, R/W): A unique, user-defined ID for the product. Best practice is to use the product SKU here.
  • Name (String, R/W): The name of the product.
  • Process (String, R/W): The entry-point process to run when an execution is requested. A process file can be uploaded using the API to a directory under the product called integrationSpec. After a process file has been uploaded, the name of the process is entered in this field.
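
As an illustration, the sketch below creates a product keyed by its SKU, uploads a process file to the product's integrationSpec directory, and records the process name in the Process field. The endpoint paths and upload mechanics are assumptions; only the field names and the integrationSpec directory come from the description above.

    # Illustrative only: endpoint paths and upload mechanics are assumptions.
    import requests

    BASE = "https://datacloud.example.com/api"
    AUTH = ("developer", "password")

    # Create the product, using the SKU as its Id per the best practice above.
    product = {"Id": "ACME-SYNC-001", "Name": "Account Sync",
               "Description": "Syncs accounts from FTP to CRM",
               "Destination": "dest-42", "Active": True, "Concurrent": False}
    requests.post(BASE + "/products", json=product, auth=AUTH)

    # Upload the entry-point process file into the product's integrationSpec directory.
    with open("AccountSync.process", "rb") as f:
        requests.put(BASE + "/products/ACME-SYNC-001/integrationSpec/AccountSync.process",
                     data=f, auth=AUTH)

    # Record the process name in the Process field.
    requests.put(BASE + "/products/ACME-SYNC-001",
                 json={"Process": "AccountSync.process"}, auth=AUTH)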

Provisioning

A Provisioning represents a Product and Customer association as well as a collection of parameters specific to the Customer and Product at execution time. Think of Provisioning as the "How" of execution, but it is also much more than that. Although every entity may have its own set of parameters, the set of parameters associated with a Provisioning is what allows each execution to be unique at runtime.

Information specific to the Provisioning, such as the identification of a developer customer's own customer, can be stored in the Provisioning. The parameters provided in a Provisioning are available to the process. Credentials for accessing data from third-party sites, location URLs, and unique identifiers are just some examples of Provisioning parameters.

  • ExpirationDate (Date, R/W): The date after which no provisioning may be run for this product.
  • Id (Long, R/W): The unique ID of the provisioning.
  • Product (Object, R): References the ID of the product with which this provisioning is associated.
  • ProductOwnedId (Object, R): Pervasive DataCloud internal use only.
  • Schedule (String, R/W): The delay in minutes between executions, or null if the provisioning is scheduled never to run.
  • StartDate (Date, R/W): The date the provisioning became active.
  • TimerHandle (n/a): Pervasive DataCloud internal use only.
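
A hedged sketch of creating a provisioning follows: it associates a product with a set of runtime parameters such as credentials and an external customer ID. The endpoint path and parameter names are illustrative assumptions, not the documented API.

    # Illustrative only: endpoint path and parameter names are assumptions.
    import requests

    provisioning = {
        "Product": "ACME-SYNC-001",          # SKU of the product being provisioned
        "Schedule": "60",                    # run every 60 minutes (null = never scheduled)
        "parameters": {                      # name/value pairs made available to the process
            "SourceUser": "ftp_user",
            "SourcePassword": "ftp_password",
            "ExternalCustomerId": "CUST-0042",
        },
    }
    resp = requests.post("https://datacloud.example.com/api/provisionings",
                         json=provisioning, auth=("developer", "password"))
    print(resp.json())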

Execution

An Execution is the entity that encapsulates the actual execution or request of a service. The Execution represents the "When" of a service request. An Execution must be created any time a service is requested, and it must be a new and distinct Execution for each request. A Provisioning must be provided to create an Execution. Executions may be retrieved after execution to read runtime metrics.
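
To round out the entity lifecycle, the sketch below requests an execution for a provisioning and then retrieves it to read runtime metrics. The endpoint paths and field names are assumptions for illustration only.

    # Illustrative only: endpoint paths and field names are assumptions.
    import requests, time

    BASE = "https://datacloud.example.com/api"
    AUTH = ("developer", "password")

    # Request a new, distinct execution for an existing provisioning.
    execution = requests.post(BASE + "/executions",
                              json={"Provisioning": 1234}, auth=AUTH).json()

    # Retrieve the execution afterwards to read its runtime metrics.
    time.sleep(60)
    metrics = requests.get(BASE + "/executions/%s" % execution["Id"], auth=AUTH).json()
    print(metrics)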


Appendix

Simple Use Case

A typical use case is an organization that requires a data service to integrate data from two distinct data sources.
Here are the objectives:

  1. Read records from a web-based data source.
  2. Transform and filter the records.
  3. Write the results to an output file for later consumption.

To build this process, complete the following steps:

  1. Configure Integrator with a transformation.

     
  2. Write code to execute the integration.
  3. Schedule the integration.

A data integration service is designed using the Pervasive Process Designer (PPD). While the main service is driven from PPD, other tools/processes may be leveraged. The design work may be performed in a local environment or hosted environment.


The service design defines the flow of data and includes any value-added business logic that the service provides. Individual services may be combined with other DataCloud services or third-party services from other platforms. In this case the essential design pattern is that the service receives a request (and required credentials), pulls data from one source, performs some intermediate work, and inserts or updates data in the other data source.


Once the design is completed and tested, it is deployed to the DataCloud using the Pervasive API or the DMC. The service can then be accessed from any platform able to leverage web services. Because the service is contained within the DataCloud, it auto-scales and can handle as many requests as are submitted. A service request can be configured to execute in its own unique process or virtual environment based on security requirements.


Use Case with FTP to ASCII-Delimited as Source to Sugar XML as Target


Summary


Take the input from an ASCII-delimited file retrieved from an FTP Source, then map values contained within the file to an XML Schema structure that is modeled against Sugar CRM XML.
Components Used

  • FTP 3.x Invoker (PUT and GET)
  • XML-DMS Connector (Target)

Complete the following:

  1. Obtain the latest PCC Build.
  2.  Deploy the new connector to your instance.
  3. Set up your sample_data directories.

a.    In the case of this project, the root directories needed are:

i.    Root Macro (Global): http://<DESIGN_INSTANCE_URL>/diutil/webdav (see Adding files to your Staging Directory on the DataCloud.)

ii.    Source File: $(SAMPLE_DIR)DataPreparationRepository/Accounts.txt

iii.    Target File: $(SAMPLE_DIR)DataCloud/data/target/sugar_xmlresponse_target/xmltarget.xml

  4. Create a source dataset against the source file listed above.
    a. Save the schema layout for the target.
  5. Create a target dataset against the target file listed above.
    a. Apply the schema saved from Step 4.
  6. Create the map.

a.    Use the source-target datasets created in steps 4-5.

b.    Map the fields as needed:

i.    Please reference the map in the project export for guidance.

Note: Cardinality matters even more in v10.

a.    Make sure that for a cardinality of 1,1 you leverage SourceStarted->OutputRecord.

b.    For cardinalities of 1,unlimited you can try using RecordStarted->OutputRecord.

c.    If desired, create a configuration to test out the map.

  7. Create the process:

a.    Edit the Start step

i.    Instantiate two DJMessage objects:

1.    set msgIn = new DJMessage "msgIn"

2.    set msgOut = new DJMessage "msgOut"

b.    Create scripting step for data prep

i.    RIFL

1.    msgIn.Properties("TargetName") = "Accounts"

2.    FileWrite("DJMessage:///msgIn", FileRead(MacroExpand("$(SAMPLE_DIR)") & "/DataPreparationRepository/Accounts.txt",-3))

c.    Create an FTP 3.x Invoker for PUT - this does two things: it places a file on the FTP server and mimics accessing a file from FTP

i.    Host: aaa.bbb.ccc.dd

ii.    ServerName: eee.fff.ggg.hh/uploads

iii.    FTP Type: SFTP

iv.    File Extension: txt

v.    Filename Format: %t

vi.    Message: msgIn

vii.    Action: PutMessage

d.    Create an FTP 3.x Invoker for GET - this mimics the idea of a file already being present on FTP to obtain

i.    Message: msgOut

ii.    Pattern: Accounts

iii.    File Extension: txt

iv.    BrowseMode: false (because we already placed the file at the beginning of the process; there is no need to waste space by leaving that file there all the time)

v.    Action: GetMessage

8.  Create a staging script which will temporarily store the contents of msgOut to a file for consumption in a map

a.    RIFL

i.    FileWrite(MacroExpand("$(SAMPLE_DIR)") & "/Temp_Staging_Directory/UC1Temp.txt", msgOut.body, -3)

9.    Add a Transformation Step by referencing the map you created earlier.

       Note: Make sure the source is the staged file created in Step 8.

i.    You may need to update the dataset for this to be effective.

ii.    You should not need to refresh the schema, since the layout is the same; just update the connection string.

10.    Edit Stop Step

a.    RIFL

set msgIn = Nothing
set msgOut = Nothing

11. To validate, the output file should appear as shown in Expected Results below.

Expected Results


Please reference UseCase1_ExpectedResults.xml


Expanded Use Case with SOAP Web Service to Oracle DB

 

Summary

This use case uses the REST Invoker to post a request to a SOAP-based web service. The response to that request is written to a file and then written to an Oracle database.

Components Used


REST Invoker, XML DMS (source), Oracle 10g (target).

Steps

  1. Make sure you have an Oracle client installed and a TNS entry for the QE test DB.
  2. Import the attached project.
  3. Give a value for the STAGING_DIR macro. This is the location where the response from the web service will be written.
  4. Execute the process by running the rtc_SOAP_Oracle runtime configuration.

Expected Results

  1.  The process should execute without any errors.
  2. The process should write one record to the Oracle database.
  3. You can verify that this record is written by looking in the "SOAPTEST" table.

Adding files to your Staging Directory on the DataCloud

Complete the following steps to set up and configure a staging directory:

  1. Log in to the DataCloud.
  2. Navigate to your license integration. (This is the default integration that was set up for you. The name will be the name of the product you are licensed to use, for example "Pervasive Data Integrator Enterprise Cloud Production".)
  3. Click "View Configurations".
  4. Edit the configuration you see. There should only be one configuration.
  5. Scroll down to view the macros.
  6. Take note of your DESIGN_INSTANCE_URL and PASSWORD macros.
  7. Open your start menu on your local machine.
  8. Right-click on My Computer.
  9. Select Map Network Drive.
  10.  Select your drive letter or use the default.
  11. Enter http://<DESIGN_INSTANCE_URL>/diutil/webdav as the folder. (replacing <DESIGN_INSTANCE_URL> with the macro value you found earlier).
  12. Check the box "Login with new credentials".
  13. When prompted, enter your DataCloud username and the PASSWORD macro value you found earlier.

You will now see your cloud staging directories and be able to add/remove files and directories as needed. You should only need to map the network drive once.
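
If you prefer to add staging files programmatically rather than through a mapped drive, something like the following sketch may work. It assumes the WebDAV endpoint accepts HTTP basic authentication with your DataCloud username and the PASSWORD macro value; that is an assumption, not documented behavior.

    # Illustrative sketch: assumes the WebDAV endpoint accepts HTTP basic auth.
    import requests

    WEBDAV = "http://<DESIGN_INSTANCE_URL>/diutil/webdav"   # replace with your macro value
    AUTH = ("your_datacloud_username", "your_PASSWORD_macro_value")

    # Upload a sample file into the staging directory.
    with open("Accounts.txt", "rb") as f:
        resp = requests.put(WEBDAV + "/DataPreparationRepository/Accounts.txt",
                            data=f, auth=AUTH)
    print(resp.status_code)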
