Each workflow execution will result in output files (up to one output file per extractor). These are often CSV files, which you can download directly from the platform to your computer:

Clicking on the link will initiate a download to your local computer. You can also reveal a public URL to the file:

The download link will look something like this:

https://stevesie-assets.nyc3.cdn.digitaloceanspaces.com/workflow-results/prod/<SECURE_ID>/Twitter_User_Followers_-_Most_Recent_Tweet.csv

Each link contains a unique <SECURE_ID> so that only you can access the file.
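
If you just need to grab a single file and already have its public URL, you can fetch it with a short script. Here's a minimal sketch using requests; the URL and output filename below are placeholders you'd swap for your own:

import requests

# the public URL revealed in the platform (<SECURE_ID> is a placeholder)
RESULT_URL = 'https://stevesie-assets.nyc3.cdn.digitaloceanspaces.com/workflow-results/prod/<SECURE_ID>/Twitter_User_Followers_-_Most_Recent_Tweet.csv'

response = requests.get(RESULT_URL)
response.raise_for_status()

# save the CSV next to the script
with open('Twitter_User_Followers_-_Most_Recent_Tweet.csv', 'wb') as f:
    f.write(response.content)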

Uploading to Google Cloud

Google Cloud offers Cloud Storage, which allows you to upload files from your local computer to the cloud.

If you only have a few files to upload, you can download them to your computer and then upload them manually via the Cloud Console, per the Google Cloud Uploading Instructions.
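
If you'd rather script even a one-off upload instead of clicking through the console, the Cloud Storage Python client can do it in a few lines. This is a minimal sketch; the bucket name and file path below are placeholders, and it assumes your Google Cloud credentials are already configured locally:

from google.cloud import storage

GOOGLE_BUCKET_NAME = 'XXX'  # placeholder: your Cloud Storage bucket name
LOCAL_FILE = 'Twitter_User_Followers_-_Most_Recent_Tweet.csv'  # placeholder: a file you downloaded

storage_client = storage.Client()  # uses your default Google Cloud credentials
bucket = storage_client.bucket(GOOGLE_BUCKET_NAME)

# upload the local file to the bucket under the same name
bucket.blob(LOCAL_FILE).upload_from_filename(LOCAL_FILE)

This is the same client that the full script below relies on via its upload_blob helper.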

However, this approach only works for a handful of files, so next we'll look at how to automate the process when running workflows at scale.

Programmatically Download & Upload Workflow Results

If you run a lot of executions, you'll end up with a lot of result files, and it's a pain to go into the interface and download everything manually. Instead, we can use the Stevesie API to list our result files, download them locally, and then upload them to Google Cloud.

import os
from pathlib import Path

import requests
from google.cloud import storage

API_TOKEN = 'XXX'  # https://stevesie.com/cloud/account
WORKFLOW_ID = 'XXX'  # https://stevesie.com/cloud/workflows/XXX
LOCAL_FOLDER = '~/Desktop/workflow_results'
GOOGLE_BUCKET_NAME = 'XXX'

URL_BASE = 'https://stevesie.com/cloud/api/v1'

# create the local download folder if it doesn't already exist
target_path = os.path.expanduser(LOCAL_FOLDER)
Path(target_path).mkdir(parents=True, exist_ok=True)

# Stevesie API authentication header
auth_headers = {
    'Token': API_TOKEN,
}

# https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-code-sample
def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    # bucket_name = "your-bucket-name"
    # source_file_name = "local/path/to/file"
    # destination_blob_name = "storage-object-name"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    blob.upload_from_filename(source_file_name)

    print(
        "File {} uploaded to {}.".format(
            source_file_name, destination_blob_name
        )
    )

# get the finished executions for this workflow
finished_executions = requests.get(
    '{}/workflows/{}/executions'.format(URL_BASE, WORKFLOW_ID),
    params={
        'status': 'finished',
    },
    headers=auth_headers,
).json()['objects']

for execution in finished_executions:
    print(execution)
    execution_name = execution['execution_name'] or execution['id']
    results = execution['results']

    for result in results:
        remote_url = result['url']
        filename = '{}_{}'.format(
            execution_name,
            remote_url.split('/')[-1],
        )
        local_filepath = os.path.join(target_path, filename)
        # stream the result file from Stevesie to the local folder
        with requests.get(remote_url, stream=True) as r, \
                open(local_filepath, 'wb') as f:
            r.raise_for_status()
            for chunk in r.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)

        # upload the downloaded file to Google Cloud Storage
        upload_blob(GOOGLE_BUCKET_NAME, local_filepath, filename)

The above Python code downloads all of the result files from the workflow ID you provide to your local machine, and then uploads each file to your Google Cloud Storage bucket.
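
To confirm the uploads landed, you can list what's currently in the bucket with the same client. A quick sketch, assuming the same bucket name placeholder as in the script above:

from google.cloud import storage

GOOGLE_BUCKET_NAME = 'XXX'  # same placeholder bucket name used in the script above

storage_client = storage.Client()

# print every object currently in the bucket, with its size in bytes
for blob in storage_client.list_blobs(GOOGLE_BUCKET_NAME):
    print(blob.name, blob.size)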

After the script is done, the files will remain on your local machine and you can delete them if you'd like, or keep them for other purposes.
