Cloud Inventory Monitoring & More

As a DevOps Engineer I receive constant requests to check on different cloud resources. "What does our current infrastructure look like? What are the access policies for X resources?" Due to this, I am always looking for the best way to track different items within our cloud environments. As someone who spends their days between AWS and Azure, it can get frustrating trying to keep up with all of the changes that are going on.

While looking for a solution for tracking across these two platforms, I stumbled across CloudQuery: an amazing tool with a simple setup that makes it easy to track your cloud infrastructure and presents the data in a database that can be visualized within your own BI stack.

The Initial Setup

I do a lot of my work in WSL2, so these instructions are for Linux, but macOS and Windows are also supported. This guide assumes that you have Docker and the AWS CLI installed.

  • Create a new directory for CloudQuery and download the executable.
mkdir ./cloudquery
cd cloudquery
# Grab the Linux binary URL from the CloudQuery releases page on GitHub
curl -L -o cloudquery <release-url>
chmod a+x cloudquery
  • Stand up our Docker container for PostgreSQL. We will be using a named volume to persist our data across container restarts.
docker run -d \
	--name cloudquery-pg \
	-e POSTGRES_PASSWORD=supersecuresecretpassword \
	-e PGDATA=/var/lib/postgresql/data/pgdata \
	-v CloudQuery-PG:/var/lib/postgresql/data \
	-p 5432:5432 \
	postgres
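Note that docker run returns as soon as the container starts, not when PostgreSQL is actually ready to accept connections. A small convenience check, assuming the container name above (it skips itself when docker isn't on the PATH):

```shell
# Poll pg_isready inside the container until it accepts connections (or we give up)
READY=no
if command -v docker >/dev/null 2>&1; then
  for attempt in 1 2 3 4 5; do
    if docker exec cloudquery-pg pg_isready -U postgres >/dev/null 2>&1; then
      READY=yes
      break
    fi
    sleep 2
  done
fi
echo "postgres ready: $READY"
```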
  • Create a new cloudquery-config directory and create your postgresql.yml file.
  • Make sure to update the connection string to match your PostgreSQL instance.
# cloudquery/cloudquery-config/postgresql.yml
kind: destination
spec:
  ## Required. Name of the plugin.
  ## This is an alias, so it should be unique if you have several postgresql destination plugins.
  name: "postgresql"
  ## Optional. Where to search for the plugin. Default: "github". Options: "github", "local", "grpc".
  # registry: "github"
  ## Path for the plugin.
  ## If registry is "github", path should be "repo/name".
  ## If registry is "local", path is the path to the binary. If "grpc", it should be the address of the plugin (usually useful in debug).
  path: "cloudquery/postgresql"
  ## Required. Must be a specific version starting with v, e.g. v1.2.3
  version: "v2.0.8"
  ## Optional. Default: "overwrite-delete-stale". Available: "overwrite-delete-stale", "overwrite", "append".
  ## Not all modes are supported by all plugins, so check the plugin documentation for details.
  write_mode: "overwrite-delete-stale" # overwrite-delete-stale, overwrite, append
  ## Plugin-specific configuration for PostgreSQL.
  spec:
    ## Required. Connection string to your PostgreSQL instance.
    ## In production it is highly recommended to use environment variable expansion:
    ## connection_string: ${PG_CONNECTION_STRING}
    connection_string: "postgresql://postgres:supersecuresecretpassword@localhost:5432/postgres?sslmode=disable"
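As the comments in the config note, hard-coding credentials is best avoided, since CloudQuery can expand environment variables in the YAML. A minimal sketch, reusing the password from the docker run example (adjust for your instance):

```shell
# Export the connection string so ${PG_CONNECTION_STRING} in postgresql.yml expands at sync time
export PG_CONNECTION_STRING="postgresql://postgres:supersecuresecretpassword@localhost:5432/postgres?sslmode=disable"
```

With this exported, the config line becomes connection_string: ${PG_CONNECTION_STRING} and the secret stays out of version control.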
  • In this guide we are going to focus on connecting to AWS. There are other sources that can be pulled from, and information on those can be found at CloudQuery Plugins | CloudQuery.
  • Create our aws.yml in our config folder.
kind: source
spec:
  # Source spec section
  name: aws
  path: cloudquery/aws
  version: "v13.0.0"
  tables: ["*"]
  destinations: ["postgresql"]
  # AWS spec section
  spec:
    regions:
      # Update this to your regions
      - us-east-1
    accounts:
      # This must match your credentials in ~/.aws/credentials
      - id: "account1"
        local_profile: "account1"
    aws_debug: false
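The local_profile above has to line up with a profile entry in ~/.aws/credentials. A sketch of creating that entry (the key values are placeholders, and CRED_FILE is just a variable so the path is easy to override):

```shell
# Append an "account1" profile to the AWS credentials file (placeholder keys; replace with real ones)
CRED_FILE="${CRED_FILE:-$HOME/.aws/credentials}"
mkdir -p "$(dirname "$CRED_FILE")"
cat >> "$CRED_FILE" <<'EOF'
[account1]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
EOF
```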

Once all of this has been set up, all that is needed is to run the sync command below. This will create your database schema and begin syncing your resources from AWS to your PostgreSQL database. This data can now be accessed and used to build reports. A full outline of the table structure can be found at Source Plugin: aws | CloudQuery.

cloudquery sync ./cloudquery-config

# or cloudquery sync cloudquery-config/aws.yml cloudquery-config/postgresql.yml

You can now connect to your database and query for items. Here are some sample queries for looking at the resources that have been synced to the database.

Find all public-facing load balancers

SELECT * FROM aws_elbv2_load_balancers WHERE scheme = 'internet-facing';

Find all unencrypted RDS clusters

SELECT * FROM aws_rds_clusters WHERE storage_encrypted IS FALSE;

Find all S3 buckets that are permitted to be public

SELECT arn, region
FROM aws_s3_buckets
WHERE block_public_acls IS NOT TRUE
    OR block_public_policy IS NOT TRUE
    OR ignore_public_acls IS NOT TRUE
    OR restrict_public_buckets IS NOT TRUE;
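Queries like the S3 check above are worth saving as files so they can be re-run after every sync. A sketch using psql (connection details match the docker run example; the psql step is skipped if the client isn't installed):

```shell
# Save the public-bucket check to a file, then run it against the synced database
cat > public_buckets.sql <<'EOF'
SELECT arn, region
FROM aws_s3_buckets
WHERE block_public_acls IS NOT TRUE
   OR block_public_policy IS NOT TRUE
   OR ignore_public_acls IS NOT TRUE
   OR restrict_public_buckets IS NOT TRUE;
EOF

if command -v psql >/dev/null 2>&1; then
  psql "postgresql://postgres:supersecuresecretpassword@localhost:5432/postgres" -f public_buckets.sql
fi
```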

Where do I go from here?

There are numerous other sources to pull from. Each has its own config file and setup requirements. If you don't like PostgreSQL, there are other destination options, which can be found in the docs at Destinations | CloudQuery.

Start by connecting your BI platform to your data and building visual reports for less technical audiences. Give your security team access, and they can see which policies are applied to your EC2 instances, which S3 buckets are public, and so much more.
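Since the synced tables only reflect the last run, it also helps to schedule the sync on an interval. One way, assuming a cron-capable host and a hypothetical install path of /opt/cloudquery:

```shell
# Nightly sync at 02:00; the install path is an assumption -- adjust to where you unpacked cloudquery
CRON_LINE='0 2 * * * cd /opt/cloudquery && ./cloudquery sync ./cloudquery-config >> sync.log 2>&1'
echo "$CRON_LINE"
# To install it, uncomment the next line:
# ( crontab -l 2>/dev/null; echo "$CRON_LINE" ) | crontab -
```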

Nathanial Wilson