
GCP Cloud Logging: How to Enable Data Access Audit For Selected Buckets

October 4, 2022

Also published at https://medium.com/google-cloud/gcp-cloud-logging-how-to-enable-data-access-audit-for-selected-buckets-aaec12556486

Introduction

Data Access audit logs trace and monitor API calls that create, modify, or read user data or metadata on any GCP resource. These logs are disabled by default (except for BigQuery Data Access audit logs) because they can cause an explosion in the volume of logs being generated, which in turn can drive up usage costs.

The official documentation explains how to enable data access audit logs. You will notice that the workflow lets you select the type of audit log to enable (Admin Read, Data Read and Data Write) and also lets you specify principals to exempt from the logging. For example, you may have a Dataproc workload that reads from a GCS bucket on a schedule. If you don't want to enable data access auditing for the service account used by Dataproc, specify it under exempted principals.
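If you prefer to do this outside the console, enabling these logs boils down to adding an auditConfigs stanza to the project's IAM policy (fetched with gcloud projects get-iam-policy, edited, then pushed back with gcloud projects set-iam-policy). Below is a minimal sketch; the project ID (my-project) and the Dataproc service account are hypothetical placeholders:

# Fetch, edit, then re-apply the project IAM policy:
#   gcloud projects get-iam-policy my-project > policy.yaml
#   gcloud projects set-iam-policy my-project policy.yaml
#
# policy.yaml (excerpt):
auditConfigs:
- service: storage.googleapis.com          # audit Cloud Storage only
  auditLogConfigs:
  - logType: DATA_READ
    exemptedMembers:                       # principals exempt from Data Read auditing
    - serviceAccount:dataproc-sa@my-project.iam.gserviceaccount.com
  - logType: DATA_WRITE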

But that is all you can configure while enabling data access audit logs. What if you want to exempt certain buckets from the access audits? In this post, we will look at how to enable data access auditing for GCS buckets within a project while excluding certain buckets in the same project from being audited.

This additional layer of filtering will help keep logging costs under control.

A Quick Overview On How Cloud Logging Works

All logging activity in Google Cloud Platform (GCP) is routed through the Logging API. The Logging API uses a robust architecture to deliver the logs you need, reliably and on time.

Log entries are sent to the Logging API and then pass through the Log Router. The Log Router contains sinks that define which logs need to be sent where.

If the sinks are not configured properly, logging could become one of the largest items on your bill. This is especially true if you have data access audit logging enabled.
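To see which sinks exist in a project and how they are configured, you can use the gcloud CLI (a quick sketch, assuming gcloud is installed and you have permission to view the project's logging configuration):

# List all sinks in the current project, then inspect the _Default sink
gcloud logging sinks list
gcloud logging sinks describe _Default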

Solution

  • On the Log Router page in Cloud Logging, you will see two sinks already created and defined for you: _Default and _Required.
  • _Required is the sink that routes Admin Activity logs, System Event audit logs and Access Transparency logs. This sink can be neither edited nor disabled, and you do not incur any charges for logs routed through it.
  • _Default is the sink that routes all other logs, including data access audit logs. This sink can be edited and also disabled. Click the three-dot menu next to it and select "Edit Sink".
  • Look for the section titled "Choose logs to filter out of sink". This is where you define which logs the router should filter out and not send to the destination specified in the sink.
  • The exclusion filter (and also the inclusion filter) needs to be defined using the Logging Query Language.
  • As a simple example, the following exclusion rule filters out data access logs arising from API activity in a bucket named demo-bucket:
resource.type="gcs_bucket"
resource.labels.bucket_name="demo-bucket"
  • Detailed documentation on the Logging Query Language can be found in the official Cloud Logging documentation.
  • If you prefer not to modify the pre-defined _Default sink, you can create a new sink and specify your custom rules and conditions there. Ensure that the _Default sink is then disabled; otherwise logs will be routed to both destinations and you will pay for the same entries twice. (A gcloud sketch of both approaches follows this list.)
  • The Logging Query Language lets you filter logs not just at the resource level but also at the operation level, provided the metadata is present in the payload. For example, you can use the query below to exclude all object-list operations on a bucket named demo-bucket:
resource.type="gcs_bucket"
resource.labels.bucket_name="demo-bucket"
protoPayload.methodName="storage.objects.list"
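For command-line users, here is a minimal gcloud sketch of both approaches described above. The sink name (audit-sink), log bucket (audit-bucket) and project ID (my-project) are hypothetical placeholders; adjust them to your environment:

# Option 1: attach the exclusion filter to the pre-defined _Default sink
gcloud logging sinks update _Default \
    --add-exclusion=name=exclude-demo-bucket,filter='resource.type="gcs_bucket" AND resource.labels.bucket_name="demo-bucket"'

# Option 2: route data access audit logs through a custom sink instead...
gcloud logging sinks create audit-sink \
    logging.googleapis.com/projects/my-project/locations/global/buckets/audit-bucket \
    --log-filter='log_id("cloudaudit.googleapis.com/data_access")' \
    --exclusion=name=exclude-demo-bucket,filter='resource.type="gcs_bucket" AND resource.labels.bucket_name="demo-bucket"'

# ...and disable _Default so the same entries are not routed (and billed) twice
gcloud logging sinks update _Default --disabled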

Things to Remember

  • Log entries excluded from a sink still consume entries.write API quota, since filtering happens after the Logging API receives the entry.
  • Exclusion filters take precedence over inclusion filters, so if a log entry matches both an exclusion filter and an inclusion filter, the entry is excluded regardless (see the example after this list).
  • If no filters are specified, all logs are routed by default.
  • An excluded log entry incurs neither ingestion nor storage charges.
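To make the precedence rule concrete, consider a hypothetical sink configured with the two filters below. A log entry from demo-bucket matches both, so it is excluded:

Inclusion filter (what the sink routes):
resource.type="gcs_bucket"

Exclusion filter (wins on overlap):
resource.type="gcs_bucket" AND resource.labels.bucket_name="demo-bucket"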