In this article series, we’ll discuss the Sept. 5-6, 2023 [1] and Sept. 11-12, 2023 [2] release updates from Snowflake, highlighting how these features tie into key customer use cases you may be experiencing. We’ll specifically touch on how these Snowflake updates affect the development life cycle, data governance and other areas relevant to maintaining a healthy Snowflake environment.
Customers who want to apply machine learning (ML) models to their data can do so in a highly automated way as Snowflake now provides better model management (control and access) and the ability to initiate new runs and train new models automatically.
A “class” in Snowflake terms is similar to a class in classical object-oriented programming in that you can use it to create an instance with specific methods and functions. Fundamentally, a class is an object type, and an instance is the object.
The two available classes in Snowflake at the time of writing are ANOMALY_DETECTION and FORECAST, which let users apply built-in ML models directly to their data in Snowflake.
The new views for class instances are:
These new views allow users with appropriate role access to easily see and identify instances in the environment as well as their associated functions and procedures. This information can be used to verify the existence of instances when using continuous integration and continuous delivery (CI/CD) for data environment promotion or to create dynamic structured query language (SQL) procedures that programmatically execute functions against all desired models in the environment. Ultimately, the usefulness of the information schema is limited only by the creativity of the developer using it.
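As a minimal sketch of the CI/CD verification idea, the Python below compares the instances a deployment expects against rows returned from an instance view. The row shape (schema name, instance name), the sample names and the helper `find_missing_instances` are assumptions for illustration, not Snowflake's documented column layout; in practice the rows would come from a query against the information schema.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical row shape: (schema_name, instance_name).
InstanceKey = Tuple[str, str]


def find_missing_instances(
    expected: Iterable[InstanceKey],
    view_rows: Iterable[InstanceKey],
) -> List[InstanceKey]:
    """Return expected instances that are absent from the view rows.

    Names are compared case-insensitively, since unquoted Snowflake
    identifiers are stored in upper case.
    """
    present: Set[InstanceKey] = {(s.upper(), n.upper()) for s, n in view_rows}
    return sorted(
        (s.upper(), n.upper())
        for s, n in expected
        if (s.upper(), n.upper()) not in present
    )


# Example: rows as they might be fetched from the instance view.
rows = [("ML", "SALES_FORECAST"), ("ML", "ORDER_ANOMALIES")]
required = [("ml", "sales_forecast"), ("ml", "churn_forecast")]

missing = find_missing_instances(required, rows)
if missing:
    # A promotion pipeline could fail the deployment at this point.
    print("Missing instances:", missing)
```

A check like this could run as a gate before promoting objects to a higher environment, failing early when a model instance the pipeline depends on does not exist.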
Customers who use third-party application programming interfaces (APIs) to process or transform data can now do so in a highly secure and integrated way without their data having to leave Snowflake.
External network access lets developers create secure access to various APIs outside of their Snowflake environment for use in data transformation, for example via Lambda functions or services like Google Translate. This feature will be useful in creating a holistic solution for a Snowflake environment, where data-related capabilities that aren't available in Snowflake by default can be integrated through those APIs. It’s important to note that this feature is available to AWS accounts outside of the Government region.
Greater security via open authorization (OAuth) and improved resiliency and reliability when using Snowpipe Streaming help protect against security breaches, data loss and data unavailability during outages.
Data replication is an important feature to use in any Snowflake environment that contains highly sensitive and critical information, and the following features have been updated:
Although not directly related to data replication, Snowpipe Streaming now offers OAuth authentication, allowing applications that use Snowpipe Streaming to set up a more secure access point instead of relying on the previous role property approach.
Additionally, Snowpipe Streaming now offers replication of streams for tables populated by the application, meaning that business-critical functionality built on Snowpipe Streaming can be easily replicated to a secondary database, minimizing the risk of data loss during failovers.
This new privilege allows roles other than ACCOUNTADMIN to fail over a connection in disaster recovery scenarios, meaning that trusted individuals (with the appropriately assigned roles) can take action during failovers without needing full account access. This flexibility means recovery no longer depends solely on the ACCOUNTADMIN role.
Stronger data governance and an improved governance experience create better visibility into users, roles and policies, ensuring that only the correct data is shared with internal users, external users (via data sharing) and the Snowflake Marketplace.
Data governance is an important aspect of a well-maintained Snowflake environment and is especially critical for companies that handle highly sensitive information, because it ensures that data is seen only by those who have the correct privileges. The following features have been updated:
Now generally available, the interface for data governance allows users to easily see the various policies being used in their environment, as well as the tags being used on tables and columns. Additionally, the interface provides easy-to-use drill-through for straightforward access to the associated objects, so policies and tags can be managed as needed.
The new function IS_DATABASE_ROLE_IN_SESSION takes a literal value (a database role name) or a column name. It lets a user build role-based logic into masking policies, or check whether the role currently in use should be allowed to access the data being requested.
With this new function, providers for data sharing can feel more confident that only users with the correct privileges will see their data and that their masking and row access policies will still be applied appropriately.
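The decision a masking policy makes with this kind of check can be modeled in a few lines of Python. This is only an illustration of the logic, not Snowflake code: `role_in_session` is a stand-in for IS_DATABASE_ROLE_IN_SESSION, and the role names, mask string and sample value are hypothetical.

```python
from typing import AbstractSet

MASK = "*****"


def role_in_session(role: str, session_roles: AbstractSet[str]) -> bool:
    """Stand-in for IS_DATABASE_ROLE_IN_SESSION.

    Assumes session_roles already contains every database role active
    in the session, including roles inherited through the hierarchy.
    """
    return role.upper() in {r.upper() for r in session_roles}


def masked_value(value: str, required_role: str,
                 session_roles: AbstractSet[str]) -> str:
    """Return the clear value only when the required role is active."""
    return value if role_in_session(required_role, session_roles) else MASK


# A data-sharing consumer whose session holds only the ANALYST role.
consumer_roles = {"ANALYST"}

hidden = masked_value("jane@example.com", "PII_READER", consumer_roles)
shown = masked_value("jane@example.com", "ANALYST", consumer_roles)
print(hidden, shown)
```

The point of the sketch is that the policy outcome depends only on which database roles are active in the consumer's session, which is what lets a provider keep masking in force on shared data.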
With this new feature (available in Enterprise Edition or higher), users can now use a mapping table to enforce row access policies.
For example, you can define the allowed U.S. states for various roles in a mapping table and use IS_ROLE_IN_SESSION to determine which states a role has access to. Then, you define another row access policy on sales data that references the mapping table, ensuring that the sales data presented matches the role the user is currently using.
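The mapping-table example can be sketched in Python to show the filtering logic a row access policy would apply. The table contents, role names and the `visible_rows` helper are hypothetical, and the membership test on `session_roles` stands in for IS_ROLE_IN_SESSION.

```python
from typing import AbstractSet, Dict, List, Sequence, Tuple

# Hypothetical mapping table: (role_name, allowed_state).
ROLE_STATE_MAP: List[Tuple[str, str]] = [
    ("SALES_EAST", "NY"),
    ("SALES_EAST", "NJ"),
    ("SALES_WEST", "CA"),
]


def visible_rows(
    sales: Sequence[Dict[str, object]],
    session_roles: AbstractSet[str],
    mapping: Sequence[Tuple[str, str]] = ROLE_STATE_MAP,
) -> List[Dict[str, object]]:
    """Keep only sales rows whose state is mapped to an active role."""
    allowed_states = {
        state for role, state in mapping if role in session_roles
    }
    return [row for row in sales if row["state"] in allowed_states]


sales_data = [
    {"order_id": 1, "state": "NY", "amount": 120.0},
    {"order_id": 2, "state": "CA", "amount": 75.5},
    {"order_id": 3, "state": "TX", "amount": 42.0},
]

east_view = visible_rows(sales_data, {"SALES_EAST"})
print([row["order_id"] for row in east_view])
```

Keeping the role-to-state assignments in a table rather than in the policy body means access can be changed by updating rows instead of redefining the policy.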
Customers with large, diverse datasets that are semi-structured (JSON, AVRO, etc.) can more easily determine an approximate table structure aligned to the data, for use in profiling the data.
INFER_SCHEMA is a table function that reads staged files (internal or external) to determine the metadata of those files. Updates to this function include two new parameters:
These new parameters are useful when scanning many large files but serve no purpose when looking at specific files (i.e., using the FILES parameter). Limiting the number of files and the number of records per file will speed up the INFER_SCHEMA function but could reduce accuracy if the schemas differ enough across files.
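A small Python helper can illustrate how a profiling script might assemble such a query. The parameter names MAX_FILE_COUNT and MAX_RECORDS_PER_FILE are taken from Snowflake's INFER_SCHEMA documentation rather than from this article. The stage, file format and helper names are hypothetical, and the exact FILES list syntax should be treated as an assumption to verify against that documentation.

```python
from typing import Optional, Sequence


def build_infer_schema_query(
    location: str,
    file_format: str,
    files: Optional[Sequence[str]] = None,
    max_file_count: Optional[int] = None,
    max_records_per_file: Optional[int] = None,
) -> str:
    """Build an INFER_SCHEMA query string.

    The scan limits are dropped when specific files are named, since
    they serve no purpose in that case. Values are interpolated
    directly, so only trusted input should be passed in.
    """
    args = [f"LOCATION => '{location}'", f"FILE_FORMAT => '{file_format}'"]
    if files:
        quoted = ", ".join(f"'{name}'" for name in files)
        args.append(f"FILES => ({quoted})")
    else:
        if max_file_count is not None:
            args.append(f"MAX_FILE_COUNT => {int(max_file_count)}")
        if max_records_per_file is not None:
            args.append(f"MAX_RECORDS_PER_FILE => {int(max_records_per_file)}")
    return "SELECT * FROM TABLE(INFER_SCHEMA(" + ", ".join(args) + "))"


query = build_infer_schema_query(
    "@my_stage/data/",
    "my_parquet_format",
    max_file_count=10,
    max_records_per_file=1000,
)
print(query)
```

Starting with small limits and raising them only if the inferred columns look incomplete is one way to balance speed against the accuracy risk noted above.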
Snowflake regularly releases updates to its platform that improve the user experience and give companies many ways to secure, enhance and use their Snowflake environment to the fullest extent.
[1] 7.31 Release Notes - Sept. 5-6, 2023 | Snowflake Documentation
[2] 7.32 Release Notes - Sept. 11-12, 2023 | Snowflake Documentation