Setup LLM model for GCP

We take data security very seriously. Your code will sit on your premises and go to a model that you control, sitting in your cloud.

Part 1: Getting access to model
  1. Go to URL: https://console.cloud.google.com/vertex-ai/publishers/anthropic/model-garden/claude-3-haiku

    • Select the project you want to install your LLM from top-left

  1. Click Enable button to request access to this model

  1. Fill out the form with relevant information. You’ll also have to select a billing account as part of this process.

  1. Once the final step is completed on above form, you’ll see a success modal. Now you should have access to Claude 3 Haiku model with-in few minutes.

Part 2: Increasing quota & tokens limit of your model
  1. Increasing quota limit

Please note that, depending on your organization contract with Google, you might need to increase the quota for Claude 3 Haiku model. We expect the Quota = 250. Please visit this URL to check / request quota increase: https://console.cloud.google.com/iam-admin/quotas

  1. In case you don’t see any "Quotas" in the list, go back to your Model Garden page for Claude and click Open Notebook CTA.

  2. Click Enable on the modal to enable Vertex AI API. Once done, you should see “Enabled” for Vertex AI API. This might take a few seconds to a minute.

  1. Once Enabled, you’ll be asked to Confirm the action. Hit Confirm.

  1. Now go back to the Quota page.

  2. You should be now seeing a few “Services”

  3. From the Filter CTA, search: base_model:anthropic-claude-3-haiku

  1. Select the Service you want to edit the quota of for the region of your choice by clicking Edit quotas CTA on top-right. Select europe-west4, Click Edit Quotas and set the quota to 250.

  1. Increasing tokens limit
  1. Increasing tokens limit

You'll also need to increase Tokens/min for your model. This has to be requested via support ticket and needs to be set at 1 million tokens/min for Claude 3 Haiku for europe-west4. Follow the steps highlighted below to raise support ticket.

  1. Go to https://console.cloud.google.com/support/createcase/ to create a support ticket

  1. In "Select your product" type and select Vertex AI Other

  1. In "Describe your issue", Hitting HTTP 429 error on Vertex API for region: europe-west4 model: anthropic-claude-3-haiku even when usage is below quota limit

  2. Select priority as: P2 High Imapct

  3. Hit Next CTA to describe your issue in the next step

  1. Skip the Resources step by hitting Next CTA

  1. In detailed description step:

    • Select the product sub-category as Other

  • Add your project ID. You'll find it in your project dropdown from top-left of nav

  • In "Observed error message" add this:

{"code":429,"message":"Quota exceeded for aiplatform.googleapis.com/online_prediction_tokens_per_minute_per_base_model with base model: anthropic-claude-3-haiku. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai.","status":"RESOURCE_EXHAUSTED"}

{"code":429,"message":"Quota exceeded for aiplatform.googleapis.com/online_prediction_tokens_per_minute_per_base_model with base model: anthropic-claude-3-haiku. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai.","status":"RESOURCE_EXHAUSTED"}

{"code":429,"message":"Quota exceeded for aiplatform.googleapis.com/online_prediction_tokens_per_minute_per_base_model with base model: anthropic-claude-3-haiku. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai.","status":"RESOURCE_EXHAUSTED"}

  • In "Provide more details" input, add this description for a faster workflow. Feel free to add any additional information if you like.

Please Review https://cloud.google.com/support/docs/best-practice Please Provide the following information for troubleshooting: - Project ID: Your Project ID - Objective (what would you like to achieve?): I'd like for the 429 errors to be resolved even though my usage is way below available quota - What is the business impact you are facing? Facing severe impact as most of our Vertex API calls are failing. This is potentially fatal. - Resource Type: Vertex AI Model Garden Claude 3 Haiku - Resource ID: claude-3-haiku@20240307 - Details of any failed request(s) - (e.g. URL or API method, date and time, input parameters, response code, error message, screenshot) {"code":429,"message":"Quota exceeded for aiplatform.googleapis.com/online_prediction_tokens_per_minute_per_base_model with base model: anthropic-claude-3-haiku. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai.","status":"RESOURCE_EXHAUSTED"} - What was the observed behavior? Failure of API request with above error - What was the expected behavior? Successful API request - If there is a successful request, please provide one for comparison purposes. - Any other relevant details about your implementation or issue?

Please Review https://cloud.google.com/support/docs/best-practice Please Provide the following information for troubleshooting: - Project ID: Your Project ID - Objective (what would you like to achieve?): I'd like for the 429 errors to be resolved even though my usage is way below available quota - What is the business impact you are facing? Facing severe impact as most of our Vertex API calls are failing. This is potentially fatal. - Resource Type: Vertex AI Model Garden Claude 3 Haiku - Resource ID: claude-3-haiku@20240307 - Details of any failed request(s) - (e.g. URL or API method, date and time, input parameters, response code, error message, screenshot) {"code":429,"message":"Quota exceeded for aiplatform.googleapis.com/online_prediction_tokens_per_minute_per_base_model with base model: anthropic-claude-3-haiku. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai.","status":"RESOURCE_EXHAUSTED"} - What was the observed behavior? Failure of API request with above error - What was the expected behavior? Successful API request - If there is a successful request, please provide one for comparison purposes. - Any other relevant details about your implementation or issue?

Please Review https://cloud.google.com/support/docs/best-practice Please Provide the following information for troubleshooting: - Project ID: Your Project ID - Objective (what would you like to achieve?): I'd like for the 429 errors to be resolved even though my usage is way below available quota - What is the business impact you are facing? Facing severe impact as most of our Vertex API calls are failing. This is potentially fatal. - Resource Type: Vertex AI Model Garden Claude 3 Haiku - Resource ID: claude-3-haiku@20240307 - Details of any failed request(s) - (e.g. URL or API method, date and time, input parameters, response code, error message, screenshot) {"code":429,"message":"Quota exceeded for aiplatform.googleapis.com/online_prediction_tokens_per_minute_per_base_model with base model: anthropic-claude-3-haiku. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai.","status":"RESOURCE_EXHAUSTED"} - What was the observed behavior? Failure of API request with above error - What was the expected behavior? Successful API request - If there is a successful request, please provide one for comparison purposes. - Any other relevant details about your implementation or issue?

  • Once done, hit Submit to raise support ticket.

Once you get your Tokens/min updated to 1 million tokens/min, move to the next step.

Part 3: Accessing your p0 instance

Once you've modified the quota and token limit, you can now go to your VM instance page and find the Project ID to submit in our product's onboarding. Continue the rest of the onboarding as directed.

© 2024 p

0

. All rights reserved.

© 2024 p

0

. All rights reserved.

© 2024 p

0

. All rights reserved.