Pakistan's First Oracle Blog

How to Quantize a Model with Hugging Face Quanto

Thu, 2024-04-18 03:55

This video is a hands-on, step-by-step primer on quantizing any model with Hugging Face Quanto, a versatile PyTorch quantization toolkit.

!pip install transformers==4.35.0

!pip install quanto==0.0.11

!pip install torch==2.1.1

!pip install sentencepiece==0.2.0

model_name = "google/flan-t5-small"

import sentencepiece as spm

from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained(model_name)

model = T5ForConditionalGeneration.from_pretrained(model_name)

input_text = "Meaning of happiness is "

input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)

print(tokenizer.decode(outputs[0]))

# helper is a local module (helper.py) that is not shown in this post

from helper import compute_module_sizes

module_sizes = compute_module_sizes(model)

print(f"The model size is {module_sizes[''] * 1e-9} GB")

from quanto import quantize, freeze

import torch

quantize(model, weights=torch.int8, activations=None)

freeze(model)

module_sizes = compute_module_sizes(model)

print(f"The model size is {module_sizes[''] * 1e-9} GB")

input_text = "Meaning of happiness is "

input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)

print(tokenizer.decode(outputs[0]))
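
To make the idea behind `weights=torch.int8` concrete, here is a minimal pure-Python sketch of symmetric int8 quantization. It is an illustration only, not Quanto's actual implementation; the function names `quantize_int8` and `dequantize` and the sample weights are made up for this example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of a list of floats to int8 values."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate floats."""
    return [v * scale for v in q]


weights = [0.42, -1.27, 0.05, 0.9]  # hypothetical float weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q)  # [42, -127, 5, 90]
print(restored)
```

Each weight is stored as one byte plus a shared scale, which is why the reported model size drops after quantizing and freezing. The restored values differ from the originals by at most half a quantization step.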

Categories: DBA Blogs

How to do RAG in OpenAI GPT4 Locally with File Search

Thu, 2024-04-18 01:50

This video is a hands-on, step-by-step primer on using RAG with OpenAI File Search. OpenAI assistants now support RAG through the file search tool, which means you can attach your own files and custom data to an OpenAI assistant and talk to your documents with GPT-4.

Make sure you have installed the latest version of the openai package on your local system:

pip install openai --upgrade

Also make sure data.txt is in the same folder as your script.

from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Personal Assistant",
    instructions="You are an empathetic assistant. Use your knowledge base to answer questions.",
    model="gpt-4-turbo",
    tools=[{"type": "file_search"}],
)

# Create a vector store called "Personal Data"
vector_store = client.beta.vector_stores.create(
    name="Personal Data",
    expires_after={
        "anchor": "last_active_at",
        "days": 1
    }
)

# Ready the files for upload to OpenAI

file_paths = ["data.txt"]

file_streams = [open(path, "rb") for path in file_paths]

# Use the upload and poll SDK helper to upload the files, add them to the vector store,

# and poll the status of the file batch for completion.

file_batch = client.beta.vector_stores.file_batches.upload_and_poll(

vector_store_id=vector_store.id, files=file_streams

)

# You can print the status and the file counts of the batch to see the result of this operation.

print(file_batch.status)

print(file_batch.file_counts)

assistant = client.beta.assistants.update(

assistant_id=assistant.id,

tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},

)

thread = client.beta.threads.create(

tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},

messages=[

{

"role": "user",

"content": "Who is Fahd Mirza?",

}

]

)

# The thread now has a vector store with that file in its tool resources.

print(thread.tool_resources.file_search)

run = client.beta.threads.runs.create_and_poll(

thread_id=thread.id, assistant_id=assistant.id

)

messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))

message_content = messages[0].content[0].text

annotations = message_content.annotations

citations = []

for index, annotation in enumerate(annotations):
    message_content.value = message_content.value.replace(annotation.text, f"[{index}]")
    if file_citation := getattr(annotation, "file_citation", None):
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f"[{index}] {cited_file.filename}")

print(message_content.value)

print("\n".join(citations))
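
The citation loop above can be tried without calling the API. The sketch below runs the same replace-and-collect logic on stand-in objects; the marker text `<<cite-0>>`, the file id `file-abc`, and the helper name `render_with_citations` are all made up for illustration and are not what the API returns.

```python
from types import SimpleNamespace


def render_with_citations(text, annotations, filename_for):
    """Replace each annotation marker with [index] and collect a footnote list."""
    citations = []
    for index, annotation in enumerate(annotations):
        text = text.replace(annotation.text, f"[{index}]")
        file_citation = getattr(annotation, "file_citation", None)
        if file_citation is not None:
            citations.append(f"[{index}] {filename_for(file_citation.file_id)}")
    return text, citations


# Hypothetical stand-ins for the annotation objects in a message.
annotations = [
    SimpleNamespace(
        text="<<cite-0>>",
        file_citation=SimpleNamespace(file_id="file-abc"),
    ),
]
# Stand-in for client.files.retrieve(file_id).filename.
filenames = {"file-abc": "data.txt"}

answer, footnotes = render_with_citations(
    "The answer comes from your file <<cite-0>>.", annotations, filenames.get
)
print(answer)     # The answer comes from your file [0].
print(footnotes)  # ['[0] data.txt']
```

Passing the filename lookup in as a function keeps the formatting logic separate from the network call, so it can be checked offline.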

Categories: DBA Blogs

How to Use LLM Function Calling Locally for Free

Wed, 2024-04-17 23:38

Function calling in AI simply means that you can call external APIs from within your AI-powered application. When a model is said to support function calling, it means the model can take a user's natural-language query and convert it into a function call. You then execute that call against your API endpoint, pass the returned data back to the LLM as additional context, and get a more grounded, up-to-date response suited to your application's requirements.

import openai

def Get_Beach_Patrol_Info(name, location):
    if 'Bondi' in name:
        print(f"The beach {name} at {location} is patrolled.")
    elif 'Marley' in name:
        print(f"The beach {name} of {location} is unpatrolled.")
    else:
        print("Beach not found.")

def get_gorilla_response(prompt="", model="gorilla-openfunctions-v2", functions=None):
    # Uses the pre-1.0 openai SDK interface (openai.ChatCompletion)
    openai.api_key = "EMPTY"  # Hosted for free by UC Berkeley
    openai.api_base = "http://luigi.millennium.berkeley.edu:8000/v1"
    try:
        completion = openai.ChatCompletion.create(
            model=model,
            temperature=0.0,
            messages=[{"role": "user", "content": prompt}],
            functions=functions or [],
        )
        return completion.choices[0]
    except Exception as e:
        print(f"error occurred: {e}")

beach_custom_functions = [
    {
        'name': 'Get_Beach_Patrol_Info',
        'description': 'Get name and location of beach from the body of the input text',
        'parameters': {
            'type': 'object',
            'properties': {
                'name': {
                    'type': 'string',
                    'description': 'Name of the beach'
                },
                'location': {
                    'type': 'string',
                    'description': 'Location where beach is located.'
                }
            }
        }
    }
]

beach_1_description="""Bondi Beach is an iconic destination located in Sydney, Australia,

renowned for its golden sands and lively atmosphere. It attracts surfers,

sunbathers, and tourists alike, offering stunning ocean views and a

vibrant coastal culture."""

beach_2_description="""Marley Beach (also known as Big Marley Beach) is a stunning beach in the

upper Royal National Park, on the outskirts of southern Sydney, Australia.

The beach is one of eleven beaches located within the territory of the

Royal National Park."""

beach_description=[beach_1_description,beach_2_description]

for i in beach_description:
    response = get_gorilla_response(prompt=i, functions=beach_custom_functions)
    func = response.message.content
    print(func)
    # eval runs whatever text the model returned; only use it on output you trust
    eval(func)
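
Because `eval` executes any expression the model returns, a safer pattern is to parse the returned call and dispatch only to functions you have allow-listed. The following is a minimal sketch of that idea using Python's standard `ast` module. It assumes the model returns a plain call such as `Get_Beach_Patrol_Info(name="...", location="...")`; the `dispatch` helper and the `ALLOWED` table are hypothetical names introduced here.

```python
import ast


def get_beach_patrol_info(name, location):
    """Local stand-in for the beach lookup; returns text instead of printing."""
    if "Bondi" in name:
        return f"The beach {name} at {location} is patrolled."
    if "Marley" in name:
        return f"The beach {name} of {location} is unpatrolled."
    return "Beach not found."


# Only functions listed here can ever be called.
ALLOWED = {"Get_Beach_Patrol_Info": get_beach_patrol_info}


def dispatch(call_text, allowed):
    """Parse 'func(arg=literal, ...)' and call func only if it is allow-listed."""
    node = ast.parse(call_text.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("not a simple function call")
    if node.func.id not in allowed:
        raise ValueError(f"function {node.func.id!r} is not allowed")
    if any(kw.arg is None for kw in node.keywords):
        raise ValueError("**kwargs expansion is not supported")
    # literal_eval accepts only literals, so arguments cannot run code.
    args = [ast.literal_eval(a) for a in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return allowed[node.func.id](*args, **kwargs)


result = dispatch(
    'Get_Beach_Patrol_Info(name="Bondi Beach", location="Sydney, Australia")',
    ALLOWED,
)
print(result)  # The beach Bondi Beach at Sydney, Australia is patrolled.
```

A response that tries to call anything outside the table, or that passes a non-literal argument, raises `ValueError` instead of running.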

Categories: DBA Blogs

AnythingLLM - Any LLM, Any Document, Full Control, Full Privacy, Local, Secure

Sun, 2024-04-07 18:14

This video shows how to install AnythingLLM locally to privately, securely, and remotely run any LLM with any RAG document. It all runs locally, with no internet connectivity required.

Categories: DBA Blogs

OCI Generative AI Agents

Tue, 2024-04-02 02:53

Artificial intelligence (AI) is revolutionizing our interactions with the world, presenting both opportunities and complexities for organizations embracing new AI technologies. Nowhere is this more evident than with generative AI, which integrates knowledge assimilation from various sources to automate tasks and boost human creativity and productivity. This transformative technology empowers organizations to summarize documents, create tables, generate meaningful text, develop code, and synthesize innovative ideas.

Oracle's strategic approach revolves around three AI modalities: infrastructure, models and services, and application integration. Oracle has embedded AI across its cloud applications like CRM, ERP, HCM, CX, and EMR to enhance their functionalities. Similarly, Oracle is introducing generative AI capabilities into its database portfolio, akin to its integration of machine learning (ML) features into Oracle Database service and MySQL HeatWave.

Oracle Cloud Infrastructure (OCI) Generative AI Agents harnesses the potential of large language models (LLMs) and retrieval-augmented generation (RAG), enabling users to query extensive enterprise knowledge bases. Soon, users will access real-time information via natural language interfaces for direct action.

OCI Generative AI Agents RAG service empowers employees to glean insights and answers from disparate sources within the company through natural conversations. The service matches semantic similarities between user queries and documents in a centralized embeddings vector store, consolidating data from diverse corporate systems like ERP, HCM, CX, and data lakes. Users receive results using the most current information, even from dynamic data stores, rather than static reports or outdated indexes.

At the heart of this solution is a large language model (LLM), which facilitates natural, conversational interactions with users. You can use either Meta's Llama model or a model from Cohere, and that list is going to grow. Think of it as a knowledgeable colleague you can ask business questions in everyday language. This is complemented by an embeddings model that generates embedding vectors for the corporate document corpus stored in the embeddings vector database.

The retrieval-augmented generation (RAG) system, comprising the vector database, embeddings model, and text generation model, synthesizes responses grounded in retrieved documents. Instead of presenting a list of documents, the AI service delivers natural language responses, supported with references and links to original sources. In the future, users will seamlessly transition from information retrieval to immediate actions like sending emails, contacting customers, or managing orders within the same interface.
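
The retrieval step described above, matching a query against document vectors by semantic similarity, can be sketched in a few lines. This toy version uses bag-of-words counts in place of a learned embeddings model and a Python list in place of a vector database, so it only illustrates the ranking idea; it is not how OCI Generative AI Agents is implemented, and the sample documents are made up.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. A real system uses a learned embeddings model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


# Hypothetical corporate snippets.
documents = [
    "invoice payment terms are net 30 days",
    "employees accrue vacation leave every month",
    "orders ship from the regional warehouse",
]
top = retrieve("how many vacation days do employees get", documents)
print(top)  # ['employees accrue vacation leave every month']
```

In a full RAG pipeline the retrieved text would then be passed to the text generation model as context for the answer.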

Categories: DBA Blogs

Select AI Intro in Oracle Database

Sun, 2024-03-31 02:39

Select AI is probably one of the most magical AI features in Oracle Autonomous Database. I have been using it for some time now, and it simply feels like magic.

Select AI represents a cutting-edge feature within the Oracle Autonomous Database, introduced last year, that empowers users to interact with their data using natural language queries. This innovative capability enables individuals to pose questions and receive insights without needing to comprehend the underlying data storage or SQL formulation. By simply directing queries in languages like English, Spanish, French, Portuguese, and more, users can leverage the full potential of their database.

The underlying technology behind Select AI leverages advanced generative AI models, such as those developed by OpenAI or Cohere, which excel at understanding intent and generating SQL code based on natural language prompts. This approach allows developers to extend and build new natural language applications seamlessly within the SQL framework. Select AI streamlines this process by integrating with Oracle SQL, utilizing a standard SELECT statement followed by AI and the natural language query.
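
As a rough illustration of that statement shape, the sketch below builds a `SELECT AI` statement from a natural language question in Python. The helper name `select_ai` is hypothetical, the action keywords are an assumption taken from Oracle's Select AI documentation and may differ by database version, and running the statement also needs an AI profile configured in the session, which is not shown here.

```python
# Action keywords assumed from Oracle's Select AI documentation.
ACTIONS = ("runsql", "showsql", "narrate", "chat", "explainsql")


def select_ai(prompt, action="showsql"):
    """Build a 'SELECT AI <action> <question>;' statement as plain text."""
    if action not in ACTIONS:
        raise ValueError(f"unknown Select AI action: {action!r}")
    # Collapse whitespace so the question fits on one line.
    question = " ".join(prompt.split())
    if not question or ";" in question:
        raise ValueError("question must be non-empty and contain no ';'")
    return f"SELECT AI {action} {question};"


print(select_ai("how many customers exist"))
# SELECT AI showsql how many customers exist;
```

With `showsql` the database is expected to return the generated SQL rather than run it, which is a cautious way to inspect what the model produced.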

One of the remarkable features of Select AI is its conversational capabilities, enhanced with recent updates. Users can engage in a dialogue with the database, posing follow-up questions that build upon previous interactions. This chat history is leveraged by the AI models to provide context-aware responses, resembling a real-life conversation that facilitates deeper exploration of data insights.

Moreover, Select AI is designed to be flexible and adaptable to various business needs. It supports multiple Large Language Models (LLMs) including Cohere and OpenAI, with potential for expansion to include domain-specific models in the future. This pluggable design allows organizations to select the most suitable LLM for their specific use cases, ensuring optimal performance and alignment with business objectives.

With Autonomous Database Select AI, developers can create AI-enabled applications that comprehend users' natural language queries within a conversational context, enabling secure and efficient access to valuable insights from private data sources. This groundbreaking capability marks a significant leap forward in data analytics, harnessing the power of generative AI and natural language processing to democratize access to database insights.

Categories: DBA Blogs

How to Install OpenDevin Locally

Sat, 2024-03-30 20:00

This is an easy, step-by-step tutorial for installing OpenDevin locally. OpenDevin is an open-source project aiming to replicate Devin, an autonomous AI software engineer capable of executing complex engineering tasks and collaborating actively with users on software development projects.

Commands Used:

Pre-requisites:

Update OS:

sudo apt-get update

Install git:

sudo apt-get install git

Install Nodejs:

sudo apt-get install nodejs

Install npm:

sudo apt install npm

Install Rust:

curl --proto '=https' --tlsv1.3 https://sh.rustup.rs -sSf | sh

Install docker:

sudo apt-get update

sudo apt-get install ca-certificates curl

sudo install -m 0755 -d /etc/apt/keyrings

sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc

sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:

echo \

"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \

$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \

sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Install conda:

wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh

bash Anaconda3-2022.05-Linux-x86_64.sh

Install uvicorn:

sudo apt install uvicorn
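
Before moving on to the backend, it can help to confirm that the prerequisites are actually on your PATH. This is a small sketch using only the Python standard library. The executable names in `REQUIRED_TOOLS` are assumptions about what the installs above provide (for example, `node` for Node.js and `cargo` for Rust), and a new shell may be needed before freshly installed tools are found.

```python
import shutil

# Assumed executable names for the prerequisites installed above.
REQUIRED_TOOLS = ["git", "node", "npm", "cargo", "docker", "conda", "uvicorn"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("All prerequisites found.")
```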

Backend Commands:

git clone https://github.com/OpenDevin/OpenDevin.git

cd OpenDevin

conda create -n opendevin python=3.11

conda activate opendevin

docker ps

docker pull ghcr.io/opendevin/sandbox

which python

python --version

/home/ubuntu/anaconda3/envs/opendevin/bin/python -m pip install pipenv

/home/ubuntu/anaconda3/envs/opendevin/bin/python -m pipenv install -v

/home/ubuntu/anaconda3/envs/opendevin/bin/python -m pipenv shell

uvicorn opendevin.server.listen:app --port 3000

Frontend Commands:

python -m pipenv requirements > requirements.txt && python -m pip install -r requirements.txt

PYTHONPATH=`pwd` /home/ubuntu/anaconda3/envs/opendevin/bin/python opendevin/main.py -d ./workspace/ -i 100 -t "Write a bash script that prints 'hello world'"

Categories: DBA Blogs

Oracle Database Comprehensive Resource Management Strategy

Thu, 2024-03-21 21:20

The Oracle Resource Manager is a powerful tool for administrators to maintain system stability, optimize performance, and manage workload prioritization within Oracle Database environments. It provides granular control over resource allocation and ensures that database resources are utilized efficiently to meet business requirements and service level objectives.

Resource Manager is a feature that enables administrators to manage and prioritize system resources such as CPU and memory among different users, applications, sessions, and consumer groups within the database. The Resource Manager allows for the control of resource allocation based on specified policies, ensuring that critical workloads receive adequate resources while preventing resource contention and maintaining overall system performance.

The PL/SQL below sets up a comprehensive resource management strategy with Oracle Resource Manager. It defines consumer groups to classify different types of workloads and establishes resource plans with directives that allocate resources and prioritize workload processing based on criteria such as workload type, time of day, and user groups.

PROCEDURE create_consumer_group(

consumer_group IN VARCHAR2,

comment IN VARCHAR2,

cpu_mth IN VARCHAR2 DEFAULT 'ROUND-ROBIN')

The create_consumer_groups.sql script uses this procedure to create the OLTP and batch consumer groups.

CONN sys AS SYSDBA

BEGIN

DBMS_RESOURCE_MANAGER.clear_pending_area;

DBMS_RESOURCE_MANAGER.create_pending_area;

-- Create the consumer groups

DBMS_RESOURCE_MANAGER.create_consumer_group(

consumer_group => 'oltp_consumer_group',

comment => 'OLTP process consumer group.');

DBMS_RESOURCE_MANAGER.create_consumer_group(

consumer_group => 'batch_consumer_group',

comment => 'Batch process consumer group.');

DBMS_RESOURCE_MANAGER.validate_pending_area;

DBMS_RESOURCE_MANAGER.submit_pending_area;

END;

/

BEGIN

DBMS_RESOURCE_MANAGER.clear_pending_area();

DBMS_RESOURCE_MANAGER.create_pending_area();

-- Delete consumer groups.

DBMS_RESOURCE_MANAGER.delete_consumer_group (

consumer_group => 'oltp_consumer_group');

DBMS_RESOURCE_MANAGER.delete_consumer_group (

consumer_group => 'batch_consumer_group');

DBMS_RESOURCE_MANAGER.validate_pending_area;

DBMS_RESOURCE_MANAGER.submit_pending_area();

END;

/

PROCEDURE create_plan (

plan IN VARCHAR2,

comment IN VARCHAR2,

cpu_mth IN VARCHAR2 DEFAULT 'EMPHASIS',

active_sess_pool_mth IN VARCHAR2 DEFAULT 'ACTIVE_SESS_POOL_ABSOLUTE',

parallel_degree_limit_mth IN VARCHAR2 DEFAULT 'PARALLEL_DEGREE_LIMIT_ABSOLUTE',

queueing_mth IN VARCHAR2 DEFAULT 'FIFO_TIMEOUT')

PROCEDURE create_plan_directive (

plan IN VARCHAR2,

group_or_subplan IN VARCHAR2,

comment IN VARCHAR2,

cpu_p1 IN NUMBER DEFAULT NULL,

cpu_p2 IN NUMBER DEFAULT NULL,

cpu_p3 IN NUMBER DEFAULT NULL,

cpu_p4 IN NUMBER DEFAULT NULL,

cpu_p5 IN NUMBER DEFAULT NULL,

cpu_p6 IN NUMBER DEFAULT NULL,

cpu_p7 IN NUMBER DEFAULT NULL,

cpu_p8 IN NUMBER DEFAULT NULL,

active_sess_pool_p1 IN NUMBER DEFAULT NULL,

queueing_p1 IN NUMBER DEFAULT NULL,

parallel_degree_limit_p1 IN NUMBER DEFAULT NULL,

switch_group IN VARCHAR2 DEFAULT NULL,

switch_time IN NUMBER DEFAULT NULL,

switch_estimate IN BOOLEAN DEFAULT FALSE,

max_est_exec_time IN NUMBER DEFAULT NULL,

undo_pool IN NUMBER DEFAULT NULL,

max_idle_time IN NUMBER DEFAULT NULL,

max_idle_blocker_time IN NUMBER DEFAULT NULL,

switch_time_in_call IN NUMBER DEFAULT NULL)

BEGIN

DBMS_RESOURCE_MANAGER.clear_pending_area;

DBMS_RESOURCE_MANAGER.create_pending_area;

-- Create a new plan

DBMS_RESOURCE_MANAGER.create_plan(

plan => 'day_plan',

comment => 'Plan suitable for daytime processing.');

-- Assign consumer groups to plan and define priorities

DBMS_RESOURCE_MANAGER.create_plan_directive (

plan => 'day_plan',

group_or_subplan => 'oltp_consumer_group',

comment => 'Give OLTP processes higher priority - level 1',

cpu_p1 => 80,

switch_group => 'batch_consumer_group',

switch_time => 60);

DBMS_RESOURCE_MANAGER.create_plan_directive (

plan => 'day_plan',

group_or_subplan => 'batch_consumer_group',

comment => 'Give batch processes lower priority - level 2',

cpu_p2 => 100);

DBMS_RESOURCE_MANAGER.create_plan_directive(

plan => 'day_plan',

group_or_subplan => 'OTHER_GROUPS',

comment => 'all other users - level 3',

cpu_p3 => 100);

DBMS_RESOURCE_MANAGER.validate_pending_area;

DBMS_RESOURCE_MANAGER.submit_pending_area;

END;

/
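
To see what the `cpu_p1`, `cpu_p2`, and `cpu_p3` values in `day_plan` mean in practice, here is a small Python sketch of the level cascade. It assumes the worst case where every group is fully busy, so CPU reaches a lower level only if the level above left some of it unallocated. The function name `cpu_shares` is made up for this illustration, and the real scheduler also passes down CPU that a group is entitled to but does not use.

```python
def cpu_shares(levels):
    """levels: list of dicts {group: percent of the CPU available at that level}.

    Returns each group's share of total CPU when every group is fully busy.
    """
    shares = {}
    remaining = 100.0
    for level in levels:
        used = 0.0
        for group, percent in level.items():
            share = remaining * percent / 100
            shares[group] = share
            used += share
        # Only CPU left unallocated at this level flows to the next one.
        remaining -= used
    return shares


day_plan = [
    {"oltp_consumer_group": 80},    # cpu_p1 => 80
    {"batch_consumer_group": 100},  # cpu_p2 => 100
    {"OTHER_GROUPS": 100},          # cpu_p3 => 100
]
print(cpu_shares(day_plan))
# {'oltp_consumer_group': 80.0, 'batch_consumer_group': 20.0, 'OTHER_GROUPS': 0.0}
```

Under full load, OLTP gets 80% of the CPU, batch gets the remaining 20%, and sessions in OTHER_GROUPS run only when the higher levels leave CPU idle.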

BEGIN

DBMS_RESOURCE_MANAGER.clear_pending_area;

DBMS_RESOURCE_MANAGER.create_pending_area;

-- Create a new plan

DBMS_RESOURCE_MANAGER.create_plan(

plan => 'night_plan',

comment => 'Plan suitable for nighttime processing.');

-- Assign consumer groups to plan and define priorities

DBMS_RESOURCE_MANAGER.create_plan_directive (

plan => 'night_plan',

group_or_subplan => 'batch_consumer_group',

comment => 'Give batch processes higher priority - level 1',

cpu_p1 => 80);

DBMS_RESOURCE_MANAGER.create_plan_directive (

plan => 'night_plan',

group_or_subplan => 'oltp_consumer_group',

comment => 'Give OLTP processes lower priority - level 2',

cpu_p2 => 100);

DBMS_RESOURCE_MANAGER.create_plan_directive(

plan => 'night_plan',

group_or_subplan => 'OTHER_GROUPS',

comment => 'all other users - level 3',

cpu_p3 => 100);

DBMS_RESOURCE_MANAGER.validate_pending_area;

DBMS_RESOURCE_MANAGER.submit_pending_area;

END;

/

BEGIN

DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

DBMS_RESOURCE_MANAGER.CREATE_PLAN(PLAN => 'bugdb_plan',

COMMENT => 'Resource plan/method for bug users sessions');

DBMS_RESOURCE_MANAGER.CREATE_PLAN(PLAN => 'maildb_plan',

COMMENT => 'Resource plan/method for mail users sessions');

DBMS_RESOURCE_MANAGER.CREATE_PLAN(PLAN => 'mydb_plan',

COMMENT => 'Resource plan/method for bug and mail users sessions');

DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(CONSUMER_GROUP => 'Online_group',

COMMENT => 'Resource consumer group/method for online bug users sessions');

DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(CONSUMER_GROUP => 'Batch_group',

COMMENT => 'Resource consumer group/method for batch job bug users sessions');

DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(CONSUMER_GROUP => 'Bug_Maint_group',

COMMENT => 'Resource consumer group/method for users sessions for bug db maint');

DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(CONSUMER_GROUP => 'Users_group',

COMMENT => 'Resource consumer group/method for mail users sessions');

DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(CONSUMER_GROUP => 'Postman_group',

COMMENT => 'Resource consumer group/method for mail postman');

DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(CONSUMER_GROUP => 'Mail_Maint_group',

COMMENT => 'Resource consumer group/method for users sessions for mail db maint');

DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(PLAN => 'bugdb_plan',

GROUP_OR_SUBPLAN => 'Online_group',

COMMENT => 'online bug users sessions at level 1', MGMT_P1 => 80, MGMT_P2=> 0);

DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(PLAN => 'bugdb_plan',

GROUP_OR_SUBPLAN => 'Batch_group',

COMMENT => 'batch bug users sessions at level 1', MGMT_P1 => 20, MGMT_P2 => 0,

PARALLEL_DEGREE_LIMIT_P1 => 8);

DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(PLAN => 'bugdb_plan',

GROUP_OR_SUBPLAN => 'Bug_Maint_group',

COMMENT => 'bug maintenance users sessions at level 2', MGMT_P1 => 0, MGMT_P2 => 100);

DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(PLAN => 'bugdb_plan',

GROUP_OR_SUBPLAN => 'OTHER_GROUPS',

COMMENT => 'all other users sessions at level 3', MGMT_P1 => 0, MGMT_P2 => 0,

MGMT_P3 => 100);

DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(PLAN => 'maildb_plan',

GROUP_OR_SUBPLAN => 'Postman_group',

COMMENT => 'mail postman at level 1', MGMT_P1 => 40, MGMT_P2 => 0);

DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(PLAN => 'maildb_plan',

GROUP_OR_SUBPLAN => 'Users_group',

COMMENT => 'mail users sessions at level 2', MGMT_P1 => 0, MGMT_P2 => 80);

DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(PLAN => 'maildb_plan',

GROUP_OR_SUBPLAN => 'Mail_Maint_group',

COMMENT => 'mail maintenance users sessions at level 2', MGMT_P1 => 0, MGMT_P2 => 20);

DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(PLAN => 'maildb_plan',

GROUP_OR_SUBPLAN => 'OTHER_GROUPS',

COMMENT => 'all other users sessions at level 3', MGMT_P1 => 0, MGMT_P2 => 0,

MGMT_P3 => 100);

DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(PLAN => 'mydb_plan',

GROUP_OR_SUBPLAN => 'maildb_plan',

COMMENT=> 'all mail users sessions at level 1', MGMT_P1 => 30);

DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(PLAN => 'mydb_plan',

GROUP_OR_SUBPLAN => 'bugdb_plan',

COMMENT => 'all bug users sessions at level 1', MGMT_P1 => 70);

DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();

DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();

END;

/
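
The last block nests `bugdb_plan` and `maildb_plan` as subplans of `mydb_plan`. The sketch below works out the resulting CPU split in Python, again assuming every group is fully busy so that only unallocated CPU moves down a level. The function names and the dictionary layout are made up for this illustration; the percentages are the `MGMT_P1` to `MGMT_P3` values from the block above.

```python
def level_shares(levels, total):
    """Split `total` across levels; each level takes percentages of what is left."""
    shares = {}
    remaining = total
    for level in levels:
        used = 0.0
        for name, percent in level.items():
            shares[name] = remaining * percent / 100
            used += shares[name]
        remaining -= used
    return shares


def expand(plan_name, plans, total=100.0):
    """Recursively resolve subplans into per-consumer-group CPU shares."""
    result = {}
    for name, share in level_shares(plans[plan_name], total).items():
        if name in plans:  # the directive points at a subplan
            for group, sub_share in expand(name, plans, share).items():
                result[group] = result.get(group, 0.0) + sub_share
        else:
            result[name] = result.get(name, 0.0) + share
    return result


plans = {
    "mydb_plan": [{"maildb_plan": 30, "bugdb_plan": 70}],
    "bugdb_plan": [
        {"Online_group": 80, "Batch_group": 20},
        {"Bug_Maint_group": 100},
        {"OTHER_GROUPS": 100},
    ],
    "maildb_plan": [
        {"Postman_group": 40},
        {"Users_group": 80, "Mail_Maint_group": 20},
        {"OTHER_GROUPS": 100},
    ],
}

for group, share in expand("mydb_plan", plans).items():
    print(f"{group}: {share:.1f}%")
```

Under these assumptions the bug database's online users end up with 56% of the CPU (80% of 70%), while the mail postman gets 12% (40% of 30%) and the remaining mail share is split between mail users and mail maintenance at level 2.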

Categories: DBA Blogs

Temp Tablespace Matters in Oracle

Tue, 2024-03-19 02:28

When managing Oracle temporary tablespaces, commit or roll back transactions promptly so that temporary space is released efficiently.

Also watch for sessions that stay active for a long time without making progress; terminating them may be necessary to free resources. Another important consideration is the allocation of small sort segments at the end of a temporary file, which can prevent the file from shrinking. To investigate, check the High Water Mark (HWM) of the file to understand its allocation status and potential space usage. These practices help keep Oracle temporary tablespaces performant and manageable.

The following queries help manage temp tablespaces:

Use this query to get Temp tablespace usage:

SQL> select FILE#, STATUS, Bytes/1024/1024 size_mb, tf.name Filename, ts.name TS_NAME

from v$tempfile tf , v$tablespace ts where tf.ts# = ts.ts#;

Use this query to get allocated temp space:

SQL> SELECT a.username, a.sid, a.serial#, a.osuser, b.tablespace, b.blocks, c.sql_text

FROM v$session a, v$tempseg_usage b, v$sqlarea c

WHERE a.saddr = b.session_addr AND c.address= a.sql_address

AND c.hash_value = a.sql_hash_value ORDER BY b.tablespace, b.blocks;

Use this query to get high water mark of Temp Tablespaces:

SQL>select sum( u.blocks * blk.block_size)/1024/1024 "MB. in sort segments", (hwm.max * blk.block_size)/1024/1024 "HWM"

from v$sort_usage u, (select block_size from dba_tablespaces

where contents = 'TEMPORARY') blk,

(select segblk#+blocks max from

v$sort_usage where segblk# = (select max(segblk#)

from v$sort_usage) ) hwm

group by hwm.max * blk.block_size/1024/1024;
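
The arithmetic in the high water mark query can be followed with a short Python sketch. It takes (SEGBLK#, BLOCKS) pairs like the ones v$sort_usage reports and computes the space in use and the HWM in MB. The function name `temp_usage_mb`, the 8 KB block size, and the sample segments are assumptions chosen for illustration.

```python
def temp_usage_mb(segments, block_size=8192):
    """segments: list of (segblk, blocks) pairs for the active sort segments.

    Returns (MB in sort segments, high water mark in MB).
    """
    if not segments:
        return 0.0, 0.0
    used_blocks = sum(blocks for _, blocks in segments)
    # The HWM is the end of the segment that starts at the highest block.
    last_start, last_blocks = max(segments, key=lambda seg: seg[0])
    hwm_blocks = last_start + last_blocks
    mb_per_block = block_size / 1024 / 1024
    return used_blocks * mb_per_block, hwm_blocks * mb_per_block


# Hypothetical case: two small segments, one sitting near the end of the file.
segments = [(128, 128), (130944, 128)]
used_mb, hwm_mb = temp_usage_mb(segments)
print(used_mb, hwm_mb)  # 2.0 1024.0
```

Here only 2 MB is in use, yet the high water mark sits at 1024 MB because one small segment is allocated near the end of the file, which is the situation that gets in the way of shrinking.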

Use this query to shrink the Temp Tablespaces:

SQL> alter tablespace temp shrink space;

Categories: DBA Blogs

How to Create Urdu Hindi AI Model and Dataset from New Dataset

Fri, 2024-03-15 21:54

This video is a hands-on, step-by-step tutorial on creating a new dataset, fine-tuning an AI model on that dataset, and then pushing the model to Hugging Face.

Code:

%%capture

import torch

major_version, minor_version = torch.cuda.get_device_capability()

# Must install separately since Colab has torch 2.2.1, which breaks packages

!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
pass

!pip install einops

from unsloth import FastLanguageModel

import torch

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!

dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+

load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(

model_name = "unsloth/gemma-7b-bnb-4bit", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B

max_seq_length = max_seq_length,

dtype = dtype,

load_in_4bit = load_in_4bit,

token = " ", # use one if using gated models like meta-llama/Llama-2-7b-hf

)

model = FastLanguageModel.get_peft_model(

model,

r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128

target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",

"gate_proj", "up_proj", "down_proj",],

lora_alpha = 16,

lora_dropout = 0, # Supports any, but = 0 is optimized

bias = "none", # Supports any, but = "none" is optimized

use_gradient_checkpointing = True,

random_state = 3407,

use_rslora = False, # We support rank stabilized LoRA

loftq_config = None, # And LoftQ

)

# English gist of the Urdu prompt below: "Below is an instruction that describes a film name, paired with an input that provides further context. Take a moment to write song lyrics that match the meaning of the film name."

alpaca_prompt = """ذیل میں ایک ہدایت ہے جو فلم کے نام کی وضاحت کرتی ہے، اس کے ساتھ ایک ان پٹ بھی ہے جو مزید دستاویزات فراہم کرتا ہے۔ گانے کے بول لکھنے کے لیے ایک لمحہ نکالیں جو فلم کے نام کے معنی سے میل کھاتا ہے۔

### Instruction:

{}

### Input:

{}

### Response:

{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    instructions = examples["urdu_instruction"]
    inputs = examples["urdu_input"]
    outputs = examples["urdu_output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass
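
To see what the formatting function produces without loading a model or dataset, here is a standalone sketch that applies the same logic to a single hand-written row. The `<eos>` string is a stand-in for `tokenizer.eos_token`, the template omits the Urdu preamble, and the sample row is invented for illustration; only the column names come from the code above.

```python
ALPACA_TEMPLATE = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<eos>"  # stand-in; the real value comes from tokenizer.eos_token


def formatting_prompts_func(examples):
    """Turn a batch of columns into one training string per row."""
    texts = []
    for instruction, input_text, output in zip(
        examples["urdu_instruction"], examples["urdu_input"], examples["urdu_output"]
    ):
        texts.append(ALPACA_TEMPLATE.format(instruction, input_text, output) + EOS_TOKEN)
    return {"text": texts}


# A batched map passes columns as lists, one entry per row.
batch = {
    "urdu_instruction": ["Write short song lyrics about the given film name."],
    "urdu_input": ["Rang"],
    "urdu_output": ["(lyrics go here)"],
}
result = formatting_prompts_func(batch)
print(result["text"][0])
```

Each row becomes one string that ends with the end-of-sequence marker, which is what tells the model where a training example stops.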

from datasets import load_dataset

dataset = load_dataset("fahdmirzac/urdu_bollywood_songs_dataset", split = "train")

dataset = dataset.map(formatting_prompts_func, batched = True,)

from huggingface_hub import login

access_token = "YOUR_HF_TOKEN"  # paste your own Hugging Face access token here; never publish a real one

login(token=access_token)

from trl import SFTTrainer

from transformers import TrainingArguments

trainer = SFTTrainer(

model = model,

tokenizer = tokenizer,

train_dataset = dataset,

dataset_text_field = "text",

max_seq_length = max_seq_length,

dataset_num_proc = 2,

packing = False, # Can make training 5x faster for short sequences.

args = TrainingArguments(

per_device_train_batch_size = 2,

gradient_accumulation_steps = 4,

warmup_steps = 5,

max_steps = 100,

learning_rate = 2e-4,

fp16 = not torch.cuda.is_bf16_supported(),

bf16 = torch.cuda.is_bf16_supported(),

logging_steps = 1,

optim = "adamw_8bit",

weight_decay = 0.01,

lr_scheduler_type = "linear",

seed = 3407,

output_dir = "outputs",

),

)

trainer_stats = trainer.train()

FastLanguageModel.for_inference(model) # Enable native 2x faster inference

inputs = tokenizer(

[

alpaca_prompt.format(

"دیے گئے فلم کے نام کے بارے میں ایک مختصر گیت کے بول لکھیں۔", # instruction: "Write short song lyrics about the given film name."

"کیوں پیار ہو گیا", # input: film name, roughly "Why Did Love Happen"

"", # output - leave this blank for generation!

)

], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 200, use_cache = True)

tokenizer.batch_decode(outputs)

FastLanguageModel.for_inference(model) # Enable native 2x faster inference

inputs = tokenizer(

[

alpaca_prompt.format(

"دیے گئے فلم کے نام کے بارے میں ایک مختصر گیت کے بول لکھیں۔", # instruction: "Write short song lyrics about the given film name."

"رنگ", # input: film name, "Rang" (colour)

"", # output - leave this blank for generation!

)

], return_tensors = "pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens = 200, use_cache = True)

tokenizer.batch_decode(outputs)

model.push_to_hub("fahdmirzac/Gemma_Urdu_Hindi_Bollywood_Songs", token = access_token)

Categories: DBA Blogs

Using Claude 3 Haiku Vision with Amazon Bedrock Locally

Fri, 2024-03-15 02:58

This video is a hands-on guide to using the vision features of Anthropic's Claude 3 Haiku model with Amazon Bedrock.

Code Used:

import boto3

import json

import base64

from botocore.exceptions import ClientError

bedrock = boto3.client(service_name="bedrock-runtime",region_name='us-east-1')

modelId = "anthropic.claude-3-haiku-20240307-v1:0"

accept = "application/json"

contentType = "application/json"

# prompt = "What is written in this image?"

# image_path = "./images/ab55.png"

# prompt = "How many faces are there in this image and what are the expressions of those faces?"

# image_path = "./images/expression.png"

# prompt = "Tell me a short story about this image."

# image_path = "./images/hiking.png"

prompt = "What's the location in this image?"

image_path = "./images/whereisthis.png"

with open(image_path, "rb") as image_file:
    image = base64.b64encode(image_file.read()).decode("utf8")

request_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2048,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image,
                    },
                },
            ],
        }
    ],
}
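
The request body above hard-codes `image/png`, which only matches PNG files. As a sketch of one way to handle other formats, the helper below guesses the media type from the file name with the standard `mimetypes` module and builds the same image content block. The name `image_block` is made up here, and the set of accepted media types is an assumption based on the image formats Claude 3 is documented to accept.

```python
import base64
import mimetypes

# Media types assumed to be accepted for Claude 3 image input.
SUPPORTED_MEDIA_TYPES = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def image_block(filename, data):
    """Build an image content block from a file name and its raw bytes."""
    media_type, _ = mimetypes.guess_type(filename)
    if media_type not in SUPPORTED_MEDIA_TYPES:
        raise ValueError(f"unsupported image type for {filename!r}: {media_type}")
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf8"),
        },
    }


# Tiny made-up payload; in practice, pass the bytes read from the image file.
block = image_block("photo.jpg", b"abc")
print(block["source"]["media_type"])  # image/jpeg
print(block["source"]["data"])        # YWJj
```

The returned dictionary can take the place of the hand-written image entry in the `content` list.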

try:

    response = bedrock.invoke_model(

        modelId=modelId,

        body=json.dumps(request_body),

        accept=accept,

        contentType=contentType,

    )

    # Process and print the response

    result = json.loads(response.get("body").read())

    input_tokens = result["usage"]["input_tokens"]

    output_tokens = result["usage"]["output_tokens"]

    output_list = result.get("content", [])

    # print("Invocation details:")

    # print(f"- The input length is {input_tokens} tokens.")

    # print(f"- The output length is {output_tokens} tokens.")

    # print(f"- The model returned {len(output_list)} response(s):")

    for output in output_list:

        print(output["text"])

except ClientError as err:

    # print() does not apply %-style formatting, so build the message with an f-string

    print(

        "Couldn't invoke Claude 3 Haiku Vision. Here's why: "

        f"{err.response['Error']['Code']}: {err.response['Error']['Message']}"

    )

    raise
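The request body above hard-codes "image/png", which will not match a JPEG or WebP file. As a minimal sketch (the helper name build_vision_request and the placeholder file name are assumptions for illustration, not part of the original code), the body can be built by a small function that guesses the media type from the file extension:

```python
import base64
import json
import mimetypes


def build_vision_request(prompt, image_bytes, image_path, max_tokens=2048):
    """Build a Claude 3 Messages API body for one text prompt and one image."""
    # Guess the media type from the file extension; fall back to PNG as in the post.
    media_type = mimetypes.guess_type(image_path)[0] or "image/png"
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": base64.b64encode(image_bytes).decode("utf8"),
                        },
                    },
                ],
            }
        ],
    }


# Example with placeholder bytes instead of a real image file
body = build_vision_request("What is written in this image?", b"not-a-real-image", "photo.jpg")
payload = json.dumps(body)  # this string is what would be passed as body= to invoke_model
```

The returned dictionary has the same shape as request_body in the post, so it can be serialized with json.dumps and passed to invoke_model unchanged.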

Categories: DBA Blogs

Create AI Agent in AWS with Boto3 Code

Thu, 2024-03-14 22:03

This video is a step-by-step tutorial, with code, on how to create Amazon Bedrock AI agents with boto3 in Python and integrate them with Lambda.

Code used (pair it with any Lambda function of your choice):

import logging

import boto3

import time

import yaml

import json

import io

from botocore.exceptions import ClientError

def create_agent(bedrock, agent_name, foundation_model, role_arn, instruction):

    try:

        # Create the agent using the bedrock-agent client passed in

        response = bedrock.create_agent(

            agentName=agent_name,

            foundationModel=foundation_model,

            agentResourceRoleArn=role_arn,

            instruction=instruction,

        )

    except ClientError as e:

        logging.error(f"Couldn't create agent due to: {e}")

        raise

    else:

        return response["agent"]

def create_agent_action_group(bedrock, name, description, agent_id, agent_version, function_arn, api_schema):

    try:

        response = bedrock.create_agent_action_group(

            actionGroupName=name,

            description=description,

            agentId=agent_id,

            agentVersion=agent_version,

            actionGroupExecutor={"lambda": function_arn},

            apiSchema={"payload": api_schema},

        )

        agent_action_group = response["agentActionGroup"]

    except ClientError as e:

        print(f"Error: Couldn't create agent action group. Here's why: {e}")

        raise

    else:

        return agent_action_group

def prepare_agent(bedrock, agent_id):

    try:

        prepared_agent_details = bedrock.prepare_agent(agentId=agent_id)

    except ClientError as e:

        print(f"Couldn't prepare agent. {e}")

        raise

    else:

        return prepared_agent_details

def create_agent_alias(bedrock, name, agent_id):

    try:

        response = bedrock.create_agent_alias(

            agentAliasName=name, agentId=agent_id

        )

        agent_alias = response["agentAlias"]

    except ClientError as e:

        print(f"Couldn't create agent alias. {e}")

        raise

    else:

        return agent_alias

def main():

    # Define your parameters

    bedrock = boto3.client(service_name='bedrock-agent', region_name='us-east-1')

    agent_name = 'AstroAI'

    foundation_model = 'anthropic.claude-v2'

    role_arn = 'bedrock role arn'

    instruction = (

        "Your task is to generate unique and insightful daily horoscopes for individuals "

        "based on their zodiac sign. Start by analyzing the general characteristics and common "

        "themes associated with each zodiac sign. Consider traits, challenges, opportunities, "

        "and the emotional and physical wellbeing of individuals under each sign. Use this "

        "understanding to create personalized, relevant, and engaging horoscopes that offer "

        "guidance, reflection, and encouragement for the day ahead. Ensure the horoscopes "

        "are varied and resonate with the unique qualities of each sign, contributing "

        "positively to the user's day."

    )

    # Call the create_agent function

    try:

        agent = create_agent(bedrock, agent_name, foundation_model, role_arn, instruction)

        agent_id = agent['agentId']

        print(f"Agent created successfully: {agent_id}")

    except ClientError:

        print("Failed to create the agent.")

        # Without an agent ID the remaining steps cannot run

        return

    time.sleep(10)

    try:

        with open("api_schema.yaml") as file:

            api_schema = json.dumps(yaml.safe_load(file))

        name = "AstroGroup"

        description = "AI Astrologer"

        agent_version = "DRAFT"

        function_arn = "arn:aws:lambda:us-east-1::function:horoscope"

        agentgroup = create_agent_action_group(bedrock, name, description, agent_id, agent_version, function_arn, api_schema)

        print(agentgroup['actionGroupId'])

    except ClientError as e:

        print(f"Couldn't create agent action group. Here's why: {e}")

        raise

    time.sleep(5)

    agentprepared = prepare_agent(bedrock, agent_id)

    print(agentprepared)

    time.sleep(20)

    agentalias = create_agent_alias(bedrock, name, agent_id)

    print(agentalias['agentAliasId'])

if __name__ == "__main__":

    main()
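The fixed time.sleep() calls in main() are a guess at how long each step takes. A possible alternative, sketched here under the assumption that the bedrock-agent client's get_agent call reports an agentStatus such as NOT_PREPARED, PREPARED or FAILED, is to poll until the agent reaches the state you need. The function name wait_for_agent_status and its timeout values are hypothetical:

```python
import time


def wait_for_agent_status(bedrock, agent_id, target_status, timeout=120, poll_interval=5):
    """Poll get_agent until the agent reaches target_status, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: the response carries the state under ["agent"]["agentStatus"]
        status = bedrock.get_agent(agentId=agent_id)["agent"]["agentStatus"]
        if status == target_status:
            return status
        if status == "FAILED":
            raise RuntimeError(f"Agent {agent_id} entered FAILED state")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Agent {agent_id} still {status} after {timeout}s")
        time.sleep(poll_interval)
```

For instance, wait_for_agent_status(bedrock, agent_id, "PREPARED") could take the place of the sleep that follows prepare_agent, so the alias is only created once the agent is actually ready.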

Categories: DBA Blogs

Immutable Data Integrity with Oracle's Blockchain Tables

Sat, 2024-03-02 16:24

Blockchain tables in Oracle Database implement a unique data organization model, forming chains of insert-only rows. Each row, excluding the first, links back to the previous row using cryptographic hashing. This hashing method leverages row data and the hash of the preceding row to identify any unauthorized modifications through specialized verification procedures. The added capability to store user signatures significantly enhances security against tampering and unauthorized alterations.

Beyond thwarting unauthorized changes initiated by rogue or compromised insiders, blockchain tables boast powerful functionality:

Identification of illicit modifications made by circumventing Oracle Database protocols.
Detection of user impersonation and unauthorized data insertions on behalf of another user.
Safeguarding against data tampering, certifying that all data in the table was inserted legitimately.

The chaining mechanism ensures that each row remains linked to its predecessor through cryptographic hashing. Altering any row disrupts this chain, immediately signaling tampering. However, this additional layer of security necessitates increased processing time during commit operations to chain rows effectively.

Let's explore how to create and interact with a blockchain table in Oracle Database through the following example:

-- Creating a blockchain table

SQL> CREATE BLOCKCHAIN TABLE secure_data_table (data_value VARCHAR2(128), sequence_number NUMBER)

NO DROP UNTIL 1 DAYS IDLE

NO DELETE UNTIL 16 DAYS AFTER INSERT

HASHING USING "SHA2_512" VERSION "v1";

Table created.

-- Inserting data into the blockchain table

SQL> INSERT INTO secure_data_table VALUES ('Value1', 1);

1 row created.

SQL> COMMIT;

Commit complete.

-- Querying metadata of the blockchain table

SQL> SELECT column_id, column_name, nullable, hidden_column

FROM all_tab_cols

WHERE table_name LIKE 'SECURE_DATA_%'

ORDER BY column_id;

-- Viewing blockchain table details

SQL> SELECT * FROM dba_blockchain_tables;

In this example, secure_data_table is our blockchain table, ensuring immutable data integrity and providing robust security against unauthorized changes. The SHA2_512 hashing algorithm is employed to maintain the integrity of each chained row, guaranteeing data authenticity and traceability within Oracle Database's secure environment.
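To make the chaining idea concrete outside the database, here is a minimal Python sketch of a hash chain. It is a simplified illustration only: the row encoding, the function names and the way the previous hash is combined with the row data are assumptions, not Oracle's actual on-disk format or algorithm.

```python
import hashlib


def row_hash(previous_hash: bytes, row_data: str) -> bytes:
    """Hash a row together with the hash of the row before it."""
    return hashlib.sha512(previous_hash + row_data.encode("utf-8")).digest()


def build_chain(rows):
    """Return the list of chained hashes, one per row."""
    chain = []
    previous = b""  # the first row has no predecessor
    for row in rows:
        previous = row_hash(previous, row)
        chain.append(previous)
    return chain


def verify_chain(rows, chain):
    """Return the index of the first row whose stored hash no longer matches, or None."""
    previous = b""
    for index, (row, stored) in enumerate(zip(rows, chain)):
        if row_hash(previous, row) != stored:
            return index
        previous = stored
    return None


rows = ["Value1|1", "Value2|2", "Value3|3"]
chain = build_chain(rows)
print(verify_chain(rows, chain))        # None: nothing was altered

tampered = ["Value1|1", "ValueX|2", "Value3|3"]
print(verify_chain(tampered, chain))    # 1: the second row no longer matches its hash
```

Because each hash depends on the one before it, changing a single row invalidates the stored hash for that row, which is the property the verification procedures rely on.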

Categories: DBA Blogs

Access AWS S3 Bucket from Oracle Database

Thu, 2024-02-29 16:11

If you want to access any object stored in an AWS S3 bucket from Oracle Database, the DBMS_CLOUD package is your friend, provided you are using Oracle Autonomous Database.

An AWS S3 (Simple Storage Service) bucket is a scalable storage resource in the cloud provided by Amazon Web Services. It allows you to store and retrieve any amount of data from anywhere on the web. S3 buckets are highly durable, secure, and can be configured to support various use cases including website hosting, data backup, and content distribution.

The DBMS_CLOUD package in Oracle Database is a PL/SQL package that provides procedures for working with cloud object storage, including Oracle Cloud Infrastructure Object Storage and Amazon S3. It is mainly used to load data into Oracle Database from files held in object storage, which makes it easier to ingest large volumes of data from cloud-based sources. The package includes procedures such as COPY_DATA, which copies data from cloud storage into database tables, and CREATE_CREDENTIAL, which stores the credentials needed to access cloud storage securely.

In order to access an object in an S3 bucket, make sure you have a database user in Oracle with the required privileges, and that you have AWS IAM credentials that can access the S3 bucket.

Step 1: First, you need to create a credential in Oracle Database that allows access to your AWS S3 bucket. This credential will store the AWS access key ID and secret access key securely.

BEGIN

  DBMS_CLOUD.CREATE_CREDENTIAL(

    credential_name => 'MY_AWS_CREDENTIAL',

    username => 'AWS_ACCESS_KEY_ID',

    password => 'AWS_SECRET_ACCESS_KEY'

  );

END;

/

Step 2: Now, you can use the DBMS_CLOUD package to read a file from your AWS S3 bucket into the database.

DECLARE

  v_data BLOB;

BEGIN

  -- GET_OBJECT returns the object contents as a BLOB

  v_data := DBMS_CLOUD.GET_OBJECT(

    credential_name => 'MY_AWS_CREDENTIAL',

    object_uri => 'https://s3.<region>.amazonaws.com/<bucket_name>/<file_key>'

  );

  -- Process the retrieved data (example: print the first 2000 bytes as text)

  DBMS_OUTPUT.PUT_LINE(UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(v_data, 2000, 1)));

END;

/
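If the object URI is assembled in application code before it is passed to the database, keys that contain spaces or other special characters need percent-encoding. A small sketch, assuming the path-style URL layout used in the example above (the function name s3_object_uri and the sample bucket and key are made up for illustration):

```python
from urllib.parse import quote


def s3_object_uri(region: str, bucket: str, key: str) -> str:
    """Build a path-style S3 URL in the same layout as the object_uri above."""
    # Percent-encode the key but keep "/" so folder-style prefixes stay readable
    return f"https://s3.{region}.amazonaws.com/{bucket}/{quote(key, safe='/')}"


print(s3_object_uri("us-east-1", "my-bucket", "reports/2024 sales.csv"))
# https://s3.us-east-1.amazonaws.com/my-bucket/reports/2024%20sales.csv
```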

Enjoy.

Categories: DBA Blogs

How AI Transcription Can Make Your Next Job Interview More Effective

Thu, 2024-02-29 01:47

If you are looking to use AI transcription during your job interview to get help from an LLM, then this video is going to help you immensely.

In the rapidly evolving landscape of recruitment, the integration of Artificial Intelligence (AI) tools has been a game-changer, especially AI transcription services. These services, which convert speech into a written or electronic text document, are now being increasingly utilized during candidate interviews. This innovative approach is not just about efficiency; it's about transforming the interview process into a more insightful, fair, and strategic component of talent acquisition. Let's delve into the merits of using AI transcription in candidate interviews and how it is reshaping the future of hiring.

Enhanced Accuracy and Objectivity

One of the paramount benefits of using AI transcription is the enhanced accuracy and objectivity it brings to the recruitment process. Traditional note-taking methods during interviews are inherently flawed, prone to human error, and can be biased. AI transcription, on the other hand, captures every word as it is spoken, ensuring that nothing is missed or misinterpreted. This verbatim record allows hiring managers to review the interview with a fine-tooth comb, ensuring that decisions are made based on what was actually said, rather than on recollection or interpretation.

Improved Candidate Experience

The candidate experience is crucial in attracting top talent. AI transcription contributes to a more engaging and less intimidating interview environment. Candidates can speak freely, knowing that their responses are being accurately recorded. This level of precision in capturing responses allows candidates to feel heard and valued, improving their overall experience and perception of the company. Furthermore, the efficiency of AI transcription speeds up the interview process, enabling quicker feedback and follow-ups, which candidates appreciate.

Accessibility and Inclusivity

AI transcription makes interviews more accessible and inclusive, breaking down barriers for candidates who may be deaf or hard of hearing. By providing real-time transcription, these candidates can participate fully in the interview process, ensuring equal opportunities for all. This commitment to inclusivity not only enhances the company's reputation but also widens the talent pool by making it more diverse.

Streamlined Analysis and Decision Making

The comprehensive data captured by AI transcription allows for a more streamlined analysis of candidates' responses. Hiring teams can utilize this data to identify patterns, evaluate key competencies, and make more informed decisions. AI tools can also highlight specific skills and qualifications mentioned during the interview, making it easier to assess candidates against job requirements. This data-driven approach reduces the likelihood of cognitive biases affecting hiring decisions and helps in selecting the best fit for the role.

Enhanced Collaboration Among Hiring Teams

AI transcription facilitates better collaboration among hiring team members, especially in scenarios where not all members can attend every interview. The transcribed interviews can be easily shared, allowing team members to review and provide input at their convenience. This collaborative approach ensures a more comprehensive evaluation of candidates, drawing on diverse perspectives within the team.

Legal Compliance and Record-Keeping

In today's litigious environment, maintaining accurate records of candidate interviews can be crucial for legal compliance. AI transcription provides an exact record of the conversation, which can be invaluable in defending against claims of discrimination or bias. This level of documentation ensures that the hiring process is transparent and defensible, safeguarding the company against potential legal challenges.

Conclusion

The adoption of AI transcription in candidate interviews is more than just a nod to technological advancement; it's a strategic move towards making the recruitment process more accurate, fair, and inclusive. By enhancing the accuracy of interviews, improving candidate experience, and streamlining decision-making, AI transcription is setting a new standard in talent acquisition. As companies continue to seek competitive advantages in attracting top talent, the use of AI in recruitment processes, especially transcription, is likely to become not just common but essential.

Categories: DBA Blogs

Tutorial to Implement RAG with Gemma Model Locally

Mon, 2024-02-26 20:20

Unlock the power of Retrieval-Augmented Generation (RAG) locally using the Gemma model with our detailed step-by-step tutorial. Learn how to enhance your projects by integrating RAG for insightful document processing and AI-driven content generation. Perfect for developers, data scientists, and AI enthusiasts eager to leverage advanced NLP techniques on their own documents. No prior RAG experience required!

Commands Used:

pip install -U "transformers==4.38.1"

pip install -q pypdf

pip install -q python-dotenv

pip install llama-index==0.10.12

pip install -q gradio

pip install einops

pip install accelerate

pip install llama-index-llms-huggingface

pip install llama-index-embeddings-fastembed

pip install fastembed

import logging

import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)

logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

from llama_index.llms.huggingface import HuggingFaceLLM

from llama_index.core import Settings

documents = SimpleDirectoryReader("/home/ubuntu/pdfs").load_data()

from llama_index.embeddings.fastembed import FastEmbedEmbedding

embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")

Settings.embed_model = embed_model

Settings.chunk_size = 512

from llama_index.core import PromptTemplate

system_prompt = "You are a Q&A assistant. Your goal is to answer questions as accurately as possible based on the instructions and context provided."

query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")

from huggingface_hub.hf_api import HfFolder

HfFolder.save_token('<your huggingface token from huggingface.co>')

import torch

llm = HuggingFaceLLM(

    context_window=8192,

    max_new_tokens=256,

    # temperature only takes effect when do_sample is True

    generate_kwargs={"temperature": 0.7, "do_sample": False},

    system_prompt=system_prompt,

    query_wrapper_prompt=query_wrapper_prompt,

    tokenizer_name="google/gemma-7b-it",

    model_name="google/gemma-7b-it",

    device_map="auto",

    tokenizer_kwargs={"max_length": 4096},

    model_kwargs={"torch_dtype": torch.float16}

)

Settings.llm = llm

Settings.chunk_size = 512

index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()

def predict(input, history):

    response = query_engine.query(input)

    return str(response)

import gradio as gr

gr.ChatInterface(predict).launch(share=True)
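Conceptually, the retrieval half of RAG comes down to comparing the embedding of the question with the embeddings of the document chunks and handing the closest chunks to the LLM. The sketch below illustrates that idea with toy vectors in plain Python. It is not how llama-index implements retrieval internally, and the chunk names and numbers are invented for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, chunk_vectors, k=2):
    """Return the names of the k chunks most similar to the query."""
    scored = sorted(
        chunk_vectors.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Toy 3-dimensional "embeddings"; a real embedding model produces far longer vectors
chunks = {
    "chunk_about_pricing": [0.9, 0.1, 0.0],
    "chunk_about_setup": [0.1, 0.9, 0.1],
    "chunk_about_history": [0.0, 0.2, 0.9],
}
print(top_k_chunks([0.8, 0.2, 0.1], chunks, k=2))
# ['chunk_about_pricing', 'chunk_about_setup']
```

In the tutorial code, VectorStoreIndex and the query engine take care of this step, using the FastEmbed model to produce the vectors.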

Categories: DBA Blogs

Using JSON in Oracle Database Natively

Sat, 2024-02-24 16:36

Oracle Database seamlessly integrates JSON (JavaScript Object Notation) with its robust RDBMS features, offering full support for transactions, indexing, declarative querying, and views among other functionalities.

JSON is a lightweight data interchange format widely employed for representing structured data due to its human-friendly readability and machine-friendly parsing capabilities. It utilizes key-value pairs enclosed in curly braces {}. Each key is a string followed by a colon :, and its value can be a string, number, boolean, array, object, or null. JSON arrays are ordered lists enclosed in square brackets [], while objects represent collections of key-value pairs. This versatile format finds extensive use in web development, APIs, and system-to-system data exchanges owing to its simplicity, adaptability, and clarity—a near subset of JavaScript's object literal notation.
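As a quick illustration of these value types, the short Python sketch below parses a small purchase-order document with the standard json module. The LineItems content is made up for the example; the other fields mirror the names used later in this post.

```python
import json

document = """
{"PONumber": 1600,
 "Requestor": "Alexis Bull",
 "SpecialInstructions": null,
 "AllowPartialShipment": true,
 "LineItems": [{"ItemNumber": 1, "Quantity": 9}]}
"""

po = json.loads(document)

print(po["PONumber"])                  # number  -> 1600
print(po["Requestor"])                 # string  -> Alexis Bull
print(po["SpecialInstructions"])       # null    -> None
print(po["AllowPartialShipment"])      # boolean -> True
print(po["LineItems"][0]["Quantity"])  # array of objects -> 9
```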

When leveraging JSON within Oracle Database, the typical workflow involves:

Step 1: Creating a JSON-Enabled Table:

CREATE TABLE json_purchase_orders

(id VARCHAR2(32) NOT NULL PRIMARY KEY,

date_loaded TIMESTAMP WITH TIME ZONE,

po_document JSON);

Step 2: Inserting JSON Data into the Table:

INSERT INTO json_purchase_orders

VALUES (SYS_GUID(),

TO_DATE('30-DEC-2014', 'DD-MON-YYYY'),

'{"PONumber": 1600,

"Reference": "ABULL-20140421",

"Requestor": "Alexis Bull",

"User": "ABULL",

"CostCenter": "A50",

"ShippingInstructions": {...},

"SpecialInstructions": null,

"AllowPartialShipment": true,

"LineItems": [...]}');

Step 3: Querying JSON Data from the Table:

SELECT po.po_document.PONumber

FROM json_purchase_orders po;

By leveraging Oracle Database's native JSON support, developers gain the flexibility to seamlessly integrate JSON data within the relational model, enabling efficient storage, retrieval, and manipulation of JSON documents alongside traditional SQL operations. This unified approach empowers enterprises to harness the best of both worlds—structured data management and JSON flexibility—within a single, powerful database environment.

Categories: DBA Blogs

Build Your Own Audio Transcription Tool with AI

Fri, 2024-02-23 22:45

In this video, you will learn how to deploy an LLM-based application into production, using Amazon Transcribe (an ASR service) to transcribe audio files and the Amazon Titan model on Amazon Bedrock to summarize them.

Code:

import boto3

import json

import uuid

import time

from jinja2 import Template

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

s3_client = boto3.client(service_name='s3', region_name='ap-southeast-2')

transcribe_client = boto3.client('transcribe', region_name='ap-southeast-2')

bucket_name='<replace your bucket name here>'

#file_name = 'angry.mp3'

file_name = 'happy.mp3'

job_name = 'transcription-job-' + str(uuid.uuid4())

response = transcribe_client.start_transcription_job(

    TranscriptionJobName=job_name,

    Media={'MediaFileUri': f's3://{bucket_name}/{file_name}'},

    MediaFormat='mp3',

    LanguageCode='en-US',

    OutputBucketName=bucket_name,

    Settings={

        'ShowSpeakerLabels': True,

        'MaxSpeakerLabels': 2

    }

)

while True:

    status = transcribe_client.get_transcription_job(TranscriptionJobName=job_name)

    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:

        break

    time.sleep(2)

print(status['TranscriptionJob']['TranscriptionJobStatus'])

if status['TranscriptionJob']['TranscriptionJobStatus'] == 'COMPLETED':

    # Load the transcript from S3.

    transcript_key = f"{job_name}.json"

    transcript_obj = s3_client.get_object(Bucket=bucket_name, Key=transcript_key)

    transcript_text = transcript_obj['Body'].read().decode('utf-8')

    transcript_json = json.loads(transcript_text)

    output_text = ""

    current_speaker = None

    items = transcript_json['results']['items']

    for item in items:

        speaker_label = item.get('speaker_label', None)

        content = item['alternatives'][0]['content']

        # Start the line with the speaker label:

        if speaker_label is not None and speaker_label != current_speaker:

            current_speaker = speaker_label

            output_text += f"\n{current_speaker}: "

        # Add the speech content:

        if item['type'] == 'punctuation':

            output_text = output_text.rstrip()

        output_text += f"{content} "

    # Save the transcript to a text file

    with open(f'{job_name}.txt', 'w') as f:

        f.write(output_text)

with open(f'{job_name}.txt', "r") as file:

    transcript = file.read()

template_string = """ I need to summarize a conversation. The transcript of the

conversation is between the <data> XML like tags.

<data>

{{transcript}}

</data>

The summary must contain a one word sentiment analysis, and

a list of issues, problems or causes of friction

during the conversation. The output must be provided in

JSON format shown in the following example.

Example output:

{

"sentiment": <sentiment>,

"issues": [

{

"topic": <topic>,

"summary": <issue_summary>,

}

]

}

Write the JSON output and nothing more.

Here is the JSON output: """

data = {

'transcript' : transcript

}

template = Template(template_string)

prompt = template.render(data)

kwargs = {

    "modelId": "amazon.titan-text-express-v1",

    "contentType": "application/json",

    "accept": "*/*",

    "body": json.dumps(

        {

            "inputText": prompt,

            "textGenerationConfig": {

                "maxTokenCount": 512,

                "temperature": 0,

                "topP": 0.9

            }

        }

    )

}

response = bedrock_runtime.invoke_model(**kwargs)

response_body = json.loads(response.get('body').read())

generation = response_body['results'][0]['outputText']

print(generation)
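The prompt asks the model for JSON only, but a text model may still wrap the object in extra words or whitespace. As a hedged sketch (the helper name parse_summary and the sample string are assumptions for illustration, not actual model output), the generation can be turned into a Python dictionary like this:

```python
import json


def parse_summary(generation: str) -> dict:
    """Extract the outermost {...} span from the model output and parse it as JSON."""
    start = generation.find("{")
    end = generation.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in model output")
    return json.loads(generation[start : end + 1])


sample = 'Here is the JSON output: {"sentiment": "Positive", "issues": []} '
summary = parse_summary(sample)
print(summary["sentiment"])  # Positive
```

If the model returns malformed JSON, for example with a trailing comma, json.loads raises json.JSONDecodeError, so the caller should be prepared to catch it and retry or log the raw text.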

Categories: DBA Blogs

Manage and Run Gemma LLM with Keras Locally

Fri, 2024-02-23 01:59

This video shows how to install and manage Gemma LLM with Keras. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

Code:

!pip install keras --upgrade

!pip install kaggle

from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():

    print('User uploaded file "{name}" with length {length} bytes'.format(

        name=fn, length=len(uploaded[fn])))

# Then move kaggle.json into the folder where the API expects to find it.

!mkdir -p ~/.kaggle/ && mv kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json

!pip install keras_nlp --upgrade

!pip install keras --upgrade

import os

os.environ["KERAS_BACKEND"] = "jax"

import keras_nlp

import keras

import tensorflow as tf

import time

keras.mixed_precision.set_global_policy("mixed_float16")

preprocessor = keras_nlp.models.GemmaPreprocessor.from_preset(

"gemma_2b_en"

)

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

gemma_lm.generate("which one came first, egg or chicken?", max_length=130)

Categories: DBA Blogs

Oracle GoldenGate Script for Troubleshooting

Thu, 2024-02-22 21:28

Oracle GoldenGate is a comprehensive software product that enables real-time data integration, replication, and synchronization across heterogeneous systems. It supports continuous data streaming with minimal latency and offers high availability and data integrity. GoldenGate facilitates data replication between different database platforms, including Oracle Database, Microsoft SQL Server, MySQL, and others, allowing organizations to achieve real-time data integration for analytics, reporting, disaster recovery, and distributed data environments. Its capabilities include data transformation, filtering, conflict resolution, and support for complex replication topologies, making it a powerful solution for maintaining synchronized data across diverse IT landscapes.

This script checks whether the Oracle GoldenGate processes are running and, if any are not, emails the list to the DBA team so you have an exact starting point for catching up:

#!/bin/bash

. /etc/profile

#. /home/oracle/.bash_profile

YYMMDD=`date +%Y%m%d`

LOG_DIR=/home/oracle/GGscript/log

WRK_DIR=/home/oracle/GGscript/work

LOG_FL=${LOG_DIR}/${ORACLE_SID}_gg_chk.log

WRK_FL=${WRK_DIR}/gg_chk_${ORACLE_SID}_gg_chk.wrk

BAD_FL=${WRK_DIR}/gg_chk_${ORACLE_SID}_gg_chk.bad

export ORACLE_HOME=/u01/app/oracle/product/11.2.0.3/db_home1

export PATH=$ORACLE_HOME/bin:$PATH

export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/local/lib

DBA_MAIL_ADDR='ADD_EMAIL@HERE.com'

HOSTNAME=`uname -n | tr '[a-z]' '[A-Z]'`

{

echo "###########################################"

echo "GG Check Up `basename $0` started at `date`"

#check if maint flag is on

if [ -f /home/oracle/work/${ORACLE_SID}_maintenance.on ] ; then

    echo "Maintenance flag is on .... stop "

    exit 0

fi

### Check GG processes

cd /ggs

echo "info all" | /ggs/ggsci | sed -e '1,/^Program/d' -e '/^GGSCI/,$d' -e '/^ *$/d' | tee $WRK_FL

echo " "

cat $WRK_FL | grep -v 'RUNNING'| grep -iv 'test' | tee $BAD_FL

non_run_cnt=0

if [ -f $BAD_FL ] ; then

    non_run_cnt=`wc -l $BAD_FL |awk '{print $1}'`

fi

echo "GG Process Not Running: $non_run_cnt "

if [[ $non_run_cnt -gt 0 ]] ; then

    # bash's echo does not expand \n without -e, so print the blank line separately

    echo ""

    echo "Email support team ($DBA_MAIL_ADDR) about failed processes "

    cat $BAD_FL

    mailx -s "$HOSTNAME : ALERT: One or more GoldenGate processes in ${ORACLE_SID} are down ($non_run_cnt)!!!" $DBA_MAIL_ADDR < $BAD_FL

fi

rm -f $WRK_FL $BAD_FL

echo "GG Check Up `basename $0` finished at `date`"

echo "###########################################"

} 2>&1 | tee $LOG_FL
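The same check can be sketched in Python for anyone who prefers to parse the "info all" output programmatically instead of with sed and grep. The sample output below is hypothetical and only follows the usual column layout (Program, Status, Group, lag columns); the function name find_not_running is an assumption for illustration. Unlike the shell pipeline, which drops any line containing "test", this version looks for "test" in the group name only.

```python
def find_not_running(info_all_output: str, ignore: str = "test"):
    """Return (program, group, status) for every process that is not RUNNING."""
    problems = []
    for line in info_all_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines and anything that is not a process row
        if len(fields) < 2 or fields[0] not in {"MANAGER", "EXTRACT", "REPLICAT"}:
            continue
        program, status = fields[0], fields[1]
        group = fields[2] if len(fields) > 2 else ""
        if status != "RUNNING" and ignore not in group.lower():
            problems.append((program, group, status))
    return problems


sample = """\
Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT1        00:00:00      00:00:05
REPLICAT    ABENDED     REP1        00:10:00      01:00:00
REPLICAT    STOPPED     REPTEST     00:00:00      00:00:00
"""
print(find_not_running(sample))
# [('REPLICAT', 'REP1', 'ABENDED')]
```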

Categories: DBA Blogs
