To get started with OpenVLA, it is worth looking at the OpenVLA GitHub repo, the embodied-agents GitHub repo, and SimplerEnv.
https://github.com/openvla/openvla
OpenVLA: An open-source vision-language-action model for robotic manipulation.
https://github.com/mbodiai/embodied-agents?tab=readme-ov-file
Seamlessly integrate state-of-the-art transformer models into robotics stacks.
https://github.com/simpler-env/SimplerEnv
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024).
https://colab.research.google.com/drive/15mElfn43Ge5Uj_OeSs47OthB3c8mmGQL?usp=sharing
OpenVLA_tutorial_01.ipynb (Colab notebook)
I put together a tutorial that can be run in Colab (linked above).
Next, I wrote the code so that it can also run locally on a machine with a GPU.
Naturally, the first step is to set up the virtual environment and install the dependencies exactly as instructed in the OpenVLA GitHub repo.
Then create 01.test.ipynb inside the openvla folder.
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig
from PIL import Image
import requests
import time
import torch
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
device
Import the required libraries.
Even when running locally, a GPU is still required.
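As a quick sanity check before going further (this is my own addition, assuming a local NVIDIA setup, not part of the original script), you can confirm that CUDA is visible and see how much VRAM is available:
# Optional sanity check for a local NVIDIA GPU (assumed setup, not in the original script)
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA device found -- the quantized loading below will not work on CPU.")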
# === Verification Arguments
UNNORM_KEY = 'bridge_orig'
MODEL_PATH = "openvla/openvla-7b"
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
INSTRUCTION = "Pick up the remote"
UNNORM_KEY is bridge_orig, the dataset the pretrained model was trained on; predict_action uses it to pick the statistics for un-normalizing the output actions.
MODEL_PATH is the openvla/openvla-7b checkpoint on Hugging Face.
SYSTEM_PROMPT is the system (AI) prompt.
INSTRUCTION is the command to give the robot.
def get_openvla_prompt(instruction: str) -> str:
    if "v01" in MODEL_PATH:
        return f"{SYSTEM_PROMPT} USER: What action should the robot take to {instruction.lower()}? ASSISTANT:"
    else:
        return f"In: What action should the robot take to {instruction.lower()}?\nOut:"
I referred to vla-scripts/extern/verify_openvla.py; the function above is taken from there and used as-is.
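Since MODEL_PATH is "openvla/openvla-7b" (no "v01" in the name), the else branch is taken, so the prompt will look like this (just a quick preview; the same call is made again later in the notebook):
print(get_openvla_prompt(INSTRUCTION))
# In: What action should the robot take to pick up the remote?
# Out: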
# Load Processor & VLA
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
Multimodal models need two preprocessing tools at once: one for images and one for text. AutoProcessor bundles both, so we load it and assign it to processor.
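To see what the processor actually produces, here is a small sketch on a dummy image (the exact keys and shapes depend on the processor version, so treat this as illustrative only):
# Peek at the processor output on a dummy image (keys/shapes may vary by version)
dummy_image = Image.new("RGB", (256, 256))
dummy_inputs = processor("In: What action should the robot take to pick up the remote?\nOut:", dummy_image)
for key, value in dummy_inputs.items():
    print(key, getattr(value, "shape", type(value)))  # e.g. tokenized text plus preprocessed image tensors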
# === BFLOAT16 MODE ===
# vla = AutoModelForVision2Seq.from_pretrained(
#     "openvla/openvla-7b",
#     attn_implementation="flash_attention_2",  # [Optional] Requires `flash_attn`
#     torch_dtype=torch.bfloat16,
#     low_cpu_mem_usage=True,
#     trust_remote_code=True,
# ).to("cuda:0")
# === 8-BIT QUANTIZATION MODE (`pip install bitsandbytes`) :: [~9GB of VRAM Passive || 10GB of VRAM Active] ===
# print("[*] Loading in 8-Bit Quantization Mode")
# vla = AutoModelForVision2Seq.from_pretrained(
#     MODEL_PATH,
#     attn_implementation="flash_attention_2",
#     torch_dtype=torch.float16,
#     quantization_config=BitsAndBytesConfig(load_in_8bit=True),
#     low_cpu_mem_usage=True,
#     trust_remote_code=True,
# )
# === 4-BIT QUANTIZATION MODE (`pip install bitsandbytes`) :: [~6GB of VRAM Passive || 7GB of VRAM Active] ===
print("[*] Loading in 4-Bit Quantization Mode")
vla = AutoModelForVision2Seq.from_pretrained(
    MODEL_PATH,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)
Looking at vla-scripts/extern/verify_openvla.py, there are three loading variants: full bfloat16, 8-bit, and 4-bit. Thankfully, this means the model can be tried locally even on a fairly modest GPU.
To use the 4-bit or 8-bit quantization mode, install the libraries below:
pip install bitsandbytes
pip install accelerate
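One more local-setup note: flash_attention_2 requires the flash_attn package, which can be awkward to build on some machines. As an assumption about such setups (not something the OpenVLA repo prescribes, and whether it works depends on the model's remote modeling code), you can try falling back to PyTorch's built-in SDPA attention when loading:
# Fallback sketch if `flash_attn` is not installed (assumed local workaround)
vla = AutoModelForVision2Seq.from_pretrained(
    MODEL_PATH,
    attn_implementation="sdpa",  # instead of "flash_attention_2"
    torch_dtype=torch.float16,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)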
Now let's build the prompt and load an image.
prompt = get_openvla_prompt(INSTRUCTION)
print(prompt)
# Load the image either from a local file or from a URL
# image_path = "./images/bridge_orig.jpeg"
# image = Image.open(image_path).convert("RGB")
DEFAULT_IMAGE_URL = (
    "https://api.mbodi.ai/community-models/file=/tmp/gradio/c213d531d13cdcd19391acfd08b14e629b1118063fd303d9da1f4b5e065857e4/example.jpeg"
)
image = Image.open(requests.get(DEFAULT_IMAGE_URL, stream=True).raw).convert("RGB")
import matplotlib.pyplot as plt
plt.imshow(image)
plt.axis("off") # Hide axis for better visualization
plt.show()
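If you want to use the commented-out local-file branch above on later runs, you can save the downloaded image once (the ./images path is just the one assumed by that commented code):
# Save the downloaded example image so the local-file branch can be used next time
import os
os.makedirs("./images", exist_ok=True)
image.save("./images/bridge_orig.jpeg")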
# === BFLOAT16 MODE ===
# inputs = processor(prompt, image).to(device, dtype=torch.bfloat16)
# === 8-BIT/4-BIT QUANTIZATION MODE ===
inputs = processor(prompt, image).to(device, dtype=torch.float16)
Don't forget that the input dtype must match the loading mode (bfloat16 for the full-precision mode, float16 for the quantized modes).
Then run OpenVLA inference.
# Run OpenVLA Inference
start_time = time.time()
action = vla.predict_action(**inputs, unnorm_key=UNNORM_KEY, do_sample=False)
print(f"=>> Time: {time.time() - start_time:.4f} \n Action: \n {action}")
=>> Time: 1.1822
Action:
[ 2.69897781e-03 -7.47882333e-04 8.33299030e-03 8.16284417e-03 -2.75751861e-02 -1.94082882e-02 9.96078431e-01]
The output is X, Y, Z, Roll, Pitch, Yaw, and Gripper.
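To make the 7-dimensional action easier to read, here is a small sketch that simply pairs each value with its name (the names follow the X/Y/Z/Roll/Pitch/Yaw/Gripper order above):
# Label each of the 7 action dimensions for readability
action_names = ["x", "y", "z", "roll", "pitch", "yaw", "gripper"]
for name, value in zip(action_names, action):
    print(f"{name:>8}: {value:+.5f}")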