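Advanced Speech Recognition

Speech Recognition API

High-accuracy speech-to-text for Japanese with industry-leading performance. The API converts audio to text with accuracy and speed tuned for Japanese speech.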


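Industry-Leading Accuracy

Performance benchmarked on real-world Japanese audio

Metric	Value	Condition
Overall accuracy	98.5%	Clean audio environment
Noisy environment	95.2%	Background noise handling
Response time	<0.5 s	Per minute of audio
Mixed language	97.8%	Japanese-English code-switching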

Accuracy by Domain

Domain	Accuracy
Customer service calls	96.5%
Business meetings	97.2%
Medical consultations	95.8%
Legal proceedings	98.1%
Technical discussions	96.9%

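Built for Japanese Speech

Features designed specifically for Japanese speech recognition

Multi-Dialect Support

Accurately recognizes standard Japanese, Kansai-ben, Tohoku-ben, and other regional dialects

Real-Time Streaming

Processes audio streams in real time, providing live transcription with immediate results

Speaker Diarization

Automatically identifies and separates multiple speakers within a conversation

Ultra-Fast Processing

Processes hours of audio in minutes with an optimized inference pipeline

Enterprise Security

SOC 2 compliant, with end-to-end encryption and secure audio processing

Custom Vocabulary

Add industry-specific terms, brand names, and custom phrases to improve accuracy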

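Trusted Use Cases

See how companies use the ASR API

Call Center Transcription

Automatically transcribe customer service calls for quality assurance, compliance, and insights.

Quality monitoring
Compliance records
Agent training
Customer sentiment analysis

Meeting Notes

Convert meetings, interviews, and discussions into searchable, actionable text documents.

Business meetings
Interview records
Conference recordings
Team stand-ups

Subtitles and Captions

Generate accurate subtitles for video, live streams, and broadcasts in real time or batch mode.

Video subtitles
Live event captions
Broadcast transcription
Accessibility compliance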


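Quick Start Guide

Get started with the Speech Recognition API in three simple steps

1. Get an API key

Sign up for a Shisa AI account and get your API key from the developer dashboard. Include it in the Authorization header with the 'shsk:' prefix: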

Authorization: Bearer shsk:YOUR_API_KEY
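
As a small illustration of the header format above, the following Python sketch builds the request headers. The helper name build_headers and the SHISA_API_KEY environment variable are assumptions made for this example; neither is defined by the API.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return request headers with the 'shsk:' prefix applied exactly once."""
    key = api_key.strip()
    # Accept keys pasted with or without the prefix (a convenience of this sketch).
    if key.startswith("shsk:"):
        key = key[len("shsk:"):]
    if not key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer shsk:{key}",
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    # SHISA_API_KEY is a hypothetical variable name chosen for this sketch.
    print(build_headers(os.environ.get("SHISA_API_KEY", "YOUR_API_KEY")))
```

Reading the key from the environment keeps it out of source files; the placeholder fallback is only there so the sketch runs without configuration.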

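2. Prepare your audio

The API accepts base64-encoded audio in a variety of formats. Supported audio formats:

OGG (Opus, Vorbis)
WAV (PCM, 16-bit)
MP3, WebM, M4A, FLAC

3. Send your first request

Send a POST request containing the audio data and settings to the API endpoint. A basic example using cURL: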

curl --location 'https://api.shisa.ai/asr/srt/audio_llm' \
  --header 'Authorization: Bearer shsk:YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "shisa-ai/shisa-asr-v0.1b",
    "temperature": 0.0,
    "top_p": 0.85,
    "frequency_penalty": 0.5,
    "repetition_penalty": 1.05,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Transcribe the Japanese audio clip into text. Here are hotwords that may appear: Shisa AI, API, 音声認識"
          },
          {
            "type": "audio_url",
            "audio_url": {
              "url": "data:audio/ogg;base64,'$(base64 -w0 audio.ogg)'"
            }
          }
        ]
      }
    ]
  }'

Recommended starting parameters

Start with these values before tuning for your use case:

temperature: 0.0
top_p: 0.85
frequency_penalty: 0.5
repetition_penalty: 1.05

Expected Response

The API returns a JSON response with the transcribed text in the text field.

{
  "language": "auto",
  "text": "祖母はおおむね機嫌よくさえころころがしている。"
}

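API Endpoint

The Speech Recognition API uses a chat-style interface for maximum flexibility and context awareness

Speech Recognition Endpoint

POST https://api.shisa.ai/asr/srt/audio_llm

This multimodal endpoint accepts both text instructions and audio content, so you can supply context and custom vocabulary (hotwords) to improve accuracy.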

Request Parameters

Configure your transcription request with these parameters

Request Body Parameters

Parameter	Type	Required	Description
model	string	Required	Model identifier. Currently: "shisa-ai/shisa-asr-v0.1b"
messages	array	Required	Array of message objects containing multimodal content (text + audio)
temperature	float	Required	Sampling temperature (0.0-2.0). Lower values make the output more deterministic. Default and recommended: 0.0
top_p	float	Required	Nucleus sampling parameter (0.0-1.0). Controls output diversity. Default and recommended: 0.85
frequency_penalty	float	Required	Penalty on frequent tokens (-2.0 to 2.0). Reduces repetition. Default and recommended: 0.5
repetition_penalty	float	Required	Penalty on repeated tokens (1.0-2.0). Values above 1.0 discourage repetition. Default and recommended: 1.05
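
The ranges in the table above can be checked on the client before a request is sent. The sketch below is one way to do that; the helper name check_sampling_params is hypothetical, and treating both ends of each range as inclusive is an assumption, since the table does not say.

```python
from typing import Dict, List, Tuple

# Ranges copied from the parameter table; inclusive bounds are an assumption.
RANGES: Dict[str, Tuple[float, float]] = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "repetition_penalty": (1.0, 2.0),
}


def check_sampling_params(params: Dict[str, float]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name not in params:
            problems.append(f"{name} is missing")
        elif not low <= params[name] <= high:
            problems.append(f"{name}={params[name]} is outside [{low}, {high}]")
    return problems


if __name__ == "__main__":
    recommended = {
        "temperature": 0.0,
        "top_p": 0.85,
        "frequency_penalty": 0.5,
        "repetition_penalty": 1.05,
    }
    print(check_sampling_params(recommended))
```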

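Message Format

The messages array must contain a single user message with two content items:

Text content: transcription instructions, including the language and any hotwords that may appear
Audio content: the audio data as a base64-encoded data URL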
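
The message structure described above can be sketched in Python as follows. The function name build_payload is hypothetical; the field names, model identifier, and parameter values are taken from the cURL example in this document.

```python
from typing import Any, Dict

MODEL = "shisa-ai/shisa-asr-v0.1b"  # model identifier from the parameter table


def build_payload(prompt: str, audio_data_url: str) -> Dict[str, Any]:
    """Assemble a request body with one user message: text first, then audio."""
    if not audio_data_url.startswith("data:"):
        raise ValueError("audio_data_url must be a data: URL")
    return {
        "model": MODEL,
        "temperature": 0.0,
        "top_p": 0.85,
        "frequency_penalty": 0.5,
        "repetition_penalty": 1.05,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "audio_url", "audio_url": {"url": audio_data_url}},
                ],
            }
        ],
    }


if __name__ == "__main__":
    body = build_payload(
        "Transcribe the Japanese audio clip into text.",
        "data:audio/ogg;base64,AAAA",  # placeholder bytes, not real audio
    )
    print([item["type"] for item in body["messages"][0]["content"]])
```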

Audio Input Format

Audio must be provided as a base64-encoded data URL in the following form:

data:audio/ogg;base64,SGVsbG8gV29ybGQ...

Supported audio formats:

Format	MIME Type	Detection
WAV	audio/wav	RIFF header
OGG	audio/ogg	OggS header
MP3	audio/mpeg	ID3 tag or MPEG sync bytes
FLAC	audio/flac	fLaC header
WebM	audio/webm	EBML header
M4A/MP4	audio/mp4	ftyp box
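
The Detection column above can be approximated on the client to pick a MIME type before upload. The sketch below is an assumption about what such signature checks might look like; the service's own detection logic is not documented beyond the table, and detect_audio_mime is a hypothetical helper name.

```python
from typing import Optional


def detect_audio_mime(data: bytes) -> Optional[str]:
    """Guess the MIME type from the leading bytes, or return None if unrecognized."""
    # WAV: RIFF container; this sketch also requires the WAVE form type at offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio/wav"
    if data[:4] == b"OggS":
        return "audio/ogg"
    # MP3: either an ID3v2 tag or an MPEG frame sync (11 set bits).
    if data[:3] == b"ID3":
        return "audio/mpeg"
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "audio/mpeg"
    if data[:4] == b"fLaC":
        return "audio/flac"
    # EBML magic number; Matroska files share it, so this is only a hint for WebM.
    if data[:4] == b"\x1a\x45\xdf\xa3":
        return "audio/webm"
    # ISO base media files (M4A/MP4) carry an 'ftyp' box type at offset 4.
    if data[4:8] == b"ftyp":
        return "audio/mp4"
    return None


if __name__ == "__main__":
    print(detect_audio_mime(b"OggS\x00\x02"))
```

Checking file contents rather than the file extension avoids mismatches such as an .mp3 file being labeled audio/mp3 instead of audio/mpeg.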

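Encoding Audio as Base64

Use the following shell commands to convert an audio file into a base64 data URL (-w0 is the GNU coreutils flag that disables line wrapping):

# WAV
AUDIO_URL="data:audio/wav;base64,$(base64 -w0 audio.wav)"

# OGG
AUDIO_URL="data:audio/ogg;base64,$(base64 -w0 audio.ogg)"

# MP3
AUDIO_URL="data:audio/mpeg;base64,$(base64 -w0 audio.mp3)"

# FLAC
AUDIO_URL="data:audio/flac;base64,$(base64 -w0 audio.flac)"

# WebM
AUDIO_URL="data:audio/webm;base64,$(base64 -w0 audio.webm)"

# M4A
AUDIO_URL="data:audio/mp4;base64,$(base64 -w0 audio.m4a)"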
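
For environments without a shell, the same encoding can be sketched in Python with the standard library. The helper name to_data_url is hypothetical.

```python
import base64


def to_data_url(audio_bytes: bytes, mime_type: str) -> str:
    """Encode raw audio bytes as a base64 data URL with the given MIME type."""
    # b64encode emits a single unwrapped line, matching `base64 -w0`.
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    # The bytes below are placeholder text, not real audio.
    print(to_data_url(b"Hello World", "audio/ogg"))
```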

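Hotwords and Custom Vocabulary

Include hotwords in the text prompt to improve transcription accuracy for specialized terms. For example:

"Transcribe the Japanese audio. Here are hotwords that may appear: Shisa AI, API, 音声認識, 機械学習"

When to use hotwords:

Proper nouns: company names, personal names, brand names
Technical terms: industry-specific jargon, product names
Loanwords: terms unfamiliar in the target language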
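
A prompt in the format shown above can be assembled from a list of terms. The sketch below follows the wording used in this document's examples; the helper name build_prompt and the de-duplication step are assumptions of the sketch, not API requirements.

```python
from typing import Iterable, Optional


def build_prompt(language: str = "Japanese",
                 hotwords: Optional[Iterable[str]] = None) -> str:
    """Build the transcription instruction, appending hotwords only when given."""
    prompt = f"Transcribe the {language} audio clip into text."
    cleaned = []
    for word in hotwords or []:
        word = word.strip()
        # Skip blanks and repeats while keeping the caller's order.
        if word and word not in cleaned:
            cleaned.append(word)
    if cleaned:
        prompt += " Here are hotwords that may appear: " + ", ".join(cleaned)
    return prompt


if __name__ == "__main__":
    print(build_prompt("Japanese", ["Shisa AI", "API", "音声認識"]))
```

Appending the hotword sentence conditionally avoids the trailing space that a fixed template leaves when the list is empty.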

Supported Languages (LID)

The API supports automatic language identification (LID) for the following languages. The detected language is returned in the language field of the response.

Primary Languages

Code	Language
ja	Japanese
en	English
zh	Chinese

Response Format

Understanding the API response structure

Success Response

{
  "language": "auto",
  "text": "祖母はおおむね機嫌よくさえころころがしている。"
}

Response fields:

language: The detected or specified language of the audio
text: The transcribed text from the audio
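
Reading these two fields from a response body might look like the following sketch. The helper name extract_transcript is hypothetical, and treating a missing language field as None is an assumption; the sample body is the one shown in this document.

```python
import json
from typing import Optional, Tuple


def extract_transcript(body: str) -> Tuple[str, Optional[str]]:
    """Parse a success response and return (text, language)."""
    data = json.loads(body)
    if not isinstance(data, dict) or "text" not in data:
        raise ValueError("response has no 'text' field")
    return data["text"], data.get("language")


if __name__ == "__main__":
    sample = '{"language": "auto", "text": "祖母はおおむね機嫌よくさえころころがしている。"}'
    text, language = extract_transcript(sample)
    print(language, text)
```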

Error Handling

Common errors and how to resolve them

Error Response Format

{
  "code": 400,
  "error": "No audio data provided"
}

401 Authentication Error

Returned when the API key is missing, invalid, or expired. Check that your Authorization header includes a valid token.

{
  "context": ["authMiddleware"],
  "code": 104,
  "name": "ErrAuthenticationFailed",
  "error": "Authentication error: Invalid token"
}

Error Codes

Code	Cause	Error Message
400	No audio_url in messages	No audio_url found in messages
400	Audio decodes to empty	No audio data provided
400	URL doesn't start with data:	audio_url must be a data: URL
400	Missing comma in data URL	Invalid data URL format
400	Not base64 encoded	audio_url must be base64 encoded
400	Base64 decode fails	Invalid base64 audio data
500	Services not ready	Transcription service not available
500	Backend failure	Transcription failed: ...
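
A client can branch on these codes to decide what to do next. The sketch below assumes the Code column corresponds to the HTTP status of the response and that 500-level failures are worth retrying later; neither point is stated in this document, and the helper name and action labels are hypothetical.

```python
from typing import Any, Dict


def describe_error(status_code: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Map an error response to a suggested next action."""
    # Branch on the HTTP status rather than body["code"]: the 401 example
    # in this document carries an internal code of 104 in its body.
    message = str(body.get("error", ""))
    if status_code == 401:
        action = "check_api_key"
    elif status_code == 400:
        action = "fix_request"
    elif status_code >= 500:
        action = "retry_later"  # assumption: server-side failures may be transient
    else:
        action = "unknown"
    return {"status": status_code, "message": message, "action": action}


if __name__ == "__main__":
    print(describe_error(400, {"code": 400, "error": "No audio data provided"}))
```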

Code Examples

Integration examples in popular programming languages

cURL - Quick Start

A basic example of transcribing an audio file with cURL

curl --location 'https://api.shisa.ai/asr/srt/audio_llm' \
  --header 'Authorization: Bearer shsk:YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "shisa-ai/shisa-asr-v0.1b",
    "temperature": 0.0,
    "top_p": 0.85,
    "frequency_penalty": 0.5,
    "repetition_penalty": 1.05,
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Transcribe the Japanese audio clip into text. Here are hotwords that may appear: Shisa AI, API, 音声認識"
          },
          {
            "type": "audio_url",
            "audio_url": {
              "url": "data:audio/ogg;base64,'$(base64 -w0 audio.ogg)'"
            }
          }
        ]
      }
    ]
  }'

Python - Complete Example

A complete Python function with base64 encoding and hotword support

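import base64
import requests

# File extension -> MIME subtype, following the supported-formats table
MIME_SUBTYPES = {
    "wav": "wav",
    "ogg": "ogg",
    "mp3": "mpeg",
    "flac": "flac",
    "webm": "webm",
    "m4a": "mp4",
    "mp4": "mp4",
}

def transcribe_audio(audio_path, language="Japanese", hotwords=None):
    """Transcribe audio file using Shisa AI ASR API"""

    # Read and encode audio file
    with open(audio_path, 'rb') as audio_file:
        audio_data = base64.b64encode(audio_file.read()).decode('utf-8')

    # Map the file extension to the MIME subtype used in the data URL
    # (for example, .mp3 must become audio/mpeg, not audio/mp3)
    extension = audio_path.rsplit('.', 1)[-1].lower()
    audio_format = MIME_SUBTYPES.get(extension, extension)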

    # Build hotwords text
    hotwords_text = ""
    if hotwords:
        hotwords_text = f"Here are hotwords that may appear: {', '.join(hotwords)}"

    # Prepare request
    url = "https://api.shisa.ai/asr/srt/audio_llm"
    headers = {
        "Authorization": "Bearer shsk:YOUR_API_KEY",
        "Content-Type": "application/json"
    }

    payload = {
        "model": "shisa-ai/shisa-asr-v0.1b",
        "temperature": 0.0,
        "top_p": 0.85,
        "frequency_penalty": 0.5,
        "repetition_penalty": 1.05,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": f"Transcribe the {language} audio clip into text. {hotwords_text}"
                    },
                    {
                        "type": "audio_url",
                        "audio_url": {
                            "url": f"data:audio/{audio_format};base64,{audio_data}"
                        }
                    }
                ]
            }
        ]
    }

    # Make request
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()

    return response.json()

# Example usage
result = transcribe_audio(
    "meeting_recording.ogg",
    language="Japanese",
    hotwords=["Shisa AI", "API", "音声認識"]
)

print(result)

JavaScript - Browser Integration

Client-side JavaScript example that reads the selected file with the Blob arrayBuffer() method

async function transcribeAudio(audioFile, language = 'Japanese', hotwords = []) {
  // Read file and convert to base64
  const fileBuffer = await audioFile.arrayBuffer();
  const base64Audio = btoa(
    new Uint8Array(fileBuffer).reduce(
      (data, byte) => data + String.fromCharCode(byte),
      ''
    )
  );

  // Get audio format from file type
  const audioFormat = audioFile.type.split('/')[1];

  // Build hotwords text
  const hotwordsText = hotwords.length > 0
    ? `Here are hotwords that may appear: ${hotwords.join(', ')}`
    : '';

  // Prepare request
  const response = await fetch('https://api.shisa.ai/asr/srt/audio_llm', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer shsk:YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'shisa-ai/shisa-asr-v0.1b',
      temperature: 0.0,
      top_p: 0.85,
      frequency_penalty: 0.5,
      repetition_penalty: 1.05,
      messages: [
        {
          role: 'user',
          content: [
            {
              type: 'text',
              text: `Transcribe the ${language} audio clip into text. ${hotwordsText}`
            },
            {
              type: 'audio_url',
              audio_url: {
                url: `data:audio/${audioFormat};base64,${base64Audio}`
              }
            }
          ]
        }
      ]
    })
  });

  if (!response.ok) {
    throw new Error(`API request failed: ${response.status}`);
  }

  return await response.json();
}

// Example usage with file input
document.querySelector('#audioInput').addEventListener('change', async (e) => {
  const file = e.target.files[0];
  if (file) {
    const result = await transcribeAudio(
      file,
      'Japanese',
      ['Shisa AI', 'API', '音声認識']
    );
    console.log('Transcription:', result);
  }
});

Convert Speech to Text with Precision

Start with 180 minutes (3 hours) of free transcription per month, then scale as you grow.
