Industry-leading accuracy
Benchmark performance on real-world Japanese audio
Built for Japanese audio
Features designed for Japanese speech recognition
Trusted use cases
See how businesses use our ASR API
- Quality monitoring
- Compliance recording
- Agent training
- Customer sentiment analysis
- Business meetings
- Interview transcription
- Meeting recordings
- Team stand-ups
- Video subtitles
- Live event captions
- Broadcast transcription
- Accessibility compliance
API Key
{
  "audio": "<base64-encoded audio>"
}
Quick Start Guide
Get started with the speech recognition API in three simple steps
Sign up for a Shisa AI account and get your API key from the developer dashboard. Include it in the Authorization header with the 'shsk:' prefix:
Authorization: Bearer shsk:YOUR_API_KEY
The API accepts base64-encoded audio in a variety of formats. Supported audio formats include:
- OGG (Opus, Vorbis)
- WAV (PCM, 16-bit)
- MP3, WebM, M4A, FLAC
Send a POST request to the API endpoint with your audio data and configuration. Here is a basic example using cURL:
curl -s -XPOST 'https://api.shisa.ai/asr/srt/audio_llm' \
-H 'Authorization: Bearer shsk:YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"audio": "'$(base64 -w0 audio.ogg)'"
}'
Minimal request
Only the audio field is required. Language is auto-detected and tuning parameters use sensible defaults.
The API returns a JSON response with the transcribed text, detected language, and confidence score.
{
  "text": "こんにちは、シサAIです。",
  "language": "ja",
  "confidence": 0.98
}
API Endpoint
The speech recognition API uses a chat-style interface for maximum flexibility and context awareness
https://api.shisa.ai/asr/srt/audio_llm
This multimodal endpoint accepts text instructions alongside audio content, letting you supply context and a custom vocabulary (hotwords) to improve accuracy.
Request Parameters
Configure your transcription request with these parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| audio | string | Required | Base64-encoded audio data (WAV, OGG, MP3, or FLAC) |
| language | string | Optional | Language code (e.g. "ja", "en"). Omit for automatic language detection (LID). |
| hotwords | string[] | Optional | Array of words/phrases to boost recognition accuracy for domain-specific terms |
| temperature | float | Optional | Sampling temperature (0.0-2.0). Lower values make the output more deterministic. Default: 0.0 |
| top_p | float | Optional | Nucleus sampling parameter (0.0-1.0). Controls output diversity. Default: 0.85 |
| frequency_penalty | float | Optional | Penalizes frequent tokens (-2.0 to 2.0). Reduces repetition. Default: 0.5 |
| repetition_penalty | float | Optional | Penalizes repeated tokens (1.0-2.0). Values above 1.0 discourage repetition. Default: 1.05 |
| vad | integer | Optional | Voice activity detection mode. Default: 1 |
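Putting the table together, a request body that pins the language and boosts a few domain terms can be assembled as follows. This is a sketch in Python: the hotword values and the placeholder audio bytes are illustrative, not taken from the API documentation.

```python
import base64
import json

# Placeholder bytes standing in for a real recording; in practice use
# open("audio.ogg", "rb").read()
audio_bytes = b"OggS" + b"\x00" * 16

payload = {
    "audio": base64.b64encode(audio_bytes).decode("ascii"),  # required
    "language": "ja",              # skip auto-detection (LID), force Japanese
    "hotwords": ["Shisa", "ASR"],  # boost recognition of domain-specific terms
    "temperature": 0.0,            # table default: deterministic decoding
    "repetition_penalty": 1.05,    # table default
    "vad": 1,                      # table default
}

body = json.dumps(payload)
print(body)
```

Omitting every field except `audio` falls back to the defaults listed above, including automatic language detection.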
Audio must be provided as raw base64-encoded data:
"audio": "SGVsbG8gV29ybGQ..."
Pass raw base64-encoded audio data in the audio field. The server auto-detects the format from the binary header.
Supported audio formats:
| Format | MIME Type | Detection |
|---|---|---|
| WAV | audio/wav | RIFF header |
| OGG | audio/ogg | OggS header |
| MP3 | audio/mpeg | ID3 tag or MPEG sync bytes |
| FLAC | audio/flac | fLaC header |
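The detection column above can be mirrored client-side to validate a file before uploading it. The helper below is an illustration based on the documented magic bytes, not the server's actual implementation:

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the MIME type from the file's magic bytes (per the table above)."""
    if data[:4] == b"RIFF":
        return "audio/wav"
    if data[:4] == b"OggS":
        return "audio/ogg"
    if data[:4] == b"fLaC":
        return "audio/flac"
    # MP3: either an ID3 tag or a raw MPEG frame sync (11 set bits)
    if data[:3] == b"ID3" or (len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0):
        return "audio/mpeg"
    raise ValueError("Unsupported audio format")

print(sniff_audio_format(b"RIFF\x24\x00\x00\x00WAVE"))  # prints: audio/wav
```

Running this check locally lets you surface an "Unsupported audio format" problem before spending a round trip to the API.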
Encoding Audio as Base64
Convert an audio file to base64 with the following commands:
# Encode any supported format to base64
base64 -w0 audio.ogg # Linux
base64 -i audio.ogg # macOS
# Use in a curl request
curl -s -XPOST 'https://api.shisa.ai/asr/srt/audio_llm' \
-H 'Authorization: Bearer shsk:YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{ "audio": "'$(base64 -w0 audio.ogg)'" }'
The API supports automatic language identification (LID) for the following languages. The detected language is returned in the language field of the response.
Primary Languages
- ja: Japanese
- en: English
- zh: Chinese
Response Format
Understanding the API response structure
{
  "text": "こんにちは、シサAIです。",
  "language": "ja",
  "confidence": 0.98
}
Response fields:
- text: The transcribed text from the audio
- language: The detected or specified language code
- confidence: Transcription confidence score (0 to 1)
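A common pattern is to gate downstream processing on the confidence field. A minimal sketch follows; the 0.9 threshold is an arbitrary choice for illustration, not an API recommendation:

```python
import json

# The documented example response
raw = '{"text": "こんにちは、シサAIです。", "language": "ja", "confidence": 0.98}'
result = json.loads(raw)

# Accept high-confidence transcripts; flag the rest for review
if result["confidence"] >= 0.9:
    transcript = f"[{result['language']}] {result['text']}"
else:
    transcript = "LOW CONFIDENCE: queue for manual review"

print(transcript)
```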
Error Handling
Common errors and how to resolve them
{
  "code": 400,
  "error": "No audio data provided"
}
Returned when the API key is missing, invalid, or expired. Check that your Authorization header includes a valid token.
{
  "context": ["authMiddleware"],
  "code": 104,
  "name": "ErrAuthenticationFailed",
  "error": "Authentication error: Invalid token"
}
| Code | Cause | Error Message |
|---|---|---|
| 400 | Missing audio field | No audio data provided |
| 400 | Audio decodes to empty | No audio data provided |
| 400 | Not base64 encoded | Invalid base64 audio data |
| 400 | Base64 decode fails | Invalid base64 audio data |
| 400 | Unsupported audio format | Unsupported audio format |
| 500 | Services not ready | Transcription service not available |
| 500 | Backend failure | Transcription failed: ... |
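Since the 400 errors indicate a malformed request and the 500 errors indicate server-side trouble, a client can branch on the status code. The retry policy below is an assumption for illustration, not documented API behavior:

```python
def classify_error(status: int, body: dict) -> str:
    """Decide how a client should react, based on the error table above."""
    if body.get("name") == "ErrAuthenticationFailed":
        return "fix-auth"      # check the 'shsk:' token; retrying as-is won't help
    if status == 400:
        return "fix-request"   # re-encode the audio or add the missing field
    if status == 500:
        return "retry"         # likely transient; retry with backoff (assumption)
    return "unknown"

print(classify_error(400, {"code": 400, "error": "No audio data provided"}))
```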
Code Examples
Integration examples in popular programming languages
curl -s -XPOST 'https://api.shisa.ai/asr/srt/audio_llm' \
-H 'Authorization: Bearer shsk:YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"audio": "'$(base64 -w0 audio.ogg)'"
}'
import base64
import requests

# Read and encode audio file
with open("audio.ogg", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode("utf-8")

url = "https://api.shisa.ai/asr/srt/audio_llm"
headers = {
    "Authorization": "Bearer shsk:YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "audio": audio_data
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())

async function transcribeAudio(audioFile) {
  // Read file and convert to base64
  const fileBuffer = await audioFile.arrayBuffer();
  const base64Audio = btoa(
    new Uint8Array(fileBuffer).reduce(
      (data, byte) => data + String.fromCharCode(byte),
      ''
    )
  );

  const response = await fetch('https://api.shisa.ai/asr/srt/audio_llm', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer shsk:YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      audio: base64Audio
    })
  });

  if (!response.ok) {
    throw new Error(`API request failed: ${response.status}`);
  }
  return await response.json();
}

// Example usage with file input
document.querySelector('#audioInput').addEventListener('change', async (e) => {
  const file = e.target.files[0];
  if (file) {
    const result = await transcribeAudio(file);
    console.log('Transcription:', result);
  }
});