Industry-leading accuracy
Benchmark performance on real-world Japanese audio
Built for Japanese audio
Features designed for Japanese speech recognition
Trusted use cases
See how businesses use our ASR API
- Quality monitoring
- Compliance recording
- Agent training
- Customer sentiment analysis
- Business meetings
- Interview transcription
- Meeting recordings
- Team stand-ups
- Video subtitles
- Live event captions
- Broadcast transcription
- Accessibility compliance
API Key
{
  "audio": "<base64-encoded audio>"
}
Quick Start Guide
Get started with the speech recognition API in three simple steps
Sign up for a Shisa AI account and get your API key from the developer dashboard. Include it in the Authorization header with the 'shsk:' prefix:
Authorization: Bearer shsk:YOUR_API_KEY
The API accepts base64-encoded audio in a variety of formats. Supported audio formats include:
- OGG (Opus, Vorbis)
- WAV (PCM, 16-bit)
- MP3, WebM, M4A, FLAC
Send a POST request to the API endpoint with your audio data and configuration. Here is a basic example using cURL:
curl -s -XPOST 'https://api.shisa.ai/asr/srt/audio_llm' \
-H 'Authorization: Bearer shsk:YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"audio": "'$(base64 -w0 audio.ogg)'"
}'
Minimal request
Only the audio field is required. Language is auto-detected and tuning parameters use sensible defaults.
The API returns a JSON response with the transcribed text, detected language, and confidence score.
{
  "text": "こんにちは、シサAIです。",
  "language": "ja",
  "confidence": 0.98
}
API Endpoint
The speech recognition API uses a chat-style interface for maximum flexibility and context awareness
https://api.shisa.ai/asr/srt/audio_llm
This multimodal endpoint accepts text instructions alongside audio content, letting you supply context and a custom vocabulary (hotwords) to improve accuracy.
Request Parameters
Configure your transcription request with these parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| audio | string | Required | Base64-encoded audio data (WAV, OGG, MP3, or FLAC) |
| language | string | Optional | Language code (e.g. "ja", "en"). Omit for automatic language detection (LID). |
| hotwords | string[] | Optional | Array of words/phrases to boost recognition accuracy for domain-specific terms |
| temperature | float | Optional | Sampling temperature (0.0-2.0). Lower values make the output more deterministic. Default: 0.0 |
| top_p | float | Optional | Nucleus sampling parameter (0.0-1.0). Controls output diversity. Default: 0.85 |
| frequency_penalty | float | Optional | Penalizes frequent tokens (-2.0 to 2.0). Reduces repetition. Default: 0.5 |
| repetition_penalty | float | Optional | Penalizes repeated tokens (1.0-2.0). Values above 1.0 discourage repetition. Default: 1.05 |
| vad | integer | Optional | Voice activity detection mode. Default: 1 |
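Putting the table together, a request body that pins the language and boosts a few domain terms can be assembled as follows. This is a sketch in Python: the hotword values and the placeholder audio bytes are illustrative, not taken from the API documentation.

```python
import base64
import json

# Placeholder bytes standing in for a real recording; in practice use
# open("audio.ogg", "rb").read()
audio_bytes = b"OggS" + b"\x00" * 16

payload = {
    "audio": base64.b64encode(audio_bytes).decode("ascii"),  # required
    "language": "ja",              # skip auto-detection (LID), force Japanese
    "hotwords": ["Shisa", "ASR"],  # boost recognition of domain-specific terms
    "temperature": 0.0,            # table default: deterministic decoding
    "repetition_penalty": 1.05,    # table default
    "vad": 1,                      # table default
}

body = json.dumps(payload)
print(body)
```

Omitting every field except `audio` falls back to the defaults listed above, including automatic language detection.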
Audio must be provided as raw base64-encoded data:
"audio": "SGVsbG8gV29ybGQ..."
Pass raw base64-encoded audio data in the audio field. The server auto-detects the format from the binary header.
Supported audio formats:
| Format | MIME Type | Detection |
|---|---|---|
| WAV | audio/wav | RIFF header |
| OGG | audio/ogg | OggS header |
| MP3 | audio/mpeg | ID3 tag or MPEG sync bytes |
| FLAC | audio/flac | fLaC header |
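The detection column above can be mirrored client-side to validate a file before uploading it. The helper below is an illustration based on the documented magic bytes, not the server's actual implementation:

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the MIME type from the file's magic bytes (per the table above)."""
    if data[:4] == b"RIFF":
        return "audio/wav"
    if data[:4] == b"OggS":
        return "audio/ogg"
    if data[:4] == b"fLaC":
        return "audio/flac"
    # MP3: either an ID3 tag or a raw MPEG frame sync (11 set bits)
    if data[:3] == b"ID3" or (len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0):
        return "audio/mpeg"
    raise ValueError("Unsupported audio format")

print(sniff_audio_format(b"RIFF\x24\x00\x00\x00WAVE"))  # prints: audio/wav
```

Running this check locally lets you surface an "Unsupported audio format" problem before spending a round trip to the API.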
Encoding Audio as Base64
Convert an audio file to base64 with the following commands:
# Encode any supported format to base64
base64 -w0 audio.ogg # Linux
base64 -i audio.ogg # macOS
# Use in a curl request
curl -s -XPOST 'https://api.shisa.ai/asr/srt/audio_llm' \
-H 'Authorization: Bearer shsk:YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{ "audio": "'$(base64 -w0 audio.ogg)'" }'
The API supports automatic language identification (LID) for the following languages. The detected language is returned in the language field of the response.
Primary Languages
- ja: Japanese
- en: English
- zh: Chinese
Response Format
Understanding the API response structure
{
  "text": "こんにちは、シサAIです。",
  "language": "ja",
  "confidence": 0.98
}
Response fields:
- text: The transcribed text from the audio
- language: The detected or specified language code
- confidence: Transcription confidence score (0 to 1)
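A common pattern is to gate downstream processing on the confidence field. A minimal sketch follows; the 0.9 threshold is an arbitrary choice for illustration, not an API recommendation:

```python
import json

# The documented example response
raw = '{"text": "こんにちは、シサAIです。", "language": "ja", "confidence": 0.98}'
result = json.loads(raw)

# Accept high-confidence transcripts; flag the rest for review
if result["confidence"] >= 0.9:
    transcript = f"[{result['language']}] {result['text']}"
else:
    transcript = "LOW CONFIDENCE: queue for manual review"

print(transcript)
```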
Error Handling
Common errors and how to resolve them
{
  "code": 400,
  "error": "No audio data provided"
}
Returned when the API key is missing, invalid, or expired. Check that your Authorization header includes a valid token.
{
  "context": ["authMiddleware"],
  "code": 104,
  "name": "ErrAuthenticationFailed",
  "error": "Authentication error: Invalid token"
}
| Code | Cause | Error Message |
|---|---|---|
| 400 | Missing audio field | No audio data provided |
| 400 | Audio decodes to empty | No audio data provided |
| 400 | Not base64 encoded | Invalid base64 audio data |
| 400 | Base64 decode fails | Invalid base64 audio data |
| 400 | Unsupported audio format | Unsupported audio format |
| 500 | Services not ready | Transcription service not available |
| 500 | Backend failure | Transcription failed: ... |
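Since the 400 errors indicate a malformed request and the 500 errors indicate server-side trouble, a client can branch on the status code. The retry policy below is an assumption for illustration, not documented API behavior:

```python
def classify_error(status: int, body: dict) -> str:
    """Decide how a client should react, based on the error table above."""
    if body.get("name") == "ErrAuthenticationFailed":
        return "fix-auth"      # check the 'shsk:' token; retrying as-is won't help
    if status == 400:
        return "fix-request"   # re-encode the audio or add the missing field
    if status == 500:
        return "retry"         # likely transient; retry with backoff (assumption)
    return "unknown"

print(classify_error(400, {"code": 400, "error": "No audio data provided"}))
```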
Code Examples
Integration examples in popular programming languages
curl -s -XPOST 'https://api.shisa.ai/asr/srt/audio_llm' \
-H 'Authorization: Bearer shsk:YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
"audio": "'$(base64 -w0 audio.ogg)'"
}'
import base64
import requests

# Read and encode audio file
with open("audio.ogg", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode("utf-8")

url = "https://api.shisa.ai/asr/srt/audio_llm"
headers = {
    "Authorization": "Bearer shsk:YOUR_API_KEY",
    "Content-Type": "application/json"
}
payload = {
    "audio": audio_data
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())

async function transcribeAudio(audioFile) {
  // Read file and convert to base64
  const fileBuffer = await audioFile.arrayBuffer();
  const base64Audio = btoa(
    new Uint8Array(fileBuffer).reduce(
      (data, byte) => data + String.fromCharCode(byte),
      ''
    )
  );

  const response = await fetch('https://api.shisa.ai/asr/srt/audio_llm', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer shsk:YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      audio: base64Audio
    })
  });

  if (!response.ok) {
    throw new Error(`API request failed: ${response.status}`);
  }
  return await response.json();
}

// Example usage with file input
document.querySelector('#audioInput').addEventListener('change', async (e) => {
  const file = e.target.files[0];
  if (file) {
    const result = await transcribeAudio(file);
    console.log('Transcription:', result);
  }
});