Speech Recognition API
Highly accurate speech-to-text for Japanese with industry-leading performance. Convert spoken language into text with precision and speed optimized for Japanese audio.
Industry-Leading Accuracy
Benchmarked performance on real-world Japanese audio
Built for Japanese Audio
Features designed specifically for Japanese speech recognition
Trusted Use Cases
See how businesses leverage our ASR API
- Quality monitoring
- Compliance recording
- Agent training
- Customer sentiment analysis
- Business meetings
- Interview transcripts
- Conference recordings
- Team standups
- Video subtitles
- Live event captions
- Broadcast transcription
- Accessibility compliance
API Key
{
  "audio": "<base64-encoded audio>"
}

Quick Start Guide
Get up and running with the Speech Recognition API in three simple steps
Sign up for a Shisa AI account and obtain your API key from the developer dashboard. Include it in the Authorization header with the 'shsk:' prefix:
Authorization: Bearer shsk:YOUR_API_KEY

The API accepts base64-encoded audio. Supported formats include:
- OGG (Opus, Vorbis)
- WAV (PCM, 16-bit)
- MP3, WebM, M4A, FLAC
Send a POST request to the API endpoint with your audio data and configuration. Here's a basic example using cURL:
curl -s -XPOST 'https://api.shisa.ai/asr/srt/audio_llm' \
  -H 'Authorization: Bearer shsk:YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "audio": "'$(base64 -w0 audio.ogg)'"
  }'

Minimal request
Only the audio field is required. Language is auto-detected and tuning parameters use sensible defaults.
The API returns a JSON response with the transcribed text, detected language, and confidence score.
{
  "text": "こんにちは、シサAIです。",
  "language": "ja",
  "confidence": 0.98
}

API Endpoint
The Speech Recognition API uses a chat-style interface for maximum flexibility and context awareness
https://api.shisa.ai/asr/srt/audio_llm

This multimodal endpoint accepts both text instructions and audio content, allowing you to provide context and custom vocabulary (hotwords) for improved accuracy.
Request Parameters
Configure your transcription requests with these parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| audio | string | Required | Base64-encoded audio data (WAV, OGG, MP3, or FLAC) |
| language | string | Optional | Language code (e.g. "ja", "en"). Omit for automatic language detection (LID). |
| hotwords | string[] | Optional | Array of words/phrases to boost recognition accuracy for domain-specific terms |
| temperature | float | Optional | Sampling temperature (0.0-2.0). Lower values make output more deterministic. Default: 0.0 |
| top_p | float | Optional | Nucleus sampling parameter (0.0-1.0). Controls diversity of output. Default: 0.85 |
| frequency_penalty | float | Optional | Penalizes frequent tokens (-2.0 to 2.0). Reduces repetition. Default: 0.5 |
| repetition_penalty | float | Optional | Penalizes token repetition (1.0-2.0). Values > 1.0 discourage repetition. Default: 1.05 |
| vad | integer | Optional | Voice activity detection mode. Default: 1 |
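Putting the table together, a request body using the optional fields might be assembled like this in Python (the `build_payload` helper is illustrative, not part of any SDK; the defaults mirror the table above):

```python
import base64

def build_payload(audio_bytes, language=None, hotwords=None,
                  temperature=0.0, top_p=0.85,
                  frequency_penalty=0.5, repetition_penalty=1.05, vad=1):
    """Assemble a request body from the documented parameters.

    Only "audio" is required; language and hotwords are included
    only when set, so omitting language triggers auto-detection.
    """
    payload = {
        "audio": base64.b64encode(audio_bytes).decode("utf-8"),
        "temperature": temperature,
        "top_p": top_p,
        "frequency_penalty": frequency_penalty,
        "repetition_penalty": repetition_penalty,
        "vad": vad,
    }
    if language is not None:
        payload["language"] = language    # e.g. "ja"; omit for auto-detect
    if hotwords:
        payload["hotwords"] = hotwords    # boost domain-specific terms
    return payload

# Example: a Japanese request with custom vocabulary
body = build_payload(b"...raw audio bytes...", language="ja",
                     hotwords=["シサAI"])
```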
Audio must be provided as a raw base64-encoded string in the audio field:

"audio": "SGVsbG8gV29ybGQ..."

No data-URL prefix or MIME type is needed; the server auto-detects the format from the binary header.
Supported Audio Formats:
| Format | MIME Type | Detection |
|---|---|---|
| WAV | audio/wav | RIFF header |
| OGG | audio/ogg | OggS header |
| MP3 | audio/mpeg | ID3 tag or MPEG sync bytes |
| FLAC | audio/flac | fLaC header |
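As a rough illustration of this header-based detection (the server's actual implementation is not published), the table maps to a check like:

```python
def detect_audio_format(data: bytes) -> str:
    """Guess the container format from its magic bytes, mirroring
    the detection table above (illustrative sketch only)."""
    if data[:4] == b"RIFF":
        return "audio/wav"
    if data[:4] == b"OggS":
        return "audio/ogg"
    if data[:4] == b"fLaC":
        return "audio/flac"
    # ID3 tag, or an MPEG frame-sync byte pair for headerless MP3s
    if data[:3] == b"ID3" or data[:2] in (b"\xff\xfb", b"\xff\xf3", b"\xff\xf2"):
        return "audio/mpeg"
    raise ValueError("Unsupported audio format")
```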
Encoding Audio to Base64
Use the following command to convert your audio file to base64:
# Encode any supported format to base64
base64 -w0 audio.ogg # Linux
base64 -i audio.ogg # macOS
# Use in a curl request
curl -s -XPOST 'https://api.shisa.ai/asr/srt/audio_llm' \
  -H 'Authorization: Bearer shsk:YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{ "audio": "'$(base64 -w0 audio.ogg)'" }'

The API supports automatic language identification (LID) for the following languages. The detected language is returned in the language field of the response.
Primary Languages
| Code | Language |
|---|---|
| ja | Japanese |
| en | English |
| zh | Chinese |

Response Format
Understanding the API response structure
{
  "text": "こんにちは、シサAIです。",
  "language": "ja",
  "confidence": 0.98
}

Response Fields:
- text: The transcribed text from the audio
- language: The detected or specified language code
- confidence: Transcription confidence score (0 to 1)
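Because the confidence score is a plain 0-to-1 float, client code can use it to gate downstream processing, for example flagging low-confidence transcripts for human review (the helper and the 0.8 threshold below are illustrative choices, not API recommendations):

```python
def review_transcript(result: dict, threshold: float = 0.8) -> str:
    """Return the transcript, tagging it for human review when the
    reported confidence falls below the (arbitrary) threshold."""
    text = result["text"]
    if result["confidence"] < threshold:
        return f"[NEEDS REVIEW] {text}"
    return text

result = {"text": "こんにちは、シサAIです。", "language": "ja", "confidence": 0.98}
print(review_transcript(result))
```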
Error Handling
Common errors and how to resolve them
{
  "code": 400,
  "error": "No audio data provided"
}

Returned when the request body is missing the audio field or the audio decodes to empty.

{
  "context": ["authMiddleware"],
  "code": 104,
  "name": "ErrAuthenticationFailed",
  "error": "Authentication error: Invalid token"
}

Returned when the API key is missing, invalid, or expired. Check that your Authorization header includes a valid token.

| Code | Cause | Error Message |
|---|---|---|
| 400 | Missing audio field | No audio data provided |
| 400 | Audio decodes to empty | No audio data provided |
| 400 | Not base64 encoded | Invalid base64 audio data |
| 400 | Base64 decode fails | Invalid base64 audio data |
| 400 | Unsupported audio format | Unsupported audio format |
| 500 | Services not ready | Transcription service not available |
| 500 | Backend failure | Transcription failed: ... |
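Client code can branch on the HTTP status and the error field to decide how to recover. A sketch based on the table above (the helper name and recovery categories are illustrative; the retry policy is the caller's choice):

```python
import json

def classify_error(status: int, body: str) -> str:
    """Map an error response to a coarse handling decision,
    based on the code/cause table above (illustrative sketch)."""
    try:
        message = json.loads(body).get("error", "")
    except json.JSONDecodeError:
        message = body
    if message.startswith("Authentication error"):
        return "auth: check the Authorization header and the shsk: prefix"
    if status == 400:
        return f"bad request, fix and resend: {message}"
    if status >= 500:
        return f"server error, retry later: {message}"
    return f"unexpected response ({status}): {message}"
```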
Code Examples
Integration examples in popular programming languages
curl -s -XPOST 'https://api.shisa.ai/asr/srt/audio_llm' \
  -H 'Authorization: Bearer shsk:YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "audio": "'$(base64 -w0 audio.ogg)'"
  }'

import base64
import requests
# Read and encode audio file
with open("audio.ogg", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode("utf-8")
url = "https://api.shisa.ai/asr/srt/audio_llm"
headers = {
"Authorization": "Bearer shsk:YOUR_API_KEY",
"Content-Type": "application/json"
}
payload = {
"audio": audio_data
}
response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())

async function transcribeAudio(audioFile) {
  // Read file and convert to base64
  const fileBuffer = await audioFile.arrayBuffer();
  const base64Audio = btoa(
    new Uint8Array(fileBuffer).reduce(
      (data, byte) => data + String.fromCharCode(byte),
      ''
    )
  );
  const response = await fetch('https://api.shisa.ai/asr/srt/audio_llm', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer shsk:YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      audio: base64Audio
    })
  });
  if (!response.ok) {
    throw new Error(`API request failed: ${response.status}`);
  }
  return await response.json();
}

// Example usage with file input
document.querySelector('#audioInput').addEventListener('change', async (e) => {
  const file = e.target.files[0];
  if (file) {
    const result = await transcribeAudio(file);
    console.log('Transcription:', result);
  }
});

Turn Speech into Text with Precision
Start with 180 minutes (3 hours) of free transcription per month. Scale as you grow.