Text-to-Speech API
Natural-sounding voice synthesis with emotional tones and expressive speech. Transform text into lifelike Japanese and English audio for any application.
Voices That Sound Human
State-of-the-art neural TTS with emotional intelligence
Quick Start
Generate speech in three simple steps.
Include your API key in the Authorization header of every request.
Authorization: Bearer YOUR_API_KEY
Send a POST request with the text you want to convert to speech.
curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
    "format": "mp3",
    "stream": false,
    "text": "こんにちは。Shisa Talkへようこそ。"
  }' \
  --output speech.mp3
Minimal request
Only voice_id, text, and format are required. Set stream: true for real-time streaming.
The API returns binary audio data in your requested format. Save it to a file or stream it directly to an audio player.
# Play the generated audio
ffplay -nodisp -autoexit speech.mp3
# Or stream directly
curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9", "format": "mp3", "stream": true, "text": "ストリーミングテストです。"}' \
  --output - | ffplay -nodisp -autoexit -
API Endpoints
Two endpoints for generating speech and listing available voices.
POST https://api.shisa.ai/tts
Converts text to speech audio. Returns binary audio data in the requested format.
GET https://api.shisa.ai/tts/voices
Returns a JSON array of all available voices with their metadata, supported formats, and streaming capabilities.
Request Parameters
Parameters for the POST /tts endpoint.
| Parameter | Type | Required | Description |
|---|---|---|---|
| voice_id | string | Required | UUID of the voice to use. Get available IDs from GET /tts/voices. |
| text | string | Required | The text to convert to speech. Maximum 5000 characters. |
| format | string | Required | Output audio format. Must be supported by the selected voice. Options: mp3, wav, ogg, pcm, flac |
| stream | boolean | Optional | When true, returns audio as a chunked stream for real-time playback. Only available for voices with streaming: true. Default: false |
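Because text is capped at 5000 characters per request, longer input has to be split across several calls. The following is a minimal sketch of one way to do that, not an official client: the helper names split_text and build_payloads are hypothetical, and splitting after sentence-ending punctuation is an assumed heuristic rather than documented API behavior.

```python
import re

MAX_CHARS = 5000  # per-request limit stated in the parameter table

# Zero-width split point after Japanese or Western sentence-ending punctuation
# (an assumed heuristic, not something the API prescribes).
_SENTENCE_END = re.compile(r"(?<=[。！？.!?])")


def split_text(text, limit=MAX_CHARS):
    """Split text into pieces of at most `limit` characters, preferring sentence ends."""
    chunks, current = [], ""
    for sentence in _SENTENCE_END.split(text):
        # A single sentence longer than the limit is cut at hard boundaries.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Start a new chunk when the next sentence would overflow the current one.
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks


def build_payloads(text, voice_id, audio_format="mp3", stream=False):
    """Return one POST /tts JSON body per chunk of text."""
    return [
        {"voice_id": voice_id, "format": audio_format, "stream": stream, "text": chunk}
        for chunk in split_text(text)
    ]


# Example: 12,000 characters of Japanese text become several request bodies.
long_text = "こんにちは。" * 2000
payloads = build_payloads(long_text, "e3362c0a-7677-4cd8-b122-91fb093305c9")
print(len(payloads), [len(p["text"]) for p in payloads])
```

Each payload can then be sent as its own POST /tts request and the resulting audio files played or concatenated in order. Joining the chunks returned by split_text reproduces the original text exactly, so no characters are dropped at the boundaries.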
Response Format
Response formats for speech generation and voice listing.
On success, the API returns raw binary audio data with the appropriate Content-Type header (e.g. audio/mp3). Save the response body directly to a file.
# The response is binary audio data — save directly to file
curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9", "format": "mp3", "text": "テスト"}' \
  --output speech.mp3
GET /tts/voices returns a JSON array of available voice objects.
[
  {
    "id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
    "description": "Young male Japanese voice...",
    "language": "Japanese & English",
    "gender": "Male",
    "formats": ["mp3", "ogg", "pcm"],
    "streaming": true
  }
]
Voice Fields
- id: UUID to use as voice_id in requests
- description: Human-readable voice description
- language: Supported language(s)
- gender: Voice gender (Male, Female, Neutral)
- formats: Supported output audio formats
- streaming: Whether the voice supports real-time streaming
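The formats and streaming fields can be checked on the client before a request is sent, which avoids a 400 for an unsupported format. The sketch below assumes the voice list has already been fetched and parsed into Python dictionaries; find_voices and check_request are hypothetical helper names, and the second voice in the sample data is invented purely for illustration.

```python
def find_voices(voices, audio_format=None, streaming=None, gender=None):
    """Filter a parsed GET /tts/voices result by format, streaming support, and gender."""
    matches = []
    for voice in voices:
        if audio_format is not None and audio_format not in voice.get("formats", []):
            continue
        if streaming is not None and voice.get("streaming", False) != streaming:
            continue
        if gender is not None and voice.get("gender") != gender:
            continue
        matches.append(voice)
    return matches


def check_request(voice, audio_format, stream=False):
    """Raise ValueError if the voice cannot serve this format/stream combination."""
    if audio_format not in voice["formats"]:
        raise ValueError(
            f"voice {voice['id']} does not support {audio_format!r}; "
            f"use one of {voice['formats']}"
        )
    if stream and not voice["streaming"]:
        raise ValueError(f"voice {voice['id']} does not support streaming")


voices = [
    {  # taken from the sample response in this section
        "id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
        "description": "Young male Japanese voice...",
        "language": "Japanese & English",
        "gender": "Male",
        "formats": ["mp3", "ogg", "pcm"],
        "streaming": True,
    },
    {  # hypothetical voice, invented for illustration only
        "id": "00000000-0000-0000-0000-000000000000",
        "description": "Hypothetical example voice",
        "language": "English",
        "gender": "Female",
        "formats": ["mp3", "wav"],
        "streaming": False,
    },
]

streaming_mp3 = find_voices(voices, audio_format="mp3", streaming=True)
print([v["id"] for v in streaming_mp3])
```

Calling check_request(voice, "wav") on the first voice raises ValueError, mirroring the "Unsupported format for voice" case the API reports as a 400.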
Error Handling
Error responses and how to handle them.
{
  "context": ["..."],
  "code": 104,
  "name": "ErrAuthenticationFailed",
  "error": "Authentication error: Invalid token"
}
| Status | Cause | Resolution |
|---|---|---|
| 400 | Missing or invalid parameters | Check voice_id, text, and format fields |
| 400 | Unsupported format for voice | Use a format listed in the voice's formats array |
| 401 | Invalid or missing API key | Check your Authorization: Bearer header |
| 429 | Rate limit exceeded | Wait and retry with exponential backoff |
| 500 | Internal server error | Retry the request or contact support |
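The table suggests retrying 429 and 500 responses and fixing the request for 400 and 401. A rough sketch of that policy follows, under several assumptions: TTSError, parse_error, and call_with_retry are hypothetical names, the attempt count, base delay, and jitter are arbitrary choices, and send stands for any caller-supplied function that performs one request and returns (status_code, body).

```python
import json
import random
import time

RETRYABLE_STATUSES = {429, 500}  # the rows in the table that recommend a retry


class TTSError(Exception):
    """Carries the fields of the JSON error body shown in this section."""

    def __init__(self, status, code=None, name=None, message=""):
        super().__init__(f"HTTP {status} {name or ''}: {message}".strip())
        self.status = status
        self.code = code
        self.name = name
        self.message = message


def parse_error(status, body):
    """Build a TTSError from an error response body (bytes or str)."""
    try:
        data = json.loads(body)
    except (ValueError, TypeError):
        data = {}  # body was not JSON; keep only the status code
    if not isinstance(data, dict):
        data = {}
    return TTSError(status, data.get("code"), data.get("name"), data.get("error", ""))


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds; retry 429/500 with exponential backoff."""
    for attempt in range(max_attempts):
        status, body = send()
        if status < 400:
            return body
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            raise parse_error(status, body)
        # Delay doubles on each attempt (base, 2x, 4x, ...) plus a little jitter.
        sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))


# Example with a fake sender: two transient failures, then audio bytes.
responses = iter([(429, b"{}"), (500, b"{}"), (200, b"fake-audio-bytes")])
audio = call_with_retry(lambda: next(responses), sleep=lambda seconds: None)
print(len(audio))
```

With the requests library, send could be a small function that posts the payload and returns (response.status_code, response.content). Non-retryable statuses such as 400 and 401 raise immediately, so a bad API key is not retried.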
Simple Integration
Start generating speech in minutes with our easy-to-use API
# List available voices
curl -s -X GET "https://api.shisa.ai/tts/voices" \
  -H "Authorization: Bearer YOUR_API_KEY" | jq .
# Generate speech
curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
    "format": "mp3",
    "stream": false,
    "text": "こんにちは。Shisa Talkへようこそ。"
  }' \
  --output speech.mp3
# Stream audio directly to a player
curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
    "format": "mp3",
    "stream": true,
    "text": "ストリーミングテストです。"
  }' \
  --output - | ffplay -nodisp -autoexit -
import requests
API_URL = "https://api.shisa.ai"
API_KEY = "YOUR_API_KEY"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
# List available voices
def list_voices():
    response = requests.get(f"{API_URL}/tts/voices", headers=HEADERS)
    response.raise_for_status()  # surface 4xx/5xx instead of parsing an error body
    return response.json()
# Generate speech and save it to output.<format>
def generate_speech(text, voice_id="e3362c0a-7677-4cd8-b122-91fb093305c9", format="mp3", stream=False):
    response = requests.post(
        f"{API_URL}/tts",
        headers=HEADERS,
        json={
            "voice_id": voice_id,
            "format": format,
            "stream": stream,
            "text": text,
        },
        stream=stream,
    )
    response.raise_for_status()  # avoid writing a JSON error body into the audio file
    output_file = f"output.{format}"
    with open(output_file, "wb") as f:
        if stream:
            # Write chunks as they arrive; iter_content defaults to 1-byte chunks
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        else:
            f.write(response.content)
    return output_file
# Example usage
voices = list_voices()
print(voices)
audio_file = generate_speech(
    "お客様の声を大切にしています。",
    voice_id="e3362c0a-7677-4cd8-b122-91fb093305c9",
)
const API_URL = 'https://api.shisa.ai';
const API_KEY = 'YOUR_API_KEY';
const headers = {
  'Authorization': `Bearer ${API_KEY}`,
  'Content-Type': 'application/json',
};
// List available voices
const listVoices = async () => {
  const response = await fetch(`${API_URL}/tts/voices`, { headers });
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  return response.json();
};
// Generate speech
const generateSpeech = async (text, voiceId = 'e3362c0a-7677-4cd8-b122-91fb093305c9', format = 'mp3') => {
  const response = await fetch(`${API_URL}/tts`, {
    method: 'POST',
    headers,
    body: JSON.stringify({
      voice_id: voiceId,
      format,
      stream: true,
      text,
    }),
  });
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  // Read the streamed response chunk by chunk
  const reader = response.body.getReader();
  const chunks = [];
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  // Combine chunks and create audio blob (playback starts once all chunks have arrived)
  const blob = new Blob(chunks, { type: `audio/${format}` });
  const url = URL.createObjectURL(blob);
  // Play audio
  const audio = new Audio(url);
  await audio.play();
};
// Example usage (top-level await requires an ES module)
const voices = await listVoices();
console.log(voices);
await generateSpeech('ようこそ、Shisa Talkへ。', 'e3362c0a-7677-4cd8-b122-91fb093305c9');
Trusted Use Cases
See how businesses leverage our TTS API
- AI phone agents
- Smart home assistants
- Interactive voice response
- Voice-enabled apps
- Audiobook narration
- E-learning courses
- Podcast production
- Video voiceovers
- Screen readers
- News article audio
- Document narration
- Navigation assistance
Give Your Applications a Voice
Start with 20,000 free characters per month. Upgrade anytime for more capacity and features.