Neural Voice Synthesis

Text-to-Speech API

Natural-sounding voice synthesis with emotional tones and expressive speech. Transform text into lifelike Japanese and English audio for any application.

Voices That Sound Human

State-of-the-art neural TTS with emotional intelligence

  • Natural Voices: Highly realistic Japanese and English voices with natural intonation and prosody
  • Emotional Tones: Express joy, concern, excitement, or professionalism with emotional voice modulation
  • Custom Voice Cloning: Clone your brand voice or create custom voices for consistent audio branding
  • Real-time Streaming: Low-latency streaming for interactive applications and real-time conversations
  • Multiple Languages: Support for Japanese, English, and more with native pronunciation quality
  • Enterprise Grade: 99.9% uptime SLA, secure processing, and dedicated support for enterprise needs

Quick Start

Generate speech in three simple steps.

Step 1: Authentication

Include your API key in the Authorization header of every request.

Authorization: Bearer YOUR_API_KEY
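
To keep the key out of source code, one common pattern is to read it from an environment variable when building the header. The following is a minimal Python sketch; the variable name SHISA_API_KEY and the helper build_auth_headers are assumptions made for illustration, not names defined by the API.

```python
import os


def build_auth_headers(env=None):
    """Build request headers from an API key stored in the environment.

    SHISA_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("SHISA_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("Set SHISA_API_KEY before calling the API")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Passing a plain dictionary as env makes the helper easy to exercise without touching the real environment.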

Step 2: Make Your First Request

Send a POST request with the text you want to convert to speech.

curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
    "format": "mp3",
    "stream": false,
    "text": "こんにちは。Shisa Talkへようこそ。"
  }' \
  --output speech.mp3

Minimal request

Only voice_id, text, and format are required. Set stream: true for real-time streaming.
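
As a rough illustration of the minimal request, the Python sketch below builds the JSON body from the three required fields and adds stream only when streaming is requested. The helper name build_tts_payload is hypothetical.

```python
import json


def build_tts_payload(voice_id, text, audio_format, stream=False):
    """Return the JSON body for POST /tts.

    The three required fields are always sent; "stream" is included only
    when it differs from the documented default of false.
    """
    payload = {"voice_id": voice_id, "format": audio_format, "text": text}
    if stream:
        payload["stream"] = True
    return payload


body = build_tts_payload("e3362c0a-7677-4cd8-b122-91fb093305c9", "こんにちは。", "mp3")
print(json.dumps(body, ensure_ascii=False))
```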

Step 3: Play the Audio

The API returns binary audio data in your requested format. Save it to a file or stream it directly to an audio player.

# Play the generated audio
ffplay -nodisp -autoexit speech.mp3

# Or stream directly
curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9", "format": "mp3", "stream": true, "text": "ストリーミングテストです。"}' \
  --output - | ffplay -nodisp -autoexit -

API Endpoints

Two endpoints for generating speech and listing available voices.

Generate Speech
POST https://api.shisa.ai/tts

Converts text to speech audio. Returns binary audio data in the requested format.

List Voices
GET https://api.shisa.ai/tts/voices

Returns a JSON array of all available voices with their metadata, supported formats, and streaming capabilities.

Request Parameters

Parameters for the POST /tts endpoint.

POST /tts Parameters
Parameter | Type    | Required | Description
----------|---------|----------|------------
voice_id  | string  | Yes      | UUID of the voice to use. Get available IDs from GET /tts/voices.
text      | string  | Yes      | The text to convert to speech. Maximum 5000 characters.
format    | string  | Yes      | Output audio format. Must be supported by the selected voice. Options: mp3, wav, ogg, pcm, flac.
stream    | boolean | No       | When true, returns audio as a chunked stream for real-time playback. Only available for voices with streaming: true. Default: false.
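
Because text is limited to 5000 characters, longer input has to be split across several requests. The Python sketch below shows one possible way to do that, breaking at Japanese or English sentence-ending punctuation where it can. It assumes the limit is counted the same way as Python's len() (Unicode code points), which the documentation does not state, and the helper name split_text is hypothetical. Joining the resulting audio files is outside the scope of this sketch.

```python
import re

MAX_CHARS = 5000  # documented limit for the "text" parameter

# Zero-width split point just after sentence-ending punctuation.
_SENTENCE_END = re.compile(r"(?<=[。．.!?！？])")


def split_text(text, limit=MAX_CHARS):
    """Split text into pieces of at most `limit` characters.

    Pieces break at sentence boundaries where possible; a single sentence
    longer than the limit is cut at the limit.
    """
    chunks, current = [], ""
    for sentence in _SENTENCE_END.split(text):
        # Hard-cut any sentence that cannot fit in one request on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Start a new piece when adding this sentence would exceed the limit.
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned piece can then be sent as the text of its own POST /tts request.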

Response Format

Response formats for speech generation and voice listing.

POST /tts: Binary Audio

On success, the API returns raw binary audio data with the appropriate Content-Type header (e.g. audio/mp3). Save the response body directly to a file.

# The response is binary audio data; save it directly to a file
curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9", "format": "mp3", "text": "テスト"}' \
  --output speech.mp3

GET /tts/voices: JSON

Returns a JSON array of available voice objects.

[
  {
    "id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
    "description": "Young male Japanese voice...",
    "language": "Japanese & English",
    "gender": "Male",
    "formats": ["mp3", "ogg", "pcm"],
    "streaming": true
  }
]

Voice Fields

  • id: UUID to use as voice_id in requests
  • description: Human-readable voice description
  • language: Supported language(s)
  • gender: Voice gender (Male, Female, Neutral)
  • formats: Supported output audio formats
  • streaming: Whether the voice supports real-time streaming
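
Since a request fails when the chosen voice does not support the requested format or streaming, it can help to filter the voice list before calling POST /tts. The following Python sketch works on the parsed JSON array; the helper name find_voices is hypothetical, and the substring match on language is an assumption about how that free-text field is best compared.

```python
def find_voices(voices, audio_format=None, streaming=None, language=None):
    """Filter voice objects (as returned by GET /tts/voices)."""
    matches = []
    for voice in voices:
        if audio_format is not None and audio_format not in voice.get("formats", []):
            continue
        if streaming is not None and voice.get("streaming", False) != streaming:
            continue
        if language is not None and language.lower() not in voice.get("language", "").lower():
            continue
        matches.append(voice)
    return matches


# Sample data copied from the response example above.
sample = [
    {
        "id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
        "description": "Young male Japanese voice...",
        "language": "Japanese & English",
        "gender": "Male",
        "formats": ["mp3", "ogg", "pcm"],
        "streaming": True,
    }
]

# Voices that can stream mp3
print([v["id"] for v in find_voices(sample, audio_format="mp3", streaming=True)])
# The sample voice does not list flac, so nothing matches
print(find_voices(sample, audio_format="flac"))
```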

Error Handling

Error responses and how to handle them.

Error Response Format
{
  "context": ["..."],
  "code": 104,
  "name": "ErrAuthenticationFailed",
  "error": "Authentication error: Invalid token"
}
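
A client can turn this JSON body into a structured exception so that callers can branch on the code or name fields. The Python sketch below is one way to do it; the TTSAPIError class and parse_error helper are hypothetical names, and the fallback values used when the body is not valid JSON are assumptions.

```python
import json


class TTSAPIError(Exception):
    """Hypothetical exception type wrapping the documented error fields."""

    def __init__(self, status, code, name, message, context=None):
        super().__init__(f"{status} {name} (code {code}): {message}")
        self.status = status
        self.code = code
        self.name = name
        self.message = message
        self.context = context or []


def parse_error(status, body):
    """Turn an error response body (bytes or str) into a TTSAPIError."""
    try:
        data = json.loads(body)
    except (ValueError, TypeError):
        data = {}  # body was empty or not JSON
    if not isinstance(data, dict):
        data = {}
    return TTSAPIError(
        status=status,
        code=data.get("code"),
        name=data.get("name", "UnknownError"),
        message=data.get("error", "No error message provided"),
        context=data.get("context"),
    )
```

With the requests library, this could be called as parse_error(response.status_code, response.content) whenever the status is 400 or higher.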

Error Codes
Status | Cause                         | Resolution
-------|-------------------------------|-----------
400    | Missing or invalid parameters | Check voice_id, text, and format fields
400    | Unsupported format for voice  | Use a format listed in the voice's formats array
401    | Invalid or missing API key    | Check your Authorization: Bearer header
429    | Rate limit exceeded           | Wait and retry with exponential backoff
500    | Internal server error         | Retry the request or contact support
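
The retry advice for 429 and 500 responses can be sketched in Python as follows. This is an illustrative helper, not part of the API: the name call_with_backoff, the default delays, and the attempt limit are all assumptions, and the send argument is any zero-argument callable that returns an object with a status_code attribute.

```python
import random
import time

RETRYABLE_STATUSES = {429, 500}  # statuses the error table suggests retrying


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    Returns the last response either way, so the caller can inspect it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE_STATUSES:
            return response
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep again
        # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay,
        # with random jitter so that clients do not retry in lockstep.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(delay * random.uniform(0.5, 1.0))
    return response
```

With the requests library, send could be a lambda that wraps requests.post with the URL, headers, and JSON body of the speech request.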

Simple Integration

Start generating speech in minutes with our easy-to-use API

Quick Start with cURL
# List available voices
curl -s -X GET "https://api.shisa.ai/tts/voices" \
  -H "Authorization: Bearer YOUR_API_KEY" | jq .

# Generate speech
curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
    "format": "mp3",
    "stream": false,
    "text": "こんにちは。Shisa Talkへようこそ。"
  }' \
  --output speech.mp3

# Stream audio directly to a player
curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
    "format": "mp3",
    "stream": true,
    "text": "ストリーミングテストです。"
  }' \
  --output - | ffplay -nodisp -autoexit -

Python Integration
import requests

API_URL = "https://api.shisa.ai"
API_KEY = "YOUR_API_KEY"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# List available voices
def list_voices():
    response = requests.get(f"{API_URL}/tts/voices", headers=HEADERS, timeout=30)
    response.raise_for_status()  # Raise on 4xx/5xx instead of returning an error body
    return response.json()

# Generate speech and save it to output.<audio_format>
def generate_speech(text, voice_id="e3362c0a-7677-4cd8-b122-91fb093305c9", audio_format="mp3", stream=False):
    response = requests.post(
        f"{API_URL}/tts",
        headers=HEADERS,
        json={
            "voice_id": voice_id,
            "format": audio_format,
            "stream": stream,
            "text": text,
        },
        stream=stream,  # Defer downloading the body when streaming
        timeout=30,
    )
    response.raise_for_status()  # Avoid writing a JSON error body into the audio file

    output_file = f"output.{audio_format}"
    with open(output_file, "wb") as f:
        if stream:
            # iter_content() defaults to 1-byte chunks, so request larger ones
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        else:
            f.write(response.content)

    return output_file

# Example usage
voices = list_voices()
print(voices)

audio_file = generate_speech(
    "お客様の声を大切にしています。",
    voice_id="e3362c0a-7677-4cd8-b122-91fb093305c9",
)
print(f"Saved audio to {audio_file}")

JavaScript/TypeScript with Streaming
// Uses top-level await, so run this as an ES module
const API_URL = 'https://api.shisa.ai';
const API_KEY = 'YOUR_API_KEY';
const headers = {
  'Authorization': `Bearer ${API_KEY}`,
  'Content-Type': 'application/json',
};

// List available voices
const listVoices = async () => {
  const response = await fetch(`${API_URL}/tts/voices`, { headers });
  if (!response.ok) throw new Error(`List voices failed: HTTP ${response.status}`);
  return response.json();
};

// Generate speech with a streaming request, then play it in the browser
const generateSpeech = async (text, voiceId = 'e3362c0a-7677-4cd8-b122-91fb093305c9', format = 'mp3') => {
  const response = await fetch(`${API_URL}/tts`, {
    method: 'POST',
    headers,
    body: JSON.stringify({
      voice_id: voiceId,
      format,
      stream: true,
      text,
    }),
  });

  // Error responses carry JSON, not audio, so stop before reading the stream
  if (!response.ok) throw new Error(`Speech generation failed: HTTP ${response.status}`);

  // Read the streamed response chunk by chunk
  const reader = response.body.getReader();
  const chunks = [];

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }

  // Combine chunks into a blob; playback starts once the full stream has arrived
  const blob = new Blob(chunks, { type: `audio/${format}` });
  const url = URL.createObjectURL(blob);

  // Play audio and release the object URL when playback ends
  const audio = new Audio(url);
  audio.addEventListener('ended', () => URL.revokeObjectURL(url));
  await audio.play();
};

// Example usage
const voices = await listVoices();
console.log(voices);

await generateSpeech('ようこそ、Shisa Talkへ。', 'e3362c0a-7677-4cd8-b122-91fb093305c9');

Trusted Use Cases

See how businesses leverage our TTS API

Virtual Assistants
Power AI assistants, chatbots, and voice interfaces with natural-sounding speech for engaging user interactions.
  • AI phone agents
  • Smart home assistants
  • Interactive voice response
  • Voice-enabled apps

Audiobooks & Content
Create audiobooks, podcasts, and educational content with professional-quality narration at scale.
  • Audiobook narration
  • E-learning courses
  • Podcast production
  • Video voiceovers

Accessibility
Make content accessible to visually impaired users and provide audio alternatives for all users.
  • Screen readers
  • News article audio
  • Document narration
  • Navigation assistance

Give Your Applications a Voice

Start with 20,000 free characters per month. Upgrade anytime for more capacity and features.