Neural Voice Synthesis

Text-to-Speech API

Natural-sounding voice synthesis with emotional tones and expressive speech. Transform text into lifelike Japanese and English audio for any application.

Voices That Sound Human

State-of-the-art neural TTS with emotional intelligence

Natural Voices

Highly realistic Japanese and English voices with natural intonation and prosody

Emotional Tones

Express joy, concern, excitement, or professionalism with emotional voice modulation

Custom Voice Cloning

Clone your brand voice or create custom voices for consistent audio branding

Real-time Streaming

Low-latency streaming for interactive applications and real-time conversations

Multiple Languages

Support for Japanese, English, and more with native pronunciation quality

Enterprise Grade

99.9% uptime SLA, secure processing, and dedicated support for enterprise needs

Quick Start

Generate speech in three simple steps.

Step 1: Authentication

Include your API key in the Authorization header of every request.

Authorization: Bearer YOUR_API_KEY
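
As a minimal sketch, the header can be built once in Python and reused for every request. The environment variable name SHISA_API_KEY and the helper build_headers are assumptions made for this example; neither is defined by the API.

```python
import os


def build_headers(api_key=None, env=None):
    """Return request headers carrying the Bearer token.

    SHISA_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = api_key or env.get("SHISA_API_KEY")
    if not key:
        raise ValueError("No API key given and SHISA_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control; pass api_key explicitly when that is not practical.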

Step 2: Make Your First Request

Send a POST request with the text you want to convert to speech.

curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
    "format": "mp3",
    "stream": false,
    "text": "こんにちは。Shisa Talkへようこそ。"
  }' \
  --output speech.mp3

Minimal request

Only voice_id, text, and format are required; stream is optional and defaults to false. Set stream to true for real-time streaming on voices that support it.

Step 3: Play the Audio

The API returns binary audio data in your requested format. Save it to a file or stream it directly to an audio player.

# Play the generated audio
ffplay -nodisp -autoexit speech.mp3

# Or stream directly
curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9", "format": "mp3", "stream": true, "text": "ストリーミングテストです。"}' \
  --output - | ffplay -nodisp -autoexit -

API Endpoints

Two endpoints for generating speech and listing available voices.

Generate Speech

POST https://api.shisa.ai/tts

Converts text to speech audio. Returns binary audio data in the requested format.

List Voices

GET https://api.shisa.ai/tts/voices

Returns a JSON array of all available voices with their metadata, supported formats, and streaming capabilities.

Request Parameters

Parameters for the POST /tts endpoint.

POST /tts Parameters

Parameter	Type	Required	Description
voice_id	string	Required	UUID of the voice to use. Get available IDs from GET /tts/voices.
text	string	Required	The text to convert to speech. Maximum 5000 characters.
format	string	Required	Output audio format. Must be supported by the selected voice. Options: mp3, wav, ogg, pcm, flac
stream	boolean	Optional	When true, returns audio as a chunked stream for real-time playback. Only available for voices with streaming: true. Default: false
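
The constraints in the table can be checked on the client before a request is sent. The sketch below is one possible way to do that, not part of the API. It assumes the voice argument is one object from GET /tts/voices, that the 5000-character limit counts Unicode code points as Python's len does, and that empty text is rejected. The name build_tts_payload is hypothetical.

```python
MAX_TEXT_CHARS = 5000  # limit stated in the parameter table
KNOWN_FORMATS = {"mp3", "wav", "ogg", "pcm", "flac"}


def build_tts_payload(voice, text, audio_format, stream=False):
    """Validate arguments against a voice object and return the JSON payload."""
    if not text:
        # Assumption: the API does not accept empty text
        raise ValueError("text must not be empty")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if audio_format not in KNOWN_FORMATS:
        raise ValueError(f"unknown format: {audio_format}")
    if audio_format not in voice["formats"]:
        raise ValueError(f"voice does not support format: {audio_format}")
    if stream and not voice["streaming"]:
        raise ValueError("voice does not support streaming")
    return {
        "voice_id": voice["id"],
        "text": text,
        "format": audio_format,
        "stream": stream,
    }
```

Catching these cases locally avoids a round trip that would end in a 400 response.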

Response Format

Response formats for speech generation and voice listing.

POST /tts — Binary Audio

On success, the API returns raw binary audio data with the appropriate Content-Type header (e.g. audio/mp3). Save the response body directly to a file.

# The response is binary audio data — save directly to file
curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9", "format": "mp3", "text": "テスト"}' \
  --output speech.mp3
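
When saving from code rather than cURL, it helps to confirm that the body really is audio before writing it. The following sketch assumes that success is signalled by a 2xx status and that error bodies do not carry an audio/* Content-Type; the documentation states neither explicitly. The helper save_audio is hypothetical.

```python
def save_audio(status_code, content_type, body, path):
    """Write an audio response body to path and return the byte count."""
    is_success = 200 <= status_code < 300
    is_audio = (content_type or "").lower().startswith("audio/")
    if not (is_success and is_audio):
        raise RuntimeError(
            f"Expected audio, got status {status_code} "
            f"with Content-Type {content_type!r}"
        )
    with open(path, "wb") as f:
        f.write(body)
    return len(body)
```

With the requests library, this could be called as save_audio(r.status_code, r.headers.get("Content-Type"), r.content, "speech.mp3").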

GET /tts/voices — JSON

Returns a JSON array of available voice objects.

[
  {
    "id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
    "description": "Young male Japanese voice...",
    "language": "Japanese & English",
    "gender": "Male",
    "formats": ["mp3", "ogg", "pcm"],
    "streaming": true
  }
]

Voice Fields

id: UUID to use as voice_id in requests
description: Human-readable voice description
language: Supported language(s)
gender: Voice gender (Male, Female, Neutral)
formats: Supported output audio formats
streaming: Whether the voice supports real-time streaming
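
These fields make it possible to pick a voice programmatically. The sketch below filters a voice list of the shape shown above; the function name find_voices and the case-insensitive substring match on language are choices made for this example.

```python
def find_voices(voices, audio_format=None, streaming=None, language=None):
    """Return the voices that satisfy every filter that is not None."""
    matches = []
    for voice in voices:
        if audio_format is not None and audio_format not in voice.get("formats", []):
            continue
        if streaming is not None and voice.get("streaming") != streaming:
            continue
        if language is not None and language.lower() not in voice.get("language", "").lower():
            continue
        matches.append(voice)
    return matches
```

For example, find_voices(voices, audio_format="mp3", streaming=True) keeps only voices that can stream MP3 audio.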

Error Handling

Error responses and how to handle them.

Error Response Format

{
  "context": ["..."],
  "code": 104,
  "name": "ErrAuthenticationFailed",
  "error": "Authentication error: Invalid token"
}
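
An error body of this shape can be turned into a Python exception. This is a minimal sketch: the class TTSError and the function parse_error are hypothetical names, and the fallback message for bodies that are not valid JSON is an assumption.

```python
import json


class TTSError(Exception):
    """Carries the fields of an API error response."""

    def __init__(self, status, code, name, message):
        super().__init__(message)
        self.status = status
        self.code = code
        self.name = name


def parse_error(status, body):
    """Build a TTSError from an HTTP status and a raw response body."""
    try:
        data = json.loads(body)
    except (ValueError, TypeError):
        data = {}
    if not isinstance(data, dict):
        data = {}
    return TTSError(
        status,
        data.get("code"),
        data.get("name"),
        data.get("error", "Unknown error"),
    )
```

A caller might raise parse_error(response.status_code, response.content) whenever the status indicates failure.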

Error Codes

Status	Cause	Resolution
400	Missing or invalid parameters	Check voice_id, text, and format fields
400	Unsupported format for voice	Use a format listed in the voice's formats array
401	Invalid or missing API key	Check your Authorization: Bearer header
429	Rate limit exceeded	Wait and retry with exponential backoff
500	Internal server error	Retry the request or contact support
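
The retry advice in the table can be sketched as a small wrapper. It assumes send is a zero-argument callable that returns an object with a status_code attribute, such as a requests response. The attempt count, base delay, and jitter are illustrative values, not documented limits, and applying backoff to 500 responses as well as 429 is a choice made here.

```python
import random
import time

RETRYABLE_STATUSES = {429, 500}


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE_STATUSES:
            return response
        if attempt < max_attempts - 1:
            # Double the wait on each retry and add jitter to spread out clients
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    return response  # last response after exhausting all attempts
```

The sleep parameter exists so the wait can be replaced in tests; in normal use the default time.sleep applies.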

Simple Integration

Start generating speech in minutes with our easy-to-use API

Quick Start with cURL

# List available voices
curl -s -X GET "https://api.shisa.ai/tts/voices" \
  -H "Authorization: Bearer YOUR_API_KEY" | jq .

# Generate speech
curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
    "format": "mp3",
    "stream": false,
    "text": "こんにちは。Shisa Talkへようこそ。"
  }' \
  --output speech.mp3

# Stream audio directly to a player
curl -s -X POST "https://api.shisa.ai/tts" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "voice_id": "e3362c0a-7677-4cd8-b122-91fb093305c9",
    "format": "mp3",
    "stream": true,
    "text": "ストリーミングテストです。"
  }' \
  --output - | ffplay -nodisp -autoexit -

Python Integration

import requests

API_URL = "https://api.shisa.ai"
API_KEY = "YOUR_API_KEY"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

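# List available voices
def list_voices():
    response = requests.get(f"{API_URL}/tts/voices", headers=HEADERS)
    response.raise_for_status()  # Surface 4xx/5xx errors instead of parsing an error body
    return response.json()

# Generate speech and save it to output.<format>
def generate_speech(text, voice_id="e3362c0a-7677-4cd8-b122-91fb093305c9", audio_format="mp3", stream=False):
    response = requests.post(
        f"{API_URL}/tts",
        headers=HEADERS,
        json={
            "voice_id": voice_id,
            "format": audio_format,
            "stream": stream,
            "text": text
        },
        stream=stream  # Let requests read the body incrementally when streaming
    )
    # Fail before writing, so an error JSON body is never saved as audio
    response.raise_for_status()

    output_file = f"output.{audio_format}"
    with open(output_file, "wb") as f:
        if stream:
            # iter_content defaults to 1-byte chunks; use a larger buffer
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        else:
            f.write(response.content)

    return output_file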

# Example usage
voices = list_voices()
print(voices)

audio_file = generate_speech(
    "お客様の声を大切にしています。",
    voice_id="e3362c0a-7677-4cd8-b122-91fb093305c9"
)

JavaScript/TypeScript with Streaming

const API_URL = 'https://api.shisa.ai';
const API_KEY = 'YOUR_API_KEY';
const headers = {
  'Authorization': `Bearer ${API_KEY}`,
  'Content-Type': 'application/json',
};

// List available voices
const listVoices = async () => {
  const response = await fetch(`${API_URL}/tts/voices`, { headers });
  if (!response.ok) {
    throw new Error(`Voice listing failed with status ${response.status}`);
  }
  return response.json();
};

// Generate speech
const generateSpeech = async (text, voiceId = 'e3362c0a-7677-4cd8-b122-91fb093305c9', format = 'mp3') => {
  const response = await fetch(`${API_URL}/tts`, {
    method: 'POST',
    headers,
    body: JSON.stringify({
      voice_id: voiceId,
      format,
      stream: true,
      text,
    }),
  });

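  // Fail early so an error body is never decoded as audio
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }

  // Read the streamed response chunk by chunk
  const reader = response.body.getReader();
  const chunks = [];

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }

  // Combine chunks into a blob; playback starts once the full stream has arrived
  const blob = new Blob(chunks, { type: `audio/${format}` });
  const url = URL.createObjectURL(blob);

  // Play audio and release the object URL when playback ends
  const audio = new Audio(url);
  audio.addEventListener('ended', () => URL.revokeObjectURL(url));
  await audio.play();
};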

// Example usage
const voices = await listVoices();
console.log(voices);

await generateSpeech('ようこそ、Shisa Talkへ。', 'e3362c0a-7677-4cd8-b122-91fb093305c9');

Trusted Use Cases

See how businesses leverage our TTS API

Virtual Assistants

Power AI assistants, chatbots, and voice interfaces with natural-sounding speech for engaging user interactions.

AI phone agents
Smart home assistants
Interactive voice response
Voice-enabled apps

Audiobooks & Content

Create audiobooks, podcasts, and educational content with professional-quality narration at scale.

Audiobook narration
E-learning courses
Podcast production
Video voiceovers

Accessibility

Make content accessible to visually impaired users and provide audio alternatives for all users.

Screen readers
News article audio
Document narration
Navigation assistance

Give Your Applications a Voice

Start with 20,000 free characters per month. Upgrade anytime for more capacity and features.
