2b - Real-time web interface

✅ Flask (Python web server)
✅ Socket.IO (for live communication)
✅ Vosk (offline transcription)
✅ HTML + JavaScript to display the transcribed text in the browser

🎯 What will this web version do?

Show a page with a text area and an "Iniciar Dictado" (start dictation) button.
Capture what you say through the browser's microphone.
Send the audio to the server, which uses Vosk to transcribe it.
Display the transcription on the page in near real time, as each 1.5-second audio segment is processed.

📁 Project structure

text
gpt-vosk_web/
│
├── app.py                ← Flask + Vosk server
├── models/
│   └── vosk-model-small-es-0.42/   ← Vosk model loaded by app.py
├── static/
│   └── script.js         ← Captures audio with JavaScript
└── templates/
    └── index.html        ← User interface

1️⃣ `app.py` – Server with Vosk and Flask-SocketIO

python
import json
import os
import subprocess
import tempfile

from flask import Flask, render_template
from flask_socketio import SocketIO, emit
from vosk import Model, KaldiRecognizer

# Initial setup
app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins="*")
model = Model("models/vosk-model-small-es-0.42")  # Make sure the model is downloaded

@app.route('/')
def index():
    return render_template('index.html')

@socketio.on('audio_chunk')
def handle_audio(data):
    # Temporarily save the received audio (.webm).
    # ffmpeg can only decode it if the chunk is a complete WebM file.
    with tempfile.NamedTemporaryFile(suffix=".webm", delete=False) as input_file:
        input_file.write(data)
        input_path = input_file.name

    # Convert to raw 16 kHz mono 16-bit PCM (no WAV header), which is what Vosk consumes
    output_path = input_path + ".pcm"
    ffmpeg_cmd = [
        "ffmpeg", "-y", "-i", input_path,
        "-ar", "16000", "-ac", "1", "-f", "s16le", output_path
    ]

    try:
        subprocess.run(ffmpeg_cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=True)

        # Transcribe with Vosk
        rec = KaldiRecognizer(model, 16000)
        with open(output_path, "rb") as f:
            while True:
                buf = f.read(4000)
                if not buf:
                    break
                if rec.AcceptWaveform(buf):
                    result = json.loads(rec.Result())
                    # emit() replies only to the client that sent this chunk
                    emit('transcription', result.get("text", ""))
            final = json.loads(rec.FinalResult())
            emit('transcription', final.get("text", ""))

    except Exception as e:
        print("Error while processing:", e)

    finally:
        # Always clean up the temporary files
        os.remove(input_path)
        if os.path.exists(output_path):
            os.remove(output_path)

if __name__ == '__main__':
    socketio.run(app, host='0.0.0.0', port=5000)
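
To see why the server reads the converted audio in 4000-byte blocks, it helps to work out how much audio each block holds. The following is a minimal sketch, assuming 16-bit mono PCM at 16 kHz (what the ffmpeg flags `-ar 16000 -ac 1` request) and the 1.5-second segments sent by the client. The helper names `pcm_seconds` and `pcm_bytes` are illustrative and not part of Vosk or Flask.

```python
SAMPLE_RATE = 16000    # Hz, matches "-ar 16000" and KaldiRecognizer(model, 16000)
BYTES_PER_SAMPLE = 2   # 16-bit PCM (assumed sample format)
CHANNELS = 1           # mono, matches "-ac 1"


def pcm_seconds(num_bytes: int) -> float:
    """Return the duration in seconds of num_bytes of raw PCM audio."""
    return num_bytes / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)


def pcm_bytes(seconds: float) -> int:
    """Return how many bytes of raw PCM audio a given duration takes."""
    return int(seconds * SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)


if __name__ == "__main__":
    print(pcm_seconds(4000))  # duration covered by one f.read(4000) block
    print(pcm_bytes(1.5))     # size of one 1.5-second segment after conversion
```

Under these assumptions each read covers 0.125 seconds of audio, and a 1.5-second segment is about 48,000 bytes, so the loop runs roughly twelve times per segment.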

2️⃣ `templates/index.html`

html
<!DOCTYPE html>
<html lang="es">
<head>
  <meta charset="UTF-8">
  <title>🧠 Transcriptor con Vosk</title>
  <style>
    body { font-family: sans-serif; padding: 2rem; background: #f7f7f7; }
    textarea { width: 100%; height: 200px; font-size: 1.2rem; padding: 1rem; }
    button { margin-top: 1rem; padding: 1rem; font-size: 1rem; }
  </style>
</head>
<body>
  <h1>🎙️ Transcriptor en tiempo real (Vosk)</h1>
  <textarea id="texto" placeholder="Aquí aparecerá la transcripción..." readonly></textarea><br>
  <button id="btn">Iniciar Dictado</button>
  <div id="estado">Estado: Inactivo</div>

  <script src="https://cdn.socket.io/4.5.4/socket.io.min.js"></script>
  <script src="/static/script.js"></script>
</body>
</html>

3️⃣ `static/script.js` – Audio capture and sending

javascript
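const btn = document.getElementById("btn");
const texto = document.getElementById("texto");
const estado = document.getElementById("estado");
const socket = io();

const SEGMENT_MS = 1500; // Length of each recorded segment (1.5 seconds)
let stream;
let mediaRecorder;
let escuchando = false;

// Record one self-contained WebM segment and send it when it ends.
// With start(timeslice) only the first chunk carries the WebM header, so
// ffmpeg cannot decode the later chunks on their own; restarting the
// recorder for every segment gives the server a complete file each time.
function grabarSegmento() {
  const recorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
  mediaRecorder = recorder;

  recorder.ondataavailable = async e => {
    if (e.data.size > 0) {
      const buffer = await e.data.arrayBuffer();
      socket.emit("audio_chunk", new Uint8Array(buffer));
    }
  };

  recorder.onstop = () => {
    if (escuchando) {
      grabarSegmento(); // Keep recording until the user stops
    } else {
      stream.getTracks().forEach(track => track.stop()); // Release the microphone
    }
  };

  recorder.start();
  setTimeout(() => {
    if (recorder.state === "recording") recorder.stop();
  }, SEGMENT_MS);
}

btn.onclick = async () => {
  if (!escuchando) {
    stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    escuchando = true;
    estado.textContent = "Estado: Escuchando...";
    btn.textContent = "Detener";
    grabarSegmento();
  } else {
    escuchando = false;
    if (mediaRecorder.state === "recording") mediaRecorder.stop();
    estado.textContent = "Estado: Inactivo";
    btn.textContent = "Iniciar Dictado";
  }
};

// Append each non-empty result to the text area
socket.on("transcription", (data) => {
  if (data) texto.value += data + " ";
});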

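✅ Installation requirements

bash
pip install vosk flask flask-socketio

Also make sure you have:

ffmpeg installed and available on your PATH.
👉 If you don't have it: https://ffmpeg.org/download.html
The Vosk model vosk-model-small-es-0.42 downloaded and unpacked under models/, the path that app.py loads.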
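
Both external requirements can be checked before starting the server. The following is a minimal sketch using only the standard library; `check_requirements` is a hypothetical helper name, and the default model path is simply the one that `app.py` passes to `Model(...)`.

```python
import os
import shutil


def check_requirements(model_dir: str = "models/vosk-model-small-es-0.42") -> list:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    # shutil.which looks the executable up on PATH, like the shell would
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # The Vosk model is a directory, not a single file
    if not os.path.isdir(model_dir):
        problems.append(f"Vosk model directory not found: {model_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("Missing:", problem)
```

Running it from the project root prints nothing when ffmpeg and the model directory are both in place.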

▶️ Run

From the project root:

bash
python app.py

Then open your browser at:

text
http://localhost:5000

What's next?

Possible extensions:

💾 Save the transcription to a .txt file automatically, in real time
📋 Add a "Copy" button for the text
⏱️ Show a timer or the recorded time, or split phrases into paragraphs
🌐 Open it from a phone or tablet on the local network (the server already listens on 0.0.0.0, but browsers only grant microphone access over HTTPS or on localhost)
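
As an illustration of the first idea, saving the transcription as it arrives, the following is a minimal sketch of a helper that appends each result to a text file with a timestamp. The name `append_transcript` and the file name in the usage note are assumptions for this example; one way to use it would be to call it in `handle_audio` right before each result is emitted.

```python
from datetime import datetime


def append_transcript(path: str, text: str) -> bool:
    """Append one timestamped line to path; return False for empty results."""
    text = text.strip()
    if not text:
        return False  # Vosk often returns "" for silence; do not log it
    stamp = datetime.now().strftime("%H:%M:%S")
    # Mode "a" creates the file if needed and never truncates earlier lines
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"[{stamp}] {text}\n")
    return True
```

For example, `append_transcript("transcripcion.txt", result.get("text", ""))` would add one line per recognized phrase, and the timestamp also covers the idea of showing when each phrase was spoken.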
