mirror of https://github.com/zachlatta/freeflow.git synced 2026-04-19 22:14:35 +10:00

Go to file

marcbodea f8c43adf3a Merge pull request #91 from iris-sfg/feat/setup-model-selection

Make transcription, cleanup, and context models configurable

2026-04-17 21:25:48 +02:00

.github/workflows

Remove macOS build workflow from GitHub Actions

2026-04-15 15:44:27 +02:00

Resources

Fix macOS app icon shape and DMG icon preview

2026-02-24 18:35:07 -08:00

Sources

Add draft-backed model settings fields

2026-04-17 21:10:33 +02:00

.gitignore

Add Developer ID code signing, notarization, and stapling

2026-02-16 14:05:05 -05:00

FreeFlow.entitlements

Add run log with audio playback, pipeline visualization, and FreeFlow rename

2026-02-15 16:16:37 -05:00

Info.plist

Bump version to 0.2.0

2026-03-22 01:51:49 -04:00

LICENSE

Add auto-updates, microphone selection, and setup flow improvements

2026-02-16 10:44:30 -05:00

Makefile

Refactor Makefile to improve build process and path handling

2026-03-14 13:49:07 +01:00

README.md

Add @marcbodea as maintainer

2026-04-13 12:40:37 -04:00

README.md

FreeFlow

Free and open source alternative to Wispr Flow, Superwhisper, and Monologue.

⬇ Download FreeFlow.dmg
_{Works on all Macs (Apple Silicon + Intel)}

Thank you to @marcbodea for maintaining FreeFlow!

I like the concept of apps like Wispr Flow, Superwhisper, and Monologue that use AI to add accurate and easy-to-use transcription to your computer, but they all charge fees of ~$10/month when the underlying AI models are free to use or cost pennies.

So over the weekend I vibe-coded my own free version!

It's called FreeFlow. Here's how it works:

Download the app from above or click here
Get a free Groq API key from groq.com
Hold Fn to talk, or tap Command-Fn to start and stop dictation, and have whatever you say pasted into the current text field

You can also customize both shortcuts. If your toggle shortcut extends your hold shortcut, you can start in hold mode and press the extra modifier keys to latch into tap mode without stopping the recording.

One of the cool features is that it's context aware. If you're replying to an email, it'll read the names of the people you're replying to and make sure to spell their names correctly. Same with if you're dictating into a terminal or another app. This is the same thing as Monologue's "Deep Context" feature.

An added bonus is that there's no FreeFlow server, so no data is stored or retained - making it more privacy friendly than the SaaS apps. The only information that leaves your computer are the API calls to Groq's transcription and LLM API (LLM is for post-processing the transcription to adapt to context).

If you'd rather keep cleanup more literal and less context-aware, you can paste this simpler prompt into the custom system prompt setting:

Simple post-processing prompt

You are a dictation post-processor. You receive raw speech-to-text output and return clean text ready to be typed into an application.

Your job:
- Remove filler words (um, uh, you know, like) unless they carry meaning.
- Fix spelling, grammar, and punctuation errors.
- When the transcript already contains a word that is a close misspelling of a name or term from the context or custom vocabulary, correct the spelling. Never insert names or terms from context that the speaker did not say.
- Preserve the speaker's intent, tone, and meaning exactly.

Output rules:
- Return ONLY the cleaned transcript text, nothing else. So NEVER output words like "Here is the cleaned transcript text:"
- If the transcription is empty, return exactly: EMPTY
- Do not add words, names, or content that are not in the transcription. The context is only for correcting spelling of words already spoken.
- Do not change the meaning of what was said.

Example:
RAW_TRANSCRIPTION: "hey um so i just wanted to like follow up on the meating from yesterday i think we should definately move the dedline to next friday becuz the desine team still needs more time to finish the mock ups and um yeah let me know if that works for you ok thanks"

Then your response would be ONLY the cleaned up text, so here your response is ONLY:
"Hey, I just wanted to follow up on the meeting from yesterday. I think we should definitely move the deadline to next Friday because the design team still needs more time to finish the mockups. Let me know if that works for you. Thanks."

FAQ

Why does this use Groq instead of a local transcription model?

I love this idea, and originally planned to build FreeFlow using local models, but to have post-processing (that's where you get correctly spelled names when replying to emails / etc), you need to have a local LLM too.

If you do that, the total pipeline takes too long for the UX to be good (5-10 seconds per transcription instead of <1s). I also had concerns around battery life.

Some day!

Update: You can now use a custom model with FreeFlow by configuring the LLM API URL in the FreeFlow settings to use Ollama. Thank you @taciturnaxolotl!

License

Licensed under the MIT license.