Running Local Models with LM Studio

LM Studio lets you download and run open-source language models entirely on your own machine. It exposes an OpenAI-compatible REST API, so you can reuse the same openai Python SDK you already know. This notebook demonstrates two approaches: using the OpenAI client pointed at your local server and using the native lmstudio Python package.

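THIS CODE WON'T EXECUTE IN GOOGLE COLAB. Colab runs on a remote machine and cannot reach a server listening on your computer's 127.0.0.1 address. Start the LM Studio local server first, then run this notebook in VS Code or any other local IDE.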

Importing Libraries

We import both the lmstudio SDK (for native access) and the openai SDK (for the OpenAI-compatible endpoint). Having both available lets you pick whichever interface fits your workflow.

# Install the SDKs first if needed: pip install lmstudio openai
import os

import lmstudio as lms
from openai import OpenAI

Checking the Working Directory

Printing the current working directory is a quick sanity check to confirm where any local files will be read from or written to.



# Get the current working directory
cwd = os.getcwd()
print("Current Working Directory:", cwd)

Calling LM Studio via the OpenAI-Compatible Endpoint

By pointing the OpenAI client at http://127.0.0.1:1234/v1, every call goes to LM Studio instead of OpenAI’s servers. The api_key value is ignored by LM Studio but required by the SDK. Make sure you have a model loaded in LM Studio before running this cell.



# Initialize the client pointing to your local LM Studio server
client = OpenAI(
    base_url="http://127.0.0.1:1234/v1", # Ensure you add "/v1" to the URL
    api_key="lm-studio" # The SDK requires a key, but LM Studio ignores the specific value
)

response = client.chat.completions.create(
    model="gpt-oss-20B", # Ensure this matches the model loaded in LM Studio
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ],
    temperature=0.7,
)

print(response.choices[0].message.content)
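
The call above only works if the server is reachable and the model value matches an identifier the server knows. A minimal preflight sketch, using only the standard library, queries the OpenAI-style /models route at the same base URL and prints the identifiers it reports. The helper names extract_model_ids and list_local_models are hypothetical, and the sketch assumes the response follows the OpenAI list format with a top-level "data" array.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:1234/v1"  # same address the OpenAI client uses above


def extract_model_ids(payload: dict) -> list:
    """Pull the model identifiers out of an OpenAI-style /models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_local_models(base_url: str = BASE_URL, timeout: float = 3.0) -> list:
    """Return the model ids the server reports, or [] if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return extract_model_ids(json.load(resp))
    except (OSError, ValueError):
        # Server not running, wrong port, or a response body that is not JSON
        return []


if __name__ == "__main__":
    models = list_local_models()
    if models:
        print("Models reported by the server:", models)
    else:
        print("No models found. Is the LM Studio server running?")
```

If the list is empty, start the server in LM Studio and try again. If it is not, copy one of the printed identifiers into the model argument.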

Another example

Using the Native LM Studio SDK

The lmstudio package connects to whichever model is currently loaded and provides a simpler respond() interface. This is convenient for quick experiments where you do not need fine-grained control over parameters.

# 1. Get a handle to the model currently loaded in LM Studio
model = lms.llm()

# 2. Write your prompt
prompt = "What is today's date?"

# 3. Get the response
print("Awaiting response...")
result = model.respond(prompt)

print(f"\nLLM Says: {result}")
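
Unless the model's chat template happens to supply it, a locally run model may not know the current date and may answer the prompt above with a guess. One common workaround, sketched here under that assumption, is to put the date into the prompt yourself. The helper with_current_date is a hypothetical name, not part of the lmstudio package.

```python
from datetime import date
from typing import Optional


def with_current_date(question: str, today: Optional[date] = None) -> str:
    """Prefix a prompt with the date so the model does not have to guess it."""
    today = today or date.today()
    return f"Today's date is {today.isoformat()}.\n\n{question}"


prompt = with_current_date("What is today's date?")
print(prompt)
# Pass the result to the model as before: model.respond(prompt)
```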

Key takeaways

LM Studio downloads and runs open-source LLMs entirely on your own machine, so prompts and documents never leave your device.
The OpenAI SDK works unchanged once you set base_url="http://127.0.0.1:1234/v1"; the SDK requires an api_key value, but LM Studio ignores whatever you pass.
Two SDK options: the OpenAI-compatible client for portability, or the native lmstudio package with a lightweight model.respond() call.
Local execution requires running this notebook in a local IDE (not Colab) with a model already loaded in the LM Studio desktop app.
Privacy and cost: local inference has zero per-token cost and keeps sensitive data on-premises, at the price of providing your own hardware and, typically, slower generation than a hosted API.
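
The portability point above can be sketched as a small switch that builds the arguments for OpenAI(...) for either backend. The function name client_kwargs and the USE_LOCAL_LLM environment variable are hypothetical, introduced only for this illustration. The hosted branch relies on the openai SDK reading OPENAI_API_KEY from the environment when no key is passed.

```python
import os

LOCAL_BASE_URL = "http://127.0.0.1:1234/v1"


def client_kwargs(use_local: bool) -> dict:
    """Build keyword arguments for OpenAI(...) for either backend."""
    if use_local:
        # LM Studio ignores the key, but the SDK still needs some value
        return {"base_url": LOCAL_BASE_URL, "api_key": "lm-studio"}
    # Hosted API: pass nothing so the SDK falls back to its default
    # endpoint and the OPENAI_API_KEY environment variable
    return {}


# USE_LOCAL_LLM is a hypothetical environment variable for this sketch
use_local = os.environ.get("USE_LOCAL_LLM", "1") == "1"
kwargs = client_kwargs(use_local)
print("Client arguments:", sorted(kwargs))
# client = OpenAI(**kwargs)  # requires: from openai import OpenAI
```

The rest of the notebook's chat.completions code stays the same in both cases; only the model name needs to match what the chosen backend serves.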

Run the code

To get this notebook, copy the URL below into your browser’s address bar. The link opens the notebook in Google Colab, where you can read it, but the code will not execute there: download the .ipynb file and run it in a local IDE with the LM Studio server running. (If your PDF viewer makes the URL clickable and lands on a broken page, copy the full text manually, because the viewer may have truncated the link at a line break.)

Estimated run time: ~5 minutes (requires LM Studio running locally)

https://colab.research.google.com/github/KarAnalytics/code_demos/blob/main/LMStudioAccess.ipynb