# JavaScript Benchmarking for Local LLMs using llama.cpp / KoboldCpp
This is the output from the current prompts.
Ready to unleash the power of your local Large Language Models? 🔥
This project provides a powerful and flexible Python suite to systematically benchmark multiple .gguf language models running locally via the fantastic KoboldCpp backend. Pit your models against each other using your custom prompts (especially geared towards JavaScript generation in this setup!) and see how they perform head-to-head on your hardware!
Stop guessing, start measuring! 📊
## Features

- **Automated benchmarking:** discovers your `.gguf` files and `.md` prompts and runs every model against every prompt.
- **Timeout fallback:** uses KoboldCpp's check endpoint (`/api/extra/generate/check`) to capture results even from long-running generations.
- **Per-model customization:** tweak API parameters (`model_payload_filter`) or add specific instructions to prompts (`model_prompt_filter`) based on the model being tested.
- **Organized results:** each generation is saved to its own `.md` file in the results directory, with the generation time appended as an HTML comment (`<!-- 123.45s -->`).
- **HTML extraction:** a helper script (`extract_html.py`) automatically finds and extracts `<!DOCTYPE html>...</html>` blocks from your result files into separate, viewable `.html` files – perfect for checking generated web pages!

## How it works

1. Run `run_benchmarks.py`.
2. The script discovers your `.gguf` models (within size limits) and `.md` prompt files.
3. Each result is saved to its own `.md` file in the results directory.
4. Optionally, run `extract_html.py` to scan the results folder and pull out any complete HTML blocks into `.html` files for easy browser viewing.

## Requirements

- **Models:** the `.gguf` format LLM files you want to benchmark.
- **Prompts:** `.md` files containing the prompts you want to test. (This setup is particularly focused on prompts designed to elicit JavaScript code.)

## Installation

```bash
git clone https://github.com/electricazimuth/LocalLLM_VisualCodeTest.git  # Replace with your repo URL
cd LocalLLM_VisualCodeTest
```
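The timeout-fallback idea from the feature list can be sketched as follows. This is a minimal sketch, not the repository's actual code; the endpoint path and `{"results": [{"text": ...}]}` response shape follow KoboldCpp's API, while the function names and the example `API_URL` are assumptions for illustration:

```python
import json
import urllib.request

API_URL = "http://localhost:5001"  # must match the --port in KOBOLDCPP_ARGS


def parse_check_response(raw: bytes) -> str:
    """Pull the generated-so-far text out of a /api/extra/generate/check reply.

    KoboldCpp returns JSON shaped like {"results": [{"text": "..."}]}.
    """
    data = json.loads(raw.decode("utf-8"))
    return data["results"][0]["text"]


def fetch_partial_generation(api_url: str = API_URL, timeout: float = 10.0) -> str:
    """Fallback path: ask KoboldCpp for whatever text the current generation
    request has produced so far, so a slow model still yields a (partial)
    result instead of nothing after the primary request times out."""
    with urllib.request.urlopen(
        f"{api_url}/api/extra/generate/check", timeout=timeout
    ) as resp:
        return parse_check_response(resp.read())
```

The benchmark runner can call this after `PRIMARY_API_TIMEOUT` expires and tag the saved file with the `_fallback` suffix.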
## Configuration

❗ **Update the paths for your setup.** Open `config.py` in a text editor and carefully update the following paths and settings near the top of the file:

- `KOBOLDCPP_SCRIPT`: absolute path to your `koboldcpp.py` script.
- `MODEL_DIR`: absolute path to the directory containing your `.gguf` models.
- `PROMPT_DIR`: absolute path to the directory containing your `.md` prompt files.
- `RESULTS_DIR`: path where the benchmark results (`.md` files) will be saved.
- `KOBOLDCPP_ARGS`: **crucial!** Adjust these arguments for your hardware and KoboldCpp setup – e.g. `--usecublas` (or `--useclblast`, etc.) and GPU layer settings (`--gpulayers`). Ensure the `--port` matches the `API_URL`.
- `MAX_SIZE_BYTES` / `MIN_SIZE_BYTES`: filter models by file size if needed.
- `API_PAYLOAD_TEMPLATE`: modify default generation parameters (`temperature`, `top_p`, `max_length`, etc.) if desired.
- `SERVER_STARTUP_WAIT`: increase if your models take longer to load.
- `PRIMARY_API_TIMEOUT`: increase if you expect very long generation times.

## Running the benchmarks

Make sure your `.gguf` files are in the `MODEL_DIR` and your `.md` prompt files are in the `PROMPT_DIR`, then run:

```bash
python run_benchmarks.py --backend llamacpp
```
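For reference, a `config.py` covering the settings above might look roughly like this. All paths and numeric values are illustrative placeholders – the real file in the repository is authoritative:

```python
# config.py — illustrative values only; adjust every path for your machine.
from pathlib import Path

KOBOLDCPP_SCRIPT = "/home/you/koboldcpp/koboldcpp.py"   # absolute path
MODEL_DIR = Path("/home/you/models")                    # *.gguf files live here
PROMPT_DIR = Path("/home/you/prompts")                  # *.md prompt files
RESULTS_DIR = Path("results")                           # output *.md files

API_URL = "http://localhost:5001"
KOBOLDCPP_ARGS = [
    "--usecublas",           # or --useclblast, per your GPU/backend
    "--gpulayers", "35",     # tune for your VRAM
    "--port", "5001",        # must match API_URL
]

MIN_SIZE_BYTES = 1 * 1024**3    # skip models smaller than ~1 GB
MAX_SIZE_BYTES = 20 * 1024**3   # skip models larger than ~20 GB

API_PAYLOAD_TEMPLATE = {
    "max_length": 4096,
    "temperature": 0.7,
    "top_p": 0.9,
}

SERVER_STARTUP_WAIT = 60     # seconds to wait for the model to load
PRIMARY_API_TIMEOUT = 600    # seconds before falling back to /check
```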
For long runs, consider using `nohup` (on Linux/macOS) to prevent the process from stopping if you close the terminal:

```bash
nohup python run_benchmarks.py > runbench.log 2>&1 &
```

This runs the script in the background and logs all output to `runbench.log`. You can monitor the log with `tail -f runbench.log`.
## Extracting HTML from results

After benchmarking, the `.md` result files will be in your results directory (ensure this directory exists and contains results). Check that `extract_html.py` is configured correctly (its `SOURCE_FOLDER_NAME` should match your `RESULTS_DIR` name; the default is "results"), then run:

```bash
python extract_html.py
```
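Conceptually, the extraction is a regex scan over each result file. A minimal sketch of the idea – not the repository's actual `extract_html.py`, and the function names are assumptions:

```python
import re
from pathlib import Path

# Matches a complete HTML document embedded anywhere in a markdown result file.
HTML_RE = re.compile(r"<!DOCTYPE html>.*?</html>", re.IGNORECASE | re.DOTALL)


def extract_html_blocks(markdown_text: str) -> list[str]:
    """Return every complete <!DOCTYPE html>...</html> block in the text."""
    return HTML_RE.findall(markdown_text)


def export_html(results_dir: str = "results") -> None:
    """Write each extracted block next to its source .md as a .html file."""
    for md_file in Path(results_dir).glob("*.md"):
        text = md_file.read_text(encoding="utf-8")
        for i, block in enumerate(extract_html_blocks(text)):
            suffix = f"_{i}" if i else ""  # number extra blocks from the same file
            out = md_file.with_name(md_file.stem + suffix + ".html")
            out.write_text(block, encoding="utf-8")
```

The non-greedy `.*?` with `re.DOTALL` ensures each match stops at the first closing `</html>`, so multiple documents in one file are extracted separately.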
Check the results directory – you should now see corresponding `.html` files for any markdown files that contained valid `<!DOCTYPE html>...</html>` blocks. Open them in your browser! (See also `static_viewer.php`.)

## Results format

Results are saved as `.md` files in the directory specified by `RESULTS_DIR`, named:

```
model-stem_prompt-stem_timestamp[_fallback].md
```
- `model-stem`: name of the model file (without extension).
- `prompt-stem`: name of the prompt file (without extension).
- `timestamp`: date and time of generation (`YYYYMMDD_HHMMSS`).
- `_fallback` (optional): indicates the result was obtained using the fallback API call after a timeout.

A comment of the form `<!-- [TIME]s -->` (e.g., `<!-- 15.23s -->`) is appended to the end of the generated content, indicating the time taken by the API generation request (or the time until timeout).

## Customization

- **Model selection:** edit the `run_benchmarks.py` script (search for `MAX_SIZE_BYTES`, `MIN_SIZE_BYTES`, and the commented-out name filters) to include/exclude specific models based on name patterns or size.
- **Per-model tweaks:** use the `model_payload_filter` and `model_prompt_filter` functions in `run_benchmarks.py` to tweak API parameters or add instructions tailored to specific models (e.g., adjusting the temperature for 'qwen' models, as shown in the example).

Happy Benchmarking! 🎉
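As an appendix, the two per-model hooks mentioned under Customization might look roughly like this. These are hypothetical shapes – the real functions live in `run_benchmarks.py`, and the `0.6` temperature and appended instruction are illustrative values, not the repository's actual tweaks:

```python
# Hypothetical sketches of the per-model hooks in run_benchmarks.py.


def model_payload_filter(model_name: str, payload: dict) -> dict:
    """Adjust API parameters for specific model families."""
    payload = dict(payload)  # copy so the shared template is not mutated
    if "qwen" in model_name.lower():
        payload["temperature"] = 0.6  # illustrative per-family override
    return payload


def model_prompt_filter(model_name: str, prompt: str) -> str:
    """Append model-specific instructions to a prompt."""
    if "qwen" in model_name.lower():
        prompt += "\n\nRespond with JavaScript code only."  # illustrative
    return prompt
```

Keeping both hooks keyed on substrings of the model filename means a new model picks up the right tweaks automatically, with no per-model configuration elsewhere.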