
AI Toolkit for Textual Analysis

Existing Generative Models

Using Existing Generative AI Models (LLMs)

Generative AI models, particularly Large Language Models (LLMs) like those powering ChatGPT, Copilot, and Gemini, are remarkably good at generating human-like text. Beyond content creation, their advanced reasoning and instruction-following abilities allow them to provide powerful assistance with complex analytical tasks when guided through specific instructions (prompts).

The Value of Using Existing Generative Models

Why use pre-built LLMs? Training these massive models requires enormous datasets and computational power, making it impractical for most researchers. By using existing models, you leverage:

  • State-of-the-Art Capabilities: Access to models representing the forefront of AI language understanding and generation.
  • Immediate Accessibility: Ready-to-use models available via web interfaces or programming interfaces (APIs).
  • Broad Knowledge Base: Models trained on vast amounts of text data (though verification is always needed!).
  • Versatility: A single model can often be adapted to numerous tasks through prompting.

Dual Capabilities in Research

Existing generative models can support textual research workflows in multiple ways:

  • Content Creation: Brainstorming ideas, drafting text sections, creating summaries, generating code snippets, translating languages, rephrasing content.
  • Analysis Assistance: Answering detailed questions about your provided text/data, extracting specific information based on natural language descriptions, performing classification or sentiment analysis via instructed prompts, explaining complex concepts, assisting in data interpretation or identifying potential patterns (especially powerful when accessed programmatically via APIs for specific analysis goals).

Prompt Engineering

Effectively using generative models relies on prompt engineering. This is the process of crafting clear, detailed, and context-rich instructions (prompts) to guide the AI toward your desired output. Achieving optimal results often involves experimenting and refining prompts.
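For instance (a hypothetical illustration), rather than asking "Summarize this article," a more effective prompt might specify a role, task, format, and constraints: "You are a research assistant. Summarize the following article in three bullet points, focusing on methodology and main findings, and keep each bullet under 25 words."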

Common Ways to Access Models

  • Web Interfaces: Platforms like ChatGPT, Copilot, Gemini, Perplexity offer user-friendly chat environments.
  • APIs (Application Programming Interfaces): Provided by companies like OpenAI, Anthropic, Google, etc., allowing programmatic access for more complex workflows, automation, and integration into your own scripts (this guide will focus on APIs for analysis later).
  • Hosted/Downloadable Models: Some models (especially open-source ones via Hugging Face) can be accessed or even run locally, though this often has significant hardware requirements.

Critical Considerations (Important!)

  • Accuracy/Hallucinations: Models can generate incorrect or nonsensical information that sounds convincing. Fact-checking is essential.
  • Bias: Models reflect biases present in their vast training data. Critically evaluate outputs for fairness.
  • Ethical Use: Adhere to academic integrity standards regarding plagiarism and proper attribution. Understand appropriate use cases.
  • Data Privacy: Be cautious about inputting sensitive or proprietary information, especially into third-party services. Review privacy policies.
  • Cost: While some access is free, extensive API use or premium web access typically incurs costs.

Finding the Right Generative Model, Platform, or API

A growing number of powerful generative models (LLMs) are available, each with different strengths, weaknesses, access methods (web interface vs. API), and costs. This section helps you navigate the landscape to find models and services suitable for your research needs, whether for content creation or analysis assistance (particularly via APIs).

Major AI Providers & Platforms (Web UI & API)

These companies offer leading proprietary models, usually accessible via both polished web interfaces and APIs crucial for programmatic use. Free tiers are often available for testing, but sustained use of these models typically requires pay-as-you-go pricing.

Prominent Open & Open-Weight Models

These models often have their weights (or full details) released publicly, offering potential for transparency, customization, or local hosting (hardware permitting). They are frequently accessed via platforms like Hugging Face or specific APIs.

Aggregator Tools & Platforms

These tools provide access to or comparisons of multiple models.

How to Choose: Key Selection Criteria

  • Performance & Task Suitability: Which model excels at your specific needs (e.g., coding, creative writing, complex reasoning, following instructions for analysis)? Consult benchmarks, but hands-on testing is crucial.
  • API Access & Cost (Crucial for Programmatic Analysis): If using APIs: Is one available? How is it priced (e.g., per input/output token)? What are the rate limits? How good is the documentation? Compare costs across commercial providers, and consider how they compare to working with an open-source model.
  • "Openness": Consider if the transparency or customization potential of open models aligns with your needs versus the potentially higher performance or easier access of leading proprietary models.
  • Context Window Length: How much text (prompt + response) can the model handle? Critical for tasks involving long documents. (Ranges vary widely, check provider specs).
  • Web Interface Features: If using a GUI, consider its usability and features such as file uploads, web browsing capabilities, and conversation history management.
  • Data Privacy & Terms of Service: Essential. Review the policies of the provider, especially when submitting research data via web UI or API.

Programmatic Analysis with Leading Commercial APIs

This section details how to leverage leading commercial LLMs (like OpenAI's GPT models, Anthropic's Claude, Google's Gemini) for sophisticated textual analysis by interacting directly with their Application Programming Interfaces (APIs) using Python. While web interfaces are great for exploration, using APIs allows researchers to:

  • Automate analysis tasks across many documents.
  • Batch process large datasets efficiently.
  • Integrate AI capabilities directly into your custom research scripts and workflows.
  • Gain fine-grained control over model parameters and receive structured output.

The Core Python API Workflow - A General Pattern

Although specifics vary, interacting with most modern LLM APIs via Python generally involves these steps:

  • Get API Credentials: Sign up with the provider and obtain your secret API key.
  • Install Provider Library: Use pip to install the official Python package for the service (e.g., pip install openai, pip install anthropic, pip install google-generativeai).
  • Import & Initialize Client: Import the necessary classes/functions from the library and create an authenticated client object, usually passing your API key during initialization.
  • Construct Your Request:
    • Define the prompt: Often structured using a "messages" format (with roles like system, user) to provide context and instructions.
    • Specify the model: Choose the desired model ID (e.g., "gpt-4o", "claude-3-sonnet-20240229", "gemini-1.5-pro-latest").
    • Set parameters: Control the output (e.g., max_tokens, temperature for creativity, top_p).
  • Send the API Call: Use the client object's appropriate method to send the request to the API endpoint (e.g., client.chat.completions.create(...)).
  • Receive & Parse Response: Handle the response returned by the API, which is typically a JSON object containing the model's generated text and other metadata. Extract the needed information.
  • Error Handling: Include code to gracefully handle potential network issues or API errors.

Simple Example Python Script to Work with OpenAI's ChatGPT via API

# Conceptual Example - Check OpenAI Docs for current syntax & best practices!
# WARNING: Hardcoding keys like this is insecure! Use secure methods instead. Hardcoding is demonstrated here for ease of testing.

from openai import OpenAI

# Replace "YOUR_OPENAI_API_KEY" with your actual key string
client = OpenAI(
    api_key="YOUR_OPENAI_API_KEY"
)

try:
    response = client.chat.completions.create(
      model="gpt-4o", # Or another available model
      messages=[
        {"role": "system", "content": "You are a helpful research assistant."},
        {"role": "user", "content": "Analyze the sentiment of this text: [Specify Text Here]"}
      ],
      max_tokens=150
    )
    analysis_result = response.choices[0].message.content
    print("Analysis Result:", analysis_result)
except Exception as e:
    print(f"An API error occurred: {e}")

 

Running Open Generative Models For Analysis Locally

Introduction - The Local Approach

Beyond cloud-based web interfaces and APIs, you can run many powerful generative models (LLMs) directly on your own computer. Applications like Jan.ai, Ollama, LM Studio, and GPT4All make this possible. This approach offers significant benefits for data privacy (your text never leaves your machine), eliminates API costs, and allows for offline use. This section focuses on using Jan.ai due to its user-friendly interface on Mac, Windows, and Linux, while also mentioning alternatives.

Why Consider Running Models Locally?

  • Data Privacy: Essential when working with sensitive or confidential textual data.
  • No API Fees: Running models locally is free (aside from your electricity and hardware investment).
  • Offline Capability: Use models even without an internet connection (once models are downloaded).
  • Flexibility: Easily experiment with various open-source models.
  • Key Trade-off: Performance is entirely dependent on your own computer's hardware (see the Hardware Considerations tab). Not all models (especially the largest proprietary ones) can be run locally easily.

Chat & API Access Locally

Running a generative LLM on a local computer requires software to serve as the interface between the model and the researcher. Free applications such as Jan.ai, Ollama, LM Studio, and GPT4All serve this purpose, providing both local chat (similar to chatting with LLMs online) and local APIs. A local API, unlike a web API, is accessible only on the local computer.

Jan.ai Example: Working with a Local API

Jan.ai provides unlimited, private, and free local API access when working with open GenAI models. See here for documentation on the Local API Server.

Here is a sample Python template you can use to construct your API script.

# -*- coding: utf-8 -*-
"""
Simple Python script template for interacting with a Jan.ai local API server
using the 'requests' library, assuming OpenAI API compatibility.
"""

import requests
import json # To parse the JSON response

# --- Configuration ---
# IMPORTANT: Verify these settings in your Jan application!

# 1. Server URL: Get this from Jan's Settings -> Server section.
# Ensure the Jan server is running before executing this script.
SERVER_URL = "http://127.0.0.1:1337" # Common default - VERIFY!

# 2. Model ID: Get this from Jan's model list or server settings.
# It must match the ID Jan uses for the model you want to query.
MODEL_ID = "llama3-8b-instruct" # Example

# --- Request Details ---

# Headers: Typically just Content-Type is needed for local servers.
headers = {
    "Content-Type": "application/json",
}

# Prompt: Use the standard OpenAI messages format.
messages = [
    {"role": "system", "content": "You are a helpful assistant providing concise answers."},
    {"role": "user", "content": "What are the main benefits of running LLMs locally?"}
]

# Parameters: Control the model's output.
# Support for these depends on Jan's server and the specific model runner.
payload = {
    "model": MODEL_ID,
    "messages": messages,
    "max_tokens": 150,        # Limit the length of the response
    "temperature": 0.7,       # Controls creativity (0.0=deterministic, >1.0=more creative)
    "top_p": 1.0,             # Nucleus sampling (1.0 disables it)
    # "stream": False,        # Set to True for streaming (requires different handling)
    # "stop": ["\n", " Human:"] # Optional sequences to stop generation
}

# --- Send Request & Process Response ---

print(f"--- Sending request to Jan server at {SERVER_URL} for model {MODEL_ID} ---")
api_endpoint = f"{SERVER_URL}/v1/chat/completions"

try:
    # Make the POST request
    response = requests.post(api_endpoint, headers=headers, json=payload, timeout=180)

    # Check for HTTP errors (like 404 Not Found, 500 Internal Server Error)
    response.raise_for_status()

    # Parse the JSON response body
    try:
        response_data = response.json()

        # Extract the main content (following OpenAI structure)
        if "choices" in response_data and len(response_data["choices"]) > 0:
            message = response_data["choices"][0].get("message", {})
            content = message.get("content")
            if content:
                print("--- Model Response ---")
                print(content.strip())
                print("-" * 20)
            else:
                print("Error: Could not find 'content' in response message.")
                print("Full Response JSON:", response_data)

            # Optional: Display token usage if provided
            if "usage" in response_data:
                print(f"Token Usage: {response_data['usage']}")

        else:
            print("Error: 'choices' field missing or empty in response.")
            print("Full Response JSON:", response_data)

    except json.JSONDecodeError:
        print("Error: Failed to decode JSON response.")
        print("Response Text:", response.text)

except requests.exceptions.ConnectionError as e:
    print(f"Connection Error: Failed to connect to {SERVER_URL}.")
    print("Please ensure the Jan server is running and the URL is correct.")
    # print(f"Details: {e}") # Uncomment for more detail
except requests.exceptions.Timeout:
    print("Error: The request timed out. The server might be slow or busy.")
except requests.exceptions.RequestException as e:
    # Catches other request errors like HTTPError (already raised by raise_for_status)
    print(f"An error occurred during the request: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")


print("--- Script Finished ---")

What Your Computer Needs to Run Generative Models Locally

Running generative models (LLMs) directly on a computer using tools like Jan.ai or Ollama offers great benefits for privacy and cost, but it depends heavily on your hardware. Performance and feasibility vary significantly based on your system's specifications and the specific model you choose. This section outlines the key hardware components to consider.

Key Factor - Model Size & Quantization

  • Model Parameters (e.g., 7B, 13B, 70B): Larger models (more parameters) generally have better reasoning capabilities but demand much more computational power and memory.
  • Quantization (Crucial for Local Use): To make running large models feasible on consumer hardware, models are often quantized. This process reduces the precision of the model's numerical data (weights), significantly shrinking the file size and, more importantly, the RAM and VRAM needed to run it.
    • Look for models in formats like GGUF, which often include quantization levels in the filename (e.g., Q4_K_M, Q5_K_M, Q8_0). Lower numbers (like Q4) mean smaller size and lower resource use, but potentially slightly lower output quality compared to higher levels (like Q8 or unquantized models).
    • Using quantized models is often essential to run models larger than 7 billion parameters on typical desktop or laptop hardware.
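
For a rough sense of scale (approximate figures): a 7-billion-parameter model stored at 16-bit precision needs about 14 GB for its weights alone (7 billion parameters × 2 bytes each), while a 4-bit quantized version (e.g., Q4_K_M) shrinks to roughly 4-5 GB, small enough to fit comfortably in 16GB of system RAM or on an 8GB GPU.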

Essential Hardware Components

  • RAM (System Memory): Absolutely Critical. Your computer needs enough RAM to load the (often quantized) model weights plus run the operating system and the LLM application itself.
    • Estimates (Highly Variable based on Quantization & Model):
      • Small Models (e.g., ~7B Q4/Q5): Minimum 16GB RAM strongly recommended (8GB might struggle).
      • Medium Models (e.g., ~13B Q4/Q5): Often require 16GB - 32GB+ RAM.
      • Large Models (e.g., 30B-70B+ Q4/Q5): Can easily require 32GB, 64GB, or even 128GB+ RAM.
  • GPU & VRAM (Graphics Card Memory): Highly Recommended for Speed.
    • Acceleration: Running LLMs on a compatible GPU (Graphics Processing Unit) is significantly faster than using the CPU alone. Look for support for NVIDIA (CUDA) or Apple Silicon (Metal) in your chosen runner software (Jan, Ollama, LM Studio often support these).
    • VRAM: This is the dedicated memory on your GPU, used to hold parts (or all) of the model for fast processing. More VRAM is better: if the model fits entirely (or mostly) in VRAM, performance is much higher, while insufficient VRAM forces the system to fall back on slower system RAM, reducing speed.
    • Estimates (Rough guide for quantized models):
      • ~7B models: Often usable with 8GB VRAM.
      • ~13B models: Benefit greatly from 12GB-16GB+ VRAM.
      • ~30B models: Often need 24GB+ VRAM.
      • ~70B models: Typically require 40GB-48GB+ VRAM (often needing high-end or multiple GPUs).
  • CPU (Processor): While less critical for inference speed if you have a good GPU doing the work, a reasonably fast multi-core CPU is still needed for overall system operation and parts of the calculation. It becomes very important if you are running in CPU-only mode (which will be slow for larger models).
  • Storage (SSD - Solid State Drive):
    • Capacity: Model files (especially GGUF) are large, typically ranging from 4GB to over 70GB depending on size and quantization. Ensure you have sufficient free disk space.
    • Speed: An SSD is strongly recommended over a traditional Hard Disk Drive (HDD). It dramatically reduces the time it takes to load the model into RAM/VRAM each time you start it.

Existing Analytical/Predictive Models

Using Existing Analytical & Predictive AI Models for Text Analysis

These AI tools are designed to analyze text content, helping to classify content based on specific criteria, extract key information, or uncover patterns and structures within your data. Unlike generative models, their primary function is the interpretation and analysis of existing text, not the creation of new content.

Leveraging pre-trained analytical or predictive models offers significant advantages for researchers. These advantages can include:

  • Time and resource savings
  • State-of-the-art performance
  • A low technical barrier to entry
  • Research consistency

Existing analytical models are commonly used for such tasks as:

  • Text classification
  • Topic modeling
  • Named entity recognition (NER)
  • Text clustering
  • Similarity analysis

This section focuses on using these powerful pre-existing tools. Use the tabs above to guide you through:

  • Discovering existing models
  • How to run textual analysis
  • Hardware considerations

Finding the Right Analytical/Predictive Model for Your Task

This section highlights key platforms and considerations for discovering pre-trained analytical and predictive models relevant to textual research. Remember that models can vary in their capabilities, languages covered, performance, and ease of use.

Central Model Repositories

Models within Specific Libraries & Frameworks

Some programming libraries come with their own sets of pre-trained models optimized for that framework.

  • spaCy: Provides pre-trained pipelines for core NLP tasks like Named Entity Recognition (NER), Part-of-Speech (POS) tagging, and dependency parsing in many languages.
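
As a brief illustration, here is a minimal sketch of running Named Entity Recognition with spaCy's small English pipeline (en_core_web_sm, which must be downloaded separately); consult the spaCy documentation for current installation details and available pipelines.

# Minimal sketch: Named Entity Recognition with spaCy's small English pipeline.
# Setup:  pip install spacy
#         python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")   # Load the pre-trained pipeline
doc = nlp("Baylor University was chartered in 1845 in the Republic of Texas.")

for ent in doc.ents:                 # Iterate over the detected entities
    print(ent.text, ent.label_)      # e.g., "Baylor University" ORG, "1845" DATE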

How to Choose: Key Selection Criteria

  • Task alignment
  • Language support
  • Performance metrics
  • Model size & speed
  • Documentation quality
  • License

Importance of Model Cards

Most models on Hugging Face have a Model Card. These cards are amazingly useful as they typically contain:

  • Information about intended use cases
  • Data used for training (and potential biases)
  • Evaluation results
  • How to use the model (code snippets)
  • Ethical considerations

Applying Pre-trained Analytical/Predictive Models to Your Text

You've identified a promising pre-trained model for your analysis task. Now, how do you actually use it with your text data? The process generally involves loading the model and applying it to your input. The specific steps vary depending on whether you use a dedicated tool/platform or a programming library.

Using Tools with Graphical User Interfaces (GUIs)

Many platforms, including web-based tools and some library-licensed resources (e.g., Gale Digital Scholar Lab), offer point-and-click interfaces to run common analyses using underlying pre-trained models. This is often the easiest way to get started.

Typical workflow:

Using Programming Libraries (focus on Python)

For greater flexibility, batch processing large datasets, or integrating analysis into custom scripts, using programming libraries like Hugging Face transformers or spaCy is common. This requires some coding, often in Python. Tools like Google Colab or Jupyter Notebooks provide accessible environments.

Typical Workflow

  • Setup: Install required libraries (often using pip).
  • Load Model: Write code to load the specific pre-trained model or pipeline you chose.
  • Prepare Data: Load text data into your script.
  • Apply Model: Pass your text data to the loaded model/pipeline function.
  • Process Output: Extract and interpret the results.

Workflow Examples:
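
Here is one illustrative sketch of that workflow using the Hugging Face transformers pipeline API for text classification (sentiment analysis). Note that calling pipeline("sentiment-analysis") without a model argument lets the library pick a default English model; for research use, specify an explicit model ID from the Hugging Face Hub.

# Minimal sketch: applying a pre-trained sentiment classification pipeline
# from Hugging Face transformers to a small batch of texts.
# Setup:  pip install transformers torch
from transformers import pipeline

# Load a pre-trained pipeline; the underlying model downloads on first use.
classifier = pipeline("sentiment-analysis")

# Prepare data: in practice, load these from your files or corpus.
texts = [
    "The archive's digitization project exceeded every expectation.",
    "The manuscript scans were too blurry to transcribe reliably.",
]

# Apply the model and process the output.
for text, result in zip(texts, classifier(texts)):
    print(f"{result['label']} ({result['score']:.2f}): {text}")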

Computing Resources for Analytical/Predictive AI Models

Running pre-trained analytical or predictive AI models can range from computationally trivial to very demanding. Understanding the potential hardware needs before you start can save time and frustration. Key factors include the model itself, the size of your dataset, and the complexity of your analysis.

Factors Influencing Resource Needs

  • Model Size and Architecture: Larger, more complex models (especially deep learning models like Transformers found on Hugging Face) generally require significantly more memory (RAM and GPU VRAM) and benefit from GPU acceleration. Smaller models or highly optimized pipelines (like many from spaCy) are less demanding.
  • Dataset Size: Processing larger text corpora naturally requires more RAM and takes longer. Analyzing millions of documents will require more resources than analyzing a few hundred.
  • Analysis Workflow: Running batch processes over many files or performing very complex multi-step analyses will generally require more resources than analyzing single documents interactively.

Hardware Components Explained

  • RAM (System Memory): Essential for loading the AI model and your data. Insufficient RAM is often the first bottleneck. Check model documentation if available; requirements can range from <1GB to 32GB+ just for the model. Recommendation: 16GB minimum for moderate AI work, 32GB+ often helpful.
  • GPU (Graphics Processing Unit): Crucial for accelerating deep learning models. An appropriate GPU with sufficient VRAM (GPU Memory, e.g., 8GB+) can make analyses orders of magnitude faster. Without a suitable GPU, many modern models are impractically slow.
  • CPU (Central Processing Unit): While less critical than GPUs for many deep learning tasks, a reasonably modern multi-core CPU still affects overall performance and is essential for tasks not accelerated by a GPU.

Where to Run Your Analysis - Options at Baylor

  • Your Personal Computer: Suitable for smaller models, smaller datasets, and initial testing.
  • Cloud Computing (e.g., Google Colab): Offers free or low-cost GPU access for learning, experimentation, and moderate-sized tasks. Well suited for running Python code in notebooks.
  • Design & Data Lab in Moody Library Garden Level: See AI Computing in the Moody Library in this guide.
  • Baylor's High-Performance Computing (HPC) Resources: Suitable for large datasets, large/complex models requiring significant RAM or powerful GPUs, long-running analyses, batch processing jobs.

Recommendation

Evaluate your specific model and data needs. Start with your local machine or Google Colab for exploration. If your analysis is slow, runs out of memory, or involves very large data, the Design & Data Lab or Baylor's HPC resources may be your best option.
