Repository context for LLM-assisted code completion

Meng Zhang

October 16, 2023

Using a Large Language Model (LLM) pretrained on coding data proves incredibly useful for "self-contained" coding tasks, like conjuring up a completely new function that operates independently 🚀.

However, employing an LLM for code completion within a vast and intricate pre-existing codebase poses certain challenges 🤔. To tackle this, the LLM needs to comprehend the dependencies and APIs that link the codebase's subsystems. We must provide this "repository context" to the LLM when requesting it to complete a snippet.

To be more specific, we should:

Help the LLM understand the overall codebase, so that it can follow code with dependencies and generate new code that uses existing abstractions.
Convey all of this "code context" efficiently, in a form that fits within the context window (~2000 tokens) and keeps completion latency reasonably low.

To demonstrate the effectiveness of this approach, below is an example showcasing TabbyML/StarCoder-1B performing code completion within Tabby's own repository.

start_heartbeat(args);
Server::bind(&address)
    .serve(app.into_make_service())
    .await
    .unwrap_or_else(|err| fatal!("Error happens during serving: {}", err))
}

fn api_router(args: &ServeArgs) -> Router {
    let index_server = Arc::new(IndexServer::new());
    let completion_state = {
        let (
            engine,
            EngineInfo {
                prompt_template, ..
            },
        ) = create_engine(&args.model, args);
        let engine = Arc::new(engine);
        let state = completions::CompletionState::new(
            ║
    }

Without access to the repository context, the LLM can only complete snippets based on the current editor window, so it generates an incorrect call to CompletionState::new (the index_server argument is missing):

fn api_router(args: &ServeArgs) -> Router {
        ...
        let engine = Arc::new(engine);
        let state = completions::CompletionState::new(
            engine,
            prompt_template,
        );
        Arc::new(state);
        ...
}

However, suppose we add the repository context; specifically, we include the entire file crates/tabby/src/serve/completions.rs in the prompt:

// === crates/tabby/serve/completions.rs ===
// ......
// ......

The LLM can then generate a snippet that correctly calls CompletionState::new (with index_server.clone() as the second argument):

fn api_router(args: &ServeArgs) -> Router {
        ...
        let engine = Arc::new(engine);
        let state = completions::CompletionState::new(
            engine,
            index_server.clone(),
            prompt_template,
        );
        Arc::new(state);
        ...
}

The Problem: Repository Context

One obvious solution is to pack the whole codebase into the prompt with each completion request. Voila✨! The LLM has all the context it needs! But alas, this approach falls short for even moderately sized repositories: they are simply too large to fit into the context window, and longer prompts also slow down inference.

A more efficient approach is to be selective, hand-picking the snippets to send. For instance, in the example above, we send the file containing the declaration of the CompletionState::new method. This strategy works like a charm, as illustrated in the example.

However, manually pinpointing the right set of context to send to the LLM isn't ideal. Plus, sending entire files is a bulky way to relay code context, wasting the precious context window. The LLM doesn't need a grand tour of the complete completions.rs, only a robust enough understanding to use it effectively. If you continually dispatch multiple files' worth of code just for context, you'll soon hit the context window limit.
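To make the window constraint concrete, here is a minimal sketch of packing already-scored snippets into a fixed token budget. Everything in it is an assumption for illustration: the Snippet struct, the pack_snippets function, and the whitespace-based token estimate are hypothetical and are not taken from Tabby's code. A real system would count tokens with the model's own tokenizer.

```rust
/// A candidate snippet with a relevance score (higher is better).
/// Hypothetical type, used only for this illustration.
#[derive(Debug, Clone)]
struct Snippet {
    path: String,
    body: String,
    score: f64,
}

/// Rough token estimate: the number of whitespace-separated words.
/// This is an assumption; a real system would use the model's tokenizer.
fn estimate_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Greedily keep the highest-scoring snippets that still fit into `budget` tokens.
fn pack_snippets(mut candidates: Vec<Snippet>, budget: usize) -> Vec<Snippet> {
    // Sort by descending score so the most relevant snippets are considered first.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut used = 0;
    let mut picked = Vec::new();
    for snippet in candidates {
        let cost = estimate_tokens(&snippet.body);
        // Skip anything that would overflow the budget; smaller snippets may still fit.
        if used + cost <= budget {
            used += cost;
            picked.push(snippet);
        }
    }
    picked
}
```

Greedy packing is the simplest possible policy: it favors relevance first and lets small, lower-ranked snippets fill whatever room is left.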

Code snippets to provide context

In the v0.3.0 release, we introduced Retrieval Augmented Code Completion, a nifty feature that taps into the repository context to enhance code suggestions. Here's a sneak peek of a snippet we pulled from the repository context:

// Path: crates/tabby/src/serve/completions.rs
// impl CompletionState {
//     pub fn new(
//         engine: Arc<Box<dyn TextGeneration>>,
//         index_server: Arc<IndexServer>,
//         prompt_template: Option<String>,
//     ) -> Self {
//         Self {
//             engine,
//             prompt_builder: prompt::PromptBuilder::new(prompt_template, Some(index_server)),
//         }
//     }
// }
//
// Path: crates/tabby/src/serve/mod.rs
// Router::new()
//         .merge(api_router(args))

‍

By snagging snippets like this, LLM gets to peek into variables, classes, methods, and function signatures scattered throughout the repo. This context allows LLM to tackle a multitude of tasks. For instance, it can cleverly decipher how to utilize APIs exported from a module, all thanks to the snippet defining / invoking that API.
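The comment-style layout shown above (a "Path:" header, each snippet line prefixed with "//", and a bare "//" line between snippets) can be produced with a few lines of code. The sketch below is only an illustration of that layout: format_snippet and build_prompt are hypothetical names, and Tabby's actual prompt builder may differ.

```rust
/// Render one retrieved snippet in line-comment style, starting with a `Path:` header.
/// `comment_prefix` is the line-comment token of the target language, e.g. "//".
fn format_snippet(path: &str, body: &str, comment_prefix: &str) -> String {
    let mut out = format!("{} Path: {}\n", comment_prefix, path);
    for line in body.lines() {
        out.push_str(comment_prefix);
        // Avoid trailing whitespace on empty lines.
        if !line.is_empty() {
            out.push(' ');
            out.push_str(line);
        }
        out.push('\n');
    }
    out
}

/// Prepend all formatted snippets to the code that precedes the cursor.
/// Snippets are given as (path, body) pairs and separated by a bare comment line.
fn build_prompt(snippets: &[(&str, &str)], prefix: &str, comment_prefix: &str) -> String {
    let mut prompt = String::new();
    for (i, (path, body)) in snippets.iter().enumerate() {
        if i > 0 {
            prompt.push_str(comment_prefix);
            prompt.push('\n');
        }
        prompt.push_str(&format_snippet(path, body, comment_prefix));
    }
    // The code being completed follows the commented-out context unchanged.
    prompt.push_str(prefix);
    prompt
}
```

Because every context line is a comment, the prompt stays syntactically valid in the target language, which is the property the format relies on.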

Use tree-sitter to create snippets

Tabby, under the hood, leverages 🌳 Tree-sitter queries to construct its index. Tree-sitter can parse source code written in many languages, and its queries extract data about the symbols defined in each file.

Historically, Tree-sitter was utilized by IDEs or code editors to facilitate the creation of language formatters or syntax highlighters, among other things. However, we're taking a different approach and using Tree-sitter to aid LLM in understanding the codebase.

Here's an example of the output you'll get when you run the following query on Go source code:

(type_declaration (type_spec name: (type_identifier) @name)) @definition.type

type payload struct {
	Data string `json:"data"`
}
...

These snippets are then compiled into an efficient inverted index over tokens for use during querying. For each request, we tokenize the text segments and perform a BM25 search in the repository to find relevant snippets. We format these snippets in the line comment style, as illustrated in the example above. Because the context is just comments, it doesn't disrupt the semantics of the existing code, and it is easy for the LLM to understand.
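To illustrate the retrieval step, here is a small, self-contained sketch of an inverted index with BM25 scoring. It is not Tabby's implementation: the Bm25Index type, the tokenizer (which splits on every non-alphanumeric character, so index_server becomes two tokens), and the constants k1 = 1.2 and b = 0.75 (commonly used defaults) are all assumptions made for this example.

```rust
use std::collections::HashMap;

/// Split text into lowercase alphanumeric tokens (an assumed, very simple tokenizer).
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Hypothetical inverted index over snippets, scored with BM25.
struct Bm25Index {
    /// token -> list of (snippet id, term frequency in that snippet)
    postings: HashMap<String, Vec<(usize, usize)>>,
    /// Length of each snippet in tokens.
    doc_len: Vec<usize>,
    /// Average snippet length in tokens.
    avg_len: f64,
}

impl Bm25Index {
    fn build(docs: &[&str]) -> Self {
        let mut postings: HashMap<String, Vec<(usize, usize)>> = HashMap::new();
        let mut doc_len = Vec::with_capacity(docs.len());
        for (id, doc) in docs.iter().enumerate() {
            let tokens = tokenize(doc);
            doc_len.push(tokens.len());
            let mut tf: HashMap<String, usize> = HashMap::new();
            for token in tokens {
                *tf.entry(token).or_insert(0) += 1;
            }
            for (token, freq) in tf {
                postings.entry(token).or_default().push((id, freq));
            }
        }
        let total: usize = doc_len.iter().sum();
        let avg_len = if doc_len.is_empty() {
            0.0
        } else {
            total as f64 / doc_len.len() as f64
        };
        Self { postings, doc_len, avg_len }
    }

    /// Return up to `top_k` (snippet id, score) pairs, best match first.
    fn search(&self, query: &str, top_k: usize) -> Vec<(usize, f64)> {
        const K1: f64 = 1.2; // term-frequency saturation (assumed default)
        const B: f64 = 0.75; // length normalization strength (assumed default)
        let n = self.doc_len.len() as f64;
        let mut scores: HashMap<usize, f64> = HashMap::new();
        let mut terms = tokenize(query);
        terms.sort();
        terms.dedup();
        for term in terms {
            if let Some(list) = self.postings.get(&term) {
                let df = list.len() as f64;
                // Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
                let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
                for &(id, tf) in list {
                    let tf = tf as f64;
                    let norm = 1.0 - B + B * self.doc_len[id] as f64 / self.avg_len;
                    *scores.entry(id).or_insert(0.0) += idf * tf * (K1 + 1.0) / (tf + K1 * norm);
                }
            }
        }
        let mut ranked: Vec<(usize, f64)> = scores.into_iter().collect();
        // Sort by descending score; break ties by snippet id for determinism.
        ranked.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
        ranked.truncate(top_k);
        ranked
    }
}
```

A typical use would build the index once over all extracted snippets, then call search with the text around the cursor and pass the top results on to prompt construction.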

Roadmap

The current approach to extracting snippets and performing ranking is relatively simple. We're actively working on various aspects to fully iterate through this approach and elevate its efficiency and effectiveness:

Snippet Indexing: We are aiming to achieve a detailed understanding of what snippets should be incorporated into the index for each programming language. 📚
Retrieval Algorithm: Our focus is on refining the retrieval algorithm using attention weight heatmaps. Ideally, snippets with higher attention weights from Language Models (LLMs) should be prioritized in the retrieval process. ⚙️

We are incredibly enthusiastic about the potential for enhancing the quality and are eager to delve deeper into this exciting development! 🌟

Give it a try

To use this repository context feature:

Install Tabby.
Navigate to the Repository Context page and follow the instructions to set it up.
