import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization with float16 compute and double quantization
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "completetinymodelraven_top",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,  # Required for Raven architecture
)

tokenizer = AutoTokenizer.from_pretrained("completetinymodelraven_top")
Solution: The "Top" version precomputes positional encodings on first load. This is normal. Subsequent runs will be fast.
An 8k context window is rare for a "tiny" model. It lets network routers or Raspberry Pi clusters summarize thousands of lines of log data locally, without sending sensitive IP addresses to the cloud.
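Thousands of log lines will not always fit in a single 8k window, so one plausible approach is to split the log into batches that each stay under a token budget and summarize them one at a time. The sketch below is only an illustration of that idea: `chunk_log_lines`, `rough_token_count`, and the `RESERVED_TOKENS` value are hypothetical names and numbers, not part of the model or its library, and the word-count estimate is a crude stand-in for the real tokenizer.

```python
from typing import Callable, Iterable, Iterator, List

CONTEXT_TOKENS = 8192    # the 8k window discussed above
RESERVED_TOKENS = 1024   # assumed headroom for the prompt and the generated summary


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def chunk_log_lines(
    lines: Iterable[str],
    budget: int = CONTEXT_TOKENS - RESERVED_TOKENS,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> Iterator[List[str]]:
    """Group consecutive log lines into chunks whose estimated size fits the budget.

    A single line that exceeds the budget on its own is still emitted,
    alone in its own chunk, so no log data is silently dropped.
    """
    chunk: List[str] = []
    used = 0
    for line in lines:
        cost = count_tokens(line)
        # Close the current chunk before it would overflow the budget.
        if chunk and used + cost > budget:
            yield chunk
            chunk, used = [], 0
        chunk.append(line)
        used += cost
    if chunk:
        yield chunk
```

To count real tokens instead of words, a function such as `lambda s: len(tokenizer.encode(s))` could be passed as `count_tokens`, assuming a loaded tokenizer is available. Each chunk can then be joined into one prompt and summarized on the device.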
"In twilight's hush, where shadows play
Amidst the whispers of a dying day
The raven's call, a mystic's sigh
Echoes through, a lonely sky
With eyes like jewels, dark and bright
It watches worlds, in endless night
A symbol of mystery, a bird of might
The raven's wisdom, a guiding light
In completion of the cycle, it stands
A sentinel of mystic lands
A completion model, of secrets untold
The raven's wisdom, forever to hold."
If you’ve picked up the top-rated Raven model, painting it can be a joy due to its size. Here is how to make your "complete" model stand out: