# /var/tmp/portage/dev-cpp/llama-cpp-0_pre4676/work/llama.cpp-b4676_build-abi_x86_64.amd64/bin/test-tokenizer-0 /var/tmp/portage/dev-cpp/llama-cpp-0_pre4676/work/llama.cpp-b4676/tests/../models/ggml-vocab-deepseek-coder.gguf
shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
main : reading vocab from: '/var/tmp/portage/dev-cpp/llama-cpp-0_pre4676/work/llama.cpp-b4676/tests/../models/ggml-vocab-deepseek-coder.gguf'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: yes
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4070 Laptop GPU, compute capability 8.9, VMM: yes
register_backend: registered backend CUDA (1 devices)
register_device: registered device CUDA0 (NVIDIA GeForce RTX 4070 Laptop GPU)
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 4070 Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | warp size: 32 | shared memory: 49152 | matrix cores: KHR_coopmat
register_backend: registered backend Vulkan (1 devices)
register_device: registered device Vulkan0 (NVIDIA GeForce RTX 4070 Laptop GPU)
ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce RTX 4070 Laptop GPU'
Unsupported GPU: NVIDIA GeForce RTX 4070 Laptop GPU
register_backend: registered backend OpenCL (0 devices)
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (13th Gen Intel(R) Core(TM) i7-13700HX)
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4070 Laptop GPU) - 7595 MiB free
llama_model_load_from_file_impl: using device Vulkan0 (NVIDIA GeForce RTX 4070 Laptop GPU) - 8188 MiB free
llama_model_loader: loaded meta data with 25 key-value pairs and 0 tensors from /var/tmp/portage/dev-cpp/llama-cpp-0_pre4676/work/llama.cpp-b4676/tests/../models/ggml-vocab-deepseek-coder.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = deepseek-coder
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 16384
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 100000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 10: general.file_type u32 = 1
llama_model_loader: - kv 11: llama.vocab_size u32 = 32256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: llama.rope.scaling.type str = linear
llama_model_loader: - kv 14: llama.rope.scaling.factor f32 = 4.000000
llama_model_loader: - kv 15: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 16: tokenizer.ggml.pre str = deepseek-coder
llama_model_loader: - kv 17: tokenizer.ggml.tokens arr[str,32256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,32256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 19: tokenizer.ggml.merges arr[str,31757] = ["Ġ Ġ", "Ġ t", "Ġ a", "i n", "h e...
llama_model_loader: - kv 20: tokenizer.ggml.bos_token_id u32 = 32013
llama_model_loader: - kv 21: tokenizer.ggml.eos_token_id u32 = 32014
llama_model_loader: - kv 22: tokenizer.ggml.padding_token_id u32 = 32014
llama_model_loader: - kv 23: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 24: tokenizer.ggml.add_eos_token bool = false
print_info: file format = GGUF V3 (latest)
print_info: file type = F16
print_info: file size = 0.00 MiB (-nan BPW)
init_tokenizer: initializing tokenizer for type 2
load: control-looking token: 32015 '<|fim▁hole|>' was not control-type; this is probably a bug in the model. its type will be overridden
load: control-looking token: 32017 '<|fim▁end|>' was not control-type; this is probably a bug in the model. its type will be overridden
load: control-looking token: 32016 '<|fim▁begin|>' was not control-type; this is probably a bug in the model. its type will be overridden
load: control token: 32015 '<|fim▁hole|>' is not marked as EOG
load: control token: 32014 '<|end▁of▁sentence|>' is not marked as EOG
load: control token: 32017 '<|fim▁end|>' is not marked as EOG
load: control token: 32016 '<|fim▁begin|>' is not marked as EOG
load: control token: 32013 '<|begin▁of▁sentence|>' is not marked as EOG
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 256
load: token to piece cache size = 0.1787 MB
print_info: arch = llama
print_info: vocab_only = 1
print_info: model type = ?B
print_info: model params = 0.00 K
print_info: general.name = deepseek-coder
print_info: vocab type = BPE
print_info: n_vocab = 32256
print_info: n_merges = 31757
print_info: BOS token = 32013 '<|begin▁of▁sentence|>'
print_info: EOS token = 32014 '<|end▁of▁sentence|>'
print_info: EOT token = 32014 '<|end▁of▁sentence|>'
print_info: PAD token = 32014 '<|end▁of▁sentence|>'
print_info: LF token = 185 'Ċ'
print_info: FIM PRE token = 32016 '<|fim▁begin|>'
print_info: FIM SUF token = 32015 '<|fim▁hole|>'
print_info: FIM MID token = 32017 '<|fim▁end|>'
print_info: EOG token = 32014 '<|end▁of▁sentence|>'
print_info: max token length = 128
llama_model_load: vocab only - skipping tensors
llama_init_from_model: n_seq_max = 1
llama_init_from_model: n_ctx = 512
llama_init_from_model: n_ctx_per_seq = 512
llama_init_from_model: n_batch = 512
llama_init_from_model: n_ubatch = 512
llama_init_from_model: flash_attn = 0
llama_init_from_model: freq_base = 0.0
llama_init_from_model: freq_scale = 1
llama_init_from_model: n_ctx_pre_seq (512) > n_ctx_train (0) -- possible training context overflow

src: ''
res: ''
tok:

src: ' '
res: ' '
tok: 184

src: ' '
res: ' '
tok: 184 185

src: ' '
res: ' '
tok: 185

src: ' '
res: ' '
tok: 185 185

src: ' '
res: ' '
tok: 185 185 185

src: ' 🚀 (normal) 😶‍🌫️ (multiple emojis concatenated) ✅ 🦙🦙 3 33 333 3333 33333 333333 3333333 33333333 3.3 3..3 3...3 កាន់តែពិសេសអាច😁 ?我想在apple工作1314151天~ ------======= нещо на Български ''''''```````""""......!!!!!!?????? I've been 'told he's there, 'RE you sure? 'M not sure I'll make it, 'D you like some tea? We'Ve a'lL'
res: ' 🚀 (normal) 😶‍🌫️ (multiple emojis concatenated) ✅ 🦙🦙 3 33 333 3333 33333 333333 3333333 33333333 3.3 3..3 3...3 កាន់តែពិសេសអាច😁 ?我想在apple工作1314151天~ ------======= нещо на Български ''''''```````""""......!!!!!!?????? I've been 'told he's there, 'RE you sure? 'M not sure I'll make it, 'D you like some tea? We'Ve a'lL'
tok: 185 207 185 185 207 185 185 185 207 12405 459 22758 185 243 185 315 185 251 185 730 185 10047 235 209 334 8760 8 12394 233 114 350 222 10047 221 104 169 116 224 334 4684 3909 992 24330 262 29651 612 8 207 156 237 214 12394 99 234 10047 99 234 207 18 207 18 18 207 18 18 18 207 18 18 18 18 207 18 18 18 18 18 207 18 18 18 18 18 18 207 18 18 18 18 18 18 18 207 18 18 18 18 18 18 18 18 207 18 13 18 207 18 524 18 207 18 1202 18 207 155 239 209 155 239 114 155 239 228 155 240 220 155 239 224 155 240 211 155 239 231 155 239 115 155 239 240 155 240 210 155 239 240 155 239 95 155 239 114 155 239 214 10047 233 210 3015 19100 608 9413 2668 16 18 16 19 16 20 16 1393 169 121 239 18155 374 17194 28 2861 6478 616 2251 14994 31269 4191 6 4686 4686 10252 3358 3358 3409 524 15330 3023 15031 5668 303 6 312 798 651 83 839 362 6 82 741 11 651 1369 340 2037 30 651 44 441 2037 303 6 642 1098 359 11 651 35 340 833 738 10860 30 998 6 10709 245 6 75 43

src: ' ='
res: ' ='
tok: 185 405

src: ' '
res: ' '
tok: 207

src: ' '
res: ' '
tok: 243

src: ' '
res: ' '
tok: 315

src: ' Hello'
res: ' Hello'
tok: 315 414 9489

src: ' Hello Hello'
res: ' Hello Hello'
tok: 315 414 9489 185 315 414 9489

src: ' Hello'
res: ' Hello'
tok: 243 414 9489

src: ' Hello'
res: ' Hello'
tok: 207 414 9489

src: ' ('
res: ' ('
tok: 334

src: ' Hello'
res: ' Hello'
tok: 414 9489

src: ' Hello World'
res: ' Hello World'
tok: 414 9489 5414

src: ' Hello World!'
res: ' Hello World!'
tok: 414 9489 5414 0

src: ' Hello world'
res: ' Hello world'
tok: 414 9489 1835

src: ' Hello, world!'
res: ' Hello, world!'
tok: 414 9489 11 1835 0

src: ' discards'
res: ' discards'
tok: 1607 2539

src: ' this is 🦙.cpp'
res: ' this is 🦙.cpp'
tok: 437 317 12394 99 234 13 14789

src: '!!!!!!'
res: '!!!!!!'
tok: 15330 3023

src: '' era'
res: '' era'
tok: 6 2895

src: '3'
res: '3'
tok: 18

src: '33'
res: '33'
tok: 18 18

src: '333'
res: '333'
tok: 18 18 18

src: '3333'
res: '3333'
tok: 18 18 18 18

src: '33333'
res: '33333'
tok: 18 18 18 18 18

src: '333333'
res: '333333'
tok: 18 18 18 18 18 18

src: '3333333'
res: '3333333'
tok: 18 18 18 18 18 18 18

src: '33333333'
res: '33333333'
tok: 18 18 18 18 18 18 18 18

src: '333333333'
res: '333333333'
tok: 18 18 18 18 18 18 18 18 18

src: 'Cửa Việt'
res: 'Cửa Việt'
tok: 34 155 119 242 64 24297 155 119 216 83

src: 'Führer'
res: 'Führer'
tok: 37 32009 71 6247

src: 'Hello'
res: 'Hello'
tok: 17535

src: 'Hello World'
res: 'Hello World'
tok: 17535 5414

src: 'Hello world'
res: 'Hello world'
tok: 17535 1835

src: 'Hello, world!'
res: 'Hello, world!'
tok: 17535 11 1835 0

src: 'Hello, y'all! How are you 😁 ?我想在apple工作1314151天~'
res: 'Hello, y'all! How are you 😁 ?我想在apple工作1314151天~'
tok: 17535 11 320 6 435 0 1717 417 340 12394 233 210 3015 19100 608 9413 2668 16 18 16 19 16 20 16 1393 169 121 239

src: 'ied 4 ½ months'
res: 'ied 4 ½ months'
tok: 1050 207 19 207 19192 4217

src: 'w048 7tuijk dsdfhu'
res: 'w048 7tuijk dsdfhu'
tok: 86 15 19 23 207 22 83 3963 27659 26078 3934 14072

src: 'нещо на Български'
res: 'нещо на Български'
tok: 1593 6478 616 2251 14994

src: 'កាន់តែពិសេសអាចខលចេញ'
res: 'កាន់តែពិសេសអាចខលចេញ'
tok: 155 239 209 155 239 114 155 239 228 155 240 220 155 239 224 155 240 211 155 239 231 155 239 115 155 239 240 155 240 210 155 239 240 155 239 95 155 239 114 155 239 214 155 239 210 155 239 236 155 239 214 155 240 210 155 239 218

src: '🚀 (normal) 😶‍🌫️ (multiple emojis concatenated) ✅ (only emoji that has its own token)'
res: '🚀 (normal) 😶‍🌫️ (multiple emojis concatenated) ✅ (only emoji that has its own token)'
tok: 10047 235 209 334 8760 8 12394 233 114 350 222 10047 221 104 169 116 224 334 4684 3909 992 24330 262 29651 612 8 207 156 237 214 334 5950 992 78 12896 344 638 891 1372 10736 8

Tests passed
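
For reference, the src/tok pairs above can be reproduced outside the test harness. The following is a minimal sketch, assuming the b4676-era llama.cpp C API (llama_model_load_from_file, llama_model_get_vocab, llama_tokenize); the relative model path is an assumption and must point at the same ggml-vocab-deepseek-coder.gguf used by the test, and the expected tokens are copied from the 'Hello World' case in the log.

// tokenize_check.cpp -- sketch, not part of the llama.cpp test suite.
#include "llama.h"

#include <cstdio>
#include <string>
#include <vector>

int main() {
    llama_backend_init();

    llama_model_params mparams = llama_model_default_params();
    mparams.vocab_only = true; // same mode as the test: "vocab only - skipping tensors"

    // assumed path; the test run above used tests/../models/ggml-vocab-deepseek-coder.gguf
    llama_model * model = llama_model_load_from_file(
            "models/ggml-vocab-deepseek-coder.gguf", mparams);
    if (model == nullptr) {
        fprintf(stderr, "failed to load vocab\n");
        return 1;
    }

    const llama_vocab * vocab = llama_model_get_vocab(model);

    const std::string src = "Hello World";

    // worst case for BPE is one token per byte; leave headroom for specials
    std::vector<llama_token> tok(src.size() + 8);
    const int n = llama_tokenize(vocab, src.c_str(), (int) src.size(),
                                 tok.data(), (int) tok.size(),
                                 /*add_special*/ false, /*parse_special*/ false);
    if (n < 0) {
        fprintf(stderr, "token buffer too small\n");
        llama_model_free(model);
        return 1;
    }
    tok.resize(n);

    printf("tok:");
    for (llama_token t : tok) {
        printf(" %d", t); // the log above shows 17535 5414 for this input
    }
    printf("\n");

    llama_model_free(model);
    llama_backend_free();
    return 0;
}

Built against the same checkout and linked with -lllama, this should print "tok: 17535 5414", matching the 'Hello World' line in the transcript; add_special is false here because the deepseek-coder cases above are tokenized without the BOS token 32013 despite add_bos_token = true in the metadata.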