PPIO Launches DeepSeek-OCR-2 with One-Click Private Deployment


PPIO's compute marketplace is the first to offer a DeepSeek-OCR-2 deployment template, giving developers a ready-to-use model service.

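DeepSeek-OCR-2 is the latest open-source OCR model released by the DeepSeek team. Unlike traditional OCR approaches, it introduces the DeepEncoder V2 vision encoder and adopts a technique called "Visual Causal Flow". This architectural change lets the model interpret document structure according to semantic logic, which improves accuracy on multi-column layouts, complex tables, and mixed text-and-image content.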

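DeepSeek-OCR-2 also improves the compression efficiency of visual tokens, significantly reducing compute overhead while maintaining high accuracy. This makes it well suited as a front-end input stage for multimodal large models or for high-accuracy document digitization.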

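You can now use the DeepSeek-OCR-2 template in the PPIO compute marketplace to deploy the model on a GPU cloud server in one click. No complex environment setup is required: in a few simple steps you get a private DeepSeek-OCR-2 instance and can quickly validate it against your own workload.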

Project link: https://ppio.com/gpu-instance/console/explore?selectTemplate=select

#01 GPU instance + template: deploy DeepSeek-OCR-2 in one click

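step 1: In the template marketplace, select the DeepSeek-OCR-2 template and choose to use it.

step 2: Pick the configuration you need and click Deploy.

step 3: Check the disk size and other settings, then click Next once everything looks correct.

step 4: Wait a moment; creating the instance takes some time.

step 5: The new instance appears under Instance Management.

step 6: Click the option to start Web Terminal. Once it has started, click Connect to open the web terminal.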

#02 How to use

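step 1: Edit config.py in the /vllm-workspace/DeepSeek-OCR-2/DeepSeek-OCR2-master/DeepSeek-OCR2-vllm directory and set the following two paths:

/your/image/path/ : the directory containing the images to process

/your/output/path/ : the directory where the parsed output is written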
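Before running the script, it can help to confirm that both paths are usable. The snippet below is a minimal sketch and not part of the DeepSeek-OCR-2 repository: the helper name `check_paths` and the list of image suffixes are assumptions made for illustration. It accepts either a directory of images (as described above) or a single image file, and creates the output directory if it is missing.

```python
from pathlib import Path

# Hypothetical helper for illustration; not shipped with DeepSeek-OCR-2.
# The suffix list is an assumption about which image types you want to process.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}


def check_paths(input_path: str, output_dir: str) -> list[Path]:
    """Return the image files found at input_path and ensure output_dir exists."""
    src = Path(input_path)
    if src.is_file():
        # A single image was given instead of a directory.
        images = [src]
    elif src.is_dir():
        images = sorted(
            p for p in src.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    else:
        raise FileNotFoundError(f"input path not found: {src}")

    if not images:
        raise ValueError(f"no image files found in {src}")

    # Create the output directory (and any missing parents) if needed.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    return images
```

Calling `check_paths("/your/image/path/", "/your/output/path/")` with the same values you put in config.py returns the list of images that would be picked up, or raises an error that points at the misconfigured path.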

step 2: From the /vllm-workspace/DeepSeek-OCR-2/DeepSeek-OCR2-master/DeepSeek-OCR2-vllm directory, run the script:

$ python3 run_dpsk_ocr2_image.py                                                                                                                    
INFO 01-27 00:38:17 [__init__.py:239] Automatically detected platform cuda.                                             
INFO 01-27 00:38:21 [config.py:456] Overriding HF config with {'architectures': ['DeepseekOCR2ForCausalLM']}            
INFO 01-27 00:38:22 [config.py:717] This model supports multiple tasks: {'embed', 'reward', 'generate', 'score', 'classify'}. Defaulting to 'generate'.                                                                                         
INFO 01-27 00:38:22 [llm_engine.py:240] Initializing a V0 LLM engine (v0.8.5) with config: model='deepseek-ai/DeepSeek-OCR-2', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-OCR-2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=deepseek-ai/DeepSeek-OCR-2, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,                                                           
INFO 01-27 00:38:24 [cuda.py:292] Using Flash Attention backend.                                                        
INFO 01-27 00:38:24 [parallel_state.py:1004] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0      
INFO 01-27 00:38:24 [model_runner.py:1108] Starting to load model deepseek-ai/DeepSeek-OCR-2...                         
INFO 01-27 00:38:25 [config.py:3614] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256] is overridden by config [256, 128, 2, 1, 4, 136, 8, 144, 16, 152, 24, 160, 32, 168, 40, 176, 48, 184, 56, 192, 64, 200, 72, 208, 80, 216, 88, 120, 224, 96, 232, 104, 240, 112, 248]                                                                
INFO 01-27 00:38:26 [weight_utils.py:265] Using model weights format ['*.safetensors']                                  
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]                                            
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 20.02it/s]                                    
                                                                                                                        
INFO 01-27 00:38:28 [loader.py:458] Loading weights took 1.26 seconds                                                   
INFO 01-27 00:38:28 [model_runner.py:1140] Model loading took 6.3336 GiB and 3.787022 seconds                           
Some kwargs in processor config are unused and will not have any effect: image_std, candidate_resolutions, downsample_ratio, mask_prompt, image_mean, sft_format, ignore_id, normalize, image_token, patch_size, add_special_token, pad_token.  
WARNING 01-27 00:38:33 [fused_moe.py:668] Using default MoE config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=64,N=896,device_name=NVIDIA_GeForce_RTX_4090.json                                                                                                 
INFO 01-27 00:38:33 [worker.py:287] Memory profiling takes 5.08 seconds                                                 
INFO 01-27 00:38:33 [worker.py:287] the current vLLM instance can use total_gpu_memory (23.53GiB) x gpu_memory_utilization (0.75) = 17.65GiB                                                                                                    
INFO 01-27 00:38:33 [worker.py:287] model weights take 6.33GiB; non_torch_memory takes 0.08GiB; PyTorch activation peak memory takes 1.07GiB; the rest of the memory reserved for KV Cache is 10.16GiB.                                         
INFO 01-27 00:38:34 [executor_base.py:112] # cuda blocks: 11100, # CPU blocks: 4369                                     
INFO 01-27 00:38:34 [executor_base.py:117] Maximum concurrency for 8192 tokens per request: 21.68x                      
INFO 01-27 00:38:35 [model_runner.py:1450] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing `gpu_memory_utilization` or switching to eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.                                    
Capturing CUDA graph shapes: 100%|██████████████████████████████████████████████████████| 35/35 [00:08<00:00,  4.21it/s]
INFO 01-27 00:38:44 [model_runner.py:1592] Graph capturing finished in 8 secs, took 0.16 GiB                            
INFO 01-27 00:38:44 [llm_engine.py:437] init engine (profile, create kv cache, warmup model) took 15.32 seconds         
INFO 01-27 00:38:44 [async_llm_engine.py:211] Added request request-1769503124.                                         
Some kwargs in processor config are unused and will not have any effect: image_std, candidate_resolutions, downsample_ratio, mask_prompt, image_mean, sft_format, ignore_id, normalize, image_token, patch_size, add_special_token, pad_token.  
<|ref|>text<|/ref|><|det|>[[163, 0, 456, 155]]<|/det|>                                                                  
By the lottery jaunty, the chances of the lottery combinations to work out the chances of other prizes, but it all starts to get a bit fiddly, we'll move on to something else. (How to work out the other lottery chances is just one of the amazing features you'll find at: www.murderousmaths.co.uk)                                                                
                                                                                                                        
<|ref|>sub_title<|/ref|><|det|>[[163, 168, 280, 199]]<|/det|>                                                           
## The disappearing sum                                                                                                 
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[160, 189, 456, 402]]<|/det|>                                                                
It's Friday evening. The lovely Veronica Gummfloss has been out with the football team who have all escorted her safely back to her doorstep. It's that tender moment when each hopeful player closes his eyes and leans forward with quivering lips. Unfortunately Veronica's parents heard them clumping down the road and Veronica knows she only has time to kiss four out of the eleven of them if she's going to do it properly.                                                          
                                                                                                                        
<|ref|>image<|/ref|><|det|>[[163, 408, 450, 666]]<|/det|>                                                               
                                                                                                                        
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[154, 645, 450, 744]]<|/det|>                                                                
How many choices has she got? It's  \( {}^{11}C_{4} \)  which is  \( {}^{11} \)  but for goodness sake DON'T reach for the calculator! The most brilliant thing about perms and                                                                 
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[500, 0, 808, 115]]<|/det|>                                                                  
comes is that the answer can never be a fraction which means that EVERYTHING ON THE BOTTOM ALWAYS CANCELS OUT! It's probably the best fun you'll ever have with a pencil so here we go...                                                       
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[512, 111, 792, 188]]<|/det|>                                                                
 \( 4! \times 7! \)                                                                                                     
                                                                                                                        
 \( 11! = 11 \times 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1 \)              
                                                                                                                        
 \( 4 \times 3 \times 2 \times 1 \times 7 \times 6 \times 5 \times 4 \times 4 \times 3 \times 2 \times 1 \)             
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[499, 179, 804, 295]]<|/det|>                                                                
(Before we continue, grab this book and show somebody this sum. Rub their face on it if you need to and tell them that this is the sort of thing you do for fun without a calculator these days because you're so brilliant.)                   
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[499, 291, 803, 350]]<|/det|>                                                                
Off we go then. For starters we'll get rid of the 7! bit from top and bottom and get:                                   
                                                                                                                        
<|ref|>equation<|/ref|><|det|>[[612, 361, 696, 412]]<|/det|>                                                            
 \[ 11\times10\times9\times8 \]                                                                                         
                                                                                                                        
<|ref|>equation<|/ref|><|det|>[[614, 384, 691, 414]]<|/det|>                                                            
 \[ 4\times3\times2\times1 \]                                                                                           
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[496, 424, 801, 525]]<|/det|>                                                                
Pow! That's already got rid of more than half the numbers. Next we'll see that the  \( 4 \times 2 \)  on the bottom cancels out the 8 on top (and we don't need that  \( \times 1 \) " on the bottom either). We're left with...                
                                                                                                                        
<|ref|>equation<|/ref|><|det|>[[616, 541, 686, 592]]<|/det|>                                                            
 \[ \frac{11\times10\times9}{3} \]                                                                                      
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[495, 604, 800, 655]]<|/det|>                                                                
Then the 3 on the bottom divides into the 9 on top leaving it as a 3 so all we've got now is:                           
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[555, 669, 739, 694]]<|/det|>                                                                
Veronica's choices =  \( 11 \times 10 \times 3 \)                                                                       
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[493, 714, 600, 741]]<|/det|>                                                                
Look! No bottom.INFO 01-27 00:38:48 [async_llm_engine.py:179] Finished request request-1769503124.                      
                                                                                                                        
                                                                                                                        
INFO 01-27 00:38:48 [async_llm_engine.py:65] Engine is gracefully shutting down.                                        
===============save results:===============                                                                             
image: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 26886.56it/s]
other: 100%|████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 223101.28it/s]
[rank0]:[W127 00:38:49.576360410 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 
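In the output above, each recognized region is introduced by a tag of the form `<|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>`, followed by the region's text. The sketch below shows one way to turn that text into structured records. The regular expression is inferred from this sample output only and is not an official grammar; the names `Block`, `parse_blocks`, and `strip_tags` are hypothetical. It expects the model's text alone, without the interleaved vLLM log lines.

```python
import json
import re
from dataclasses import dataclass

# Tag pattern inferred from the sample output above (an assumption, not a spec).
TAG = re.compile(
    r"<\|ref\|>(?P<label>.*?)<\|/ref\|><\|det\|>(?P<boxes>\[\[.*?\]\])<\|/det\|>"
)


@dataclass
class Block:
    label: str               # e.g. "text", "sub_title", "image", "equation"
    boxes: list[list[int]]   # bounding boxes exactly as emitted by the model
    text: str                # content that follows the tag, whitespace-trimmed


def parse_blocks(raw: str) -> list[Block]:
    """Split tagged model output into one Block per <|ref|>...<|/det|> tag."""
    matches = list(TAG.finditer(raw))
    blocks = []
    for i, m in enumerate(matches):
        # A block's text runs until the next tag, or to the end of the string.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        boxes = json.loads(m.group("boxes"))
        blocks.append(Block(m.group("label"), boxes, raw[m.end():end].strip()))
    return blocks


def strip_tags(raw: str) -> str:
    """Drop the layout tags and keep only the recognized content."""
    return TAG.sub("", raw)
```

For example, `parse_blocks` applied to the `sub_title` region above yields a `Block` with label `sub_title`, boxes `[[163, 168, 280, 199]]`, and text `## The disappearing sum`. Use `strip_tags` when only the plain Markdown is needed and the layout coordinates are not.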

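PPIO's compute marketplace templates aim to lower the barrier to private deployment of large models for enterprises and individual developers, enabling efficient, secure model rollout without tedious environment setup. The marketplace currently offers dozens of private-deployment templates. Besides DeepSeek-OCR-2, you can also quickly deploy models such as AutoGLM-Phone-9B, GLM-Image, and PaddleOCR-VL.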
