PPIO Launches DeepSeek-OCR-2 with One-Click Private Deployment

The PPIO compute marketplace has debuted a DeepSeek-OCR-2 deployment template, giving developers a model service that works out of the box.

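DeepSeek-OCR-2 is the latest open-source OCR model released by the DeepSeek team. Unlike traditional OCR pipelines, it introduces the DeepEncoder V2 vision encoder and adopts a technique called Visual Causal Flow. This architectural change lets the model interpret document structure according to semantic logic, which improves accuracy on multi-column layouts, complex tables, and pages that mix text and images.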

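DeepSeek-OCR-2 also compresses visual tokens more efficiently, significantly reducing compute cost while maintaining high accuracy. This makes it well suited as a front-end input stage for multimodal large models, or for high-precision document digitization.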

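You can now use the DeepSeek-OCR-2 template in the PPIO compute marketplace to deploy the model on a GPU cloud server with one click. No complex environment setup is required: in a few steps you get a private DeepSeek-OCR-2 instance and can quickly validate it against your own workload.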

Project link: https://ppio.com/gpu-instance/console/explore?selectTemplate=select

#01 GPU instance + template: deploy DeepSeek-OCR-2 in one click

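step 1: In the template marketplace, select the DeepSeek-OCR-2 template and choose to use it.

step 2: Pick the configuration you need and click Deploy.

step 3: Check the disk size and other settings, then click Next once everything looks correct.

step 4: Wait a moment; creating the instance takes some time.

step 5: The new instance appears under Instance Management.

step 6: Click the option to start the Web Terminal. Once it has started, click Connect to open the terminal in your browser.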

#02 How to use it

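step 1: Edit config.py in the /vllm-workspace/DeepSeek-OCR-2/DeepSeek-OCR2-master/DeepSeek-OCR2-vllm directory and replace the two placeholder paths:

/your/image/path/: the directory of images to process

/your/output/path/: the directory where the parsed output is written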
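Before running the script, it can help to confirm that the two paths you put in config.py are usable. The sketch below is not part of the template. `prepare_paths`, `IMAGE_SUFFIXES`, and the set of accepted file extensions are hypothetical names and assumptions introduced here, and the sketch takes the description above at face value, treating the input path as a directory of images.

```python
from pathlib import Path

# Assumption: which extensions count as images is chosen here for
# illustration; the template's script may accept a different set.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def prepare_paths(image_dir, output_dir):
    """Check the input directory and create the output directory.

    Returns the sorted list of image files found in image_dir.
    """
    image_dir = Path(image_dir)
    output_dir = Path(output_dir)

    if not image_dir.is_dir():
        raise FileNotFoundError(f"image directory not found: {image_dir}")

    # Compare suffixes case-insensitively so "SCAN.PNG" is also picked up.
    images = sorted(
        p for p in image_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not images:
        raise ValueError(f"no image files found in: {image_dir}")

    # Create the output directory (and any missing parents) if needed.
    output_dir.mkdir(parents=True, exist_ok=True)
    return images
```

Calling `prepare_paths("/your/image/path/", "/your/output/path/")` with your real paths either returns the images that were found or raises an error that names the problem path, which is quicker to diagnose than a failure after the model has loaded.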

step 2: Run the script from the /vllm-workspace/DeepSeek-OCR-2/DeepSeek-OCR2-master/DeepSeek-OCR2-vllm directory:

$ python3 run_dpsk_ocr2_image.py                                                                                                                    
INFO 01-27 00:38:17 [__init__.py:239] Automatically detected platform cuda.                                             
INFO 01-27 00:38:21 [config.py:456] Overriding HF config with {'architectures': ['DeepseekOCR2ForCausalLM']}            
INFO 01-27 00:38:22 [config.py:717] This model supports multiple tasks: {'embed', 'reward', 'generate', 'score', 'classify'}. Defaulting to 'generate'.                                                                                         
INFO 01-27 00:38:22 [llm_engine.py:240] Initializing a V0 LLM engine (v0.8.5) with config: model='deepseek-ai/DeepSeek-OCR-2', speculative_config=None, tokenizer='deepseek-ai/DeepSeek-OCR-2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=deepseek-ai/DeepSeek-OCR-2, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=False,                                                           
INFO 01-27 00:38:24 [cuda.py:292] Using Flash Attention backend.                                                        
INFO 01-27 00:38:24 [parallel_state.py:1004] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0      
INFO 01-27 00:38:24 [model_runner.py:1108] Starting to load model deepseek-ai/DeepSeek-OCR-2...                         
INFO 01-27 00:38:25 [config.py:3614] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256] is overridden by config [256, 128, 2, 1, 4, 136, 8, 144, 16, 152, 24, 160, 32, 168, 40, 176, 48, 184, 56, 192, 64, 200, 72, 208, 80, 216, 88, 120, 224, 96, 232, 104, 240, 112, 248]                                                                
INFO 01-27 00:38:26 [weight_utils.py:265] Using model weights format ['*.safetensors']                                  
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]                                            
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 20.02it/s]                                    
                                                                                                                        
INFO 01-27 00:38:28 [loader.py:458] Loading weights took 1.26 seconds                                                   
INFO 01-27 00:38:28 [model_runner.py:1140] Model loading took 6.3336 GiB and 3.787022 seconds                           
Some kwargs in processor config are unused and will not have any effect: image_std, candidate_resolutions, downsample_ratio, mask_prompt, image_mean, sft_format, ignore_id, normalize, image_token, patch_size, add_special_token, pad_token.  
WARNING 01-27 00:38:33 [fused_moe.py:668] Using default MoE config. Performance might be sub-optimal! Config file not found at /usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=64,N=896,device_name=NVIDIA_GeForce_RTX_4090.json                                                                                                 
INFO 01-27 00:38:33 [worker.py:287] Memory profiling takes 5.08 seconds                                                 
INFO 01-27 00:38:33 [worker.py:287] the current vLLM instance can use total_gpu_memory (23.53GiB) x gpu_memory_utilization (0.75) = 17.65GiB                                                                                                    
INFO 01-27 00:38:33 [worker.py:287] model weights take 6.33GiB; non_torch_memory takes 0.08GiB; PyTorch activation peak memory takes 1.07GiB; the rest of the memory reserved for KV Cache is 10.16GiB.                                         
INFO 01-27 00:38:34 [executor_base.py:112] # cuda blocks: 11100, # CPU blocks: 4369                                     
INFO 01-27 00:38:34 [executor_base.py:117] Maximum concurrency for 8192 tokens per request: 21.68x                      
INFO 01-27 00:38:35 [model_runner.py:1450] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing `gpu_memory_utilization` or switching to eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.                                    
Capturing CUDA graph shapes: 100%|██████████████████████████████████████████████████████| 35/35 [00:08<00:00,  4.21it/s]
INFO 01-27 00:38:44 [model_runner.py:1592] Graph capturing finished in 8 secs, took 0.16 GiB                            
INFO 01-27 00:38:44 [llm_engine.py:437] init engine (profile, create kv cache, warmup model) took 15.32 seconds         
INFO 01-27 00:38:44 [async_llm_engine.py:211] Added request request-1769503124.                                         
Some kwargs in processor config are unused and will not have any effect: image_std, candidate_resolutions, downsample_ratio, mask_prompt, image_mean, sft_format, ignore_id, normalize, image_token, patch_size, add_special_token, pad_token.  
<|ref|>text<|/ref|><|det|>[[163, 0, 456, 155]]<|/det|>                                                                  
By the lottery jaunty, the chances of the lottery combinations to work out the chances of other prizes, but it all starts to get a bit fiddly, we'll move on to something else. (How to work out the other lottery chances is just one of the amazing features you'll find at: www.murderousmaths.co.uk)                                                                
                                                                                                                        
<|ref|>sub_title<|/ref|><|det|>[[163, 168, 280, 199]]<|/det|>                                                           
## The disappearing sum                                                                                                 
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[160, 189, 456, 402]]<|/det|>                                                                
It's Friday evening. The lovely Veronica Gummfloss has been out with the football team who have all escorted her safely back to her doorstep. It's that tender moment when each hopeful player closes his eyes and leans forward with quivering lips. Unfortunately Veronica's parents heard them clumping down the road and Veronica knows she only has time to kiss four out of the eleven of them if she's going to do it properly.                                                          
                                                                                                                        
<|ref|>image<|/ref|><|det|>[[163, 408, 450, 666]]<|/det|>                                                               
                                                                                                                        
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[154, 645, 450, 744]]<|/det|>                                                                
How many choices has she got? It's  \( {}^{11}C_{4} \)  which is  \( {}^{11} \)  but for goodness sake DON'T reach for the calculator! The most brilliant thing about perms and                                                                 
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[500, 0, 808, 115]]<|/det|>                                                                  
comes is that the answer can never be a fraction which means that EVERYTHING ON THE BOTTOM ALWAYS CANCELS OUT! It's probably the best fun you'll ever have with a pencil so here we go...                                                       
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[512, 111, 792, 188]]<|/det|>                                                                
 \( 4! \times 7! \)                                                                                                     
                                                                                                                        
 \( 11! = 11 \times 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1 \)              
                                                                                                                        
 \( 4 \times 3 \times 2 \times 1 \times 7 \times 6 \times 5 \times 4 \times 4 \times 3 \times 2 \times 1 \)             
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[499, 179, 804, 295]]<|/det|>                                                                
(Before we continue, grab this book and show somebody this sum. Rub their face on it if you need to and tell them that this is the sort of thing you do for fun without a calculator these days because you're so brilliant.)                   
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[499, 291, 803, 350]]<|/det|>                                                                
Off we go then. For starters we'll get rid of the 7! bit from top and bottom and get:                                   
                                                                                                                        
<|ref|>equation<|/ref|><|det|>[[612, 361, 696, 412]]<|/det|>                                                            
 \[ 11\times10\times9\times8 \]                                                                                         
                                                                                                                        
<|ref|>equation<|/ref|><|det|>[[614, 384, 691, 414]]<|/det|>                                                            
 \[ 4\times3\times2\times1 \]                                                                                           
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[496, 424, 801, 525]]<|/det|>                                                                
Pow! That's already got rid of more than half the numbers. Next we'll see that the  \( 4 \times 2 \)  on the bottom cancels out the 8 on top (and we don't need that  \( \times 1 \) " on the bottom either). We're left with...                
                                                                                                                        
<|ref|>equation<|/ref|><|det|>[[616, 541, 686, 592]]<|/det|>                                                            
 \[ \frac{11\times10\times9}{3} \]                                                                                      
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[495, 604, 800, 655]]<|/det|>                                                                
Then the 3 on the bottom divides into the 9 on top leaving it as a 3 so all we've got now is:                           
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[555, 669, 739, 694]]<|/det|>                                                                
Veronica's choices =  \( 11 \times 10 \times 3 \)                                                                       
                                                                                                                        
<|ref|>text<|/ref|><|det|>[[493, 714, 600, 741]]<|/det|>                                                                
Look! No bottom.
INFO 01-27 00:38:48 [async_llm_engine.py:179] Finished request request-1769503124.
                                                                                                                        
                                                                                                                        
INFO 01-27 00:38:48 [async_llm_engine.py:65] Engine is gracefully shutting down.                                        
===============save results:===============                                                                             
image: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 26886.56it/s]
other: 100%|████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 223101.28it/s]
[rank0]:[W127 00:38:49.576360410 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator()) 
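In the run above, the model's answer arrives as a sequence of tagged blocks: a `<|ref|>…<|/ref|>` label (`text`, `sub_title`, `image`, `equation`), a `<|det|>[[x1, y1, x2, y2]]<|/det|>` box, and then the recognized content. If you want to post-process that text yourself, a small parser along the following lines may be enough. This is a sketch based only on the format visible in this log: `Block` and `parse_blocks` are hypothetical names, the coordinates are returned exactly as printed without assuming their units, and the input should be the model's text alone, without the interleaved vLLM `INFO` lines.

```python
import re
from typing import List, NamedTuple, Tuple

# One tagged header, as seen in the log:
#   <|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>
BLOCK_RE = re.compile(
    r"<\|ref\|>(?P<label>.*?)<\|/ref\|>"
    r"<\|det\|>\[\[(?P<box>[^\]]*)\]\]<\|/det\|>"
)


class Block(NamedTuple):
    label: str             # e.g. "text", "sub_title", "image", "equation"
    box: Tuple[int, ...]   # coordinates exactly as printed by the model
    text: str              # content between this header and the next one


def parse_blocks(raw: str) -> List[Block]:
    """Split tagged OCR output into (label, box, text) blocks."""
    matches = list(BLOCK_RE.finditer(raw))
    blocks = []
    for i, m in enumerate(matches):
        # A block's content runs until the next header, or to the end.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        box = tuple(int(v) for v in m.group("box").split(","))
        blocks.append(Block(m.group("label"), box, raw[m.end():end].strip()))
    return blocks


# Two blocks copied from the log above.
sample = (
    "<|ref|>sub_title<|/ref|><|det|>[[163, 168, 280, 199]]<|/det|>\n"
    "## The disappearing sum\n"
    "\n"
    "<|ref|>image<|/ref|><|det|>[[163, 408, 450, 666]]<|/det|>\n"
    "\n"
)
blocks = parse_blocks(sample)
for block in blocks:
    print(block.label, block.box, repr(block.text))
```

Image blocks carry no text, so their `text` field comes back empty. Filtering on `label` is a simple way to separate headings, body text, and equations.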

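PPIO's compute marketplace templates are designed to lower the barrier to private deployment of large models for enterprises and individual developers, delivering efficient, secure model rollout without tedious environment setup. The marketplace currently offers dozens of private deployment templates. Besides DeepSeek-OCR-2, you can also quickly deploy models such as AutoGLM-Phone-9B, GLM-Image, and PaddleOCR-VL.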
