麥思知識學院 MINDS Knowledge Academy
Industry Insights | 7 min read

Why is your AI quoting assistant becoming less accurate over time? The key is feedback

Many printing houses deploy AI customer service and automated quoting systems, only to find six months later that the systems haven't become smarter; they've just become more efficient at making the same mistakes. A paper on Effective Feedback Compute explains why, and offers printing shops a path to training their AI to become increasingly accurate.

麥思知識學院 | Simon H.

Why does AI tool performance stagnate after six months of deployment?

Over the past month or two, I’ve visited several small and medium-sized printing houses where the owners asked me the same thing: 'We introduced an AI quoting assistant and an automated LINE customer service bot last year. It was amazing during the trial, but why does it feel like it hasn't improved since, and sometimes makes even more ridiculous mistakes?'

This phenomenon is thoroughly explained in a recent paper titled "Scaling Laws for Agent Harnesses via Effective Feedback Compute," authored by Xuanliang Zhang et al. I read the Chinese summary by Wisely Chen.

It directly quantifies a counterintuitive fact: you might assume that 'more computing power, more tools, and more iterations' will make an AI stronger, but that's actually not the case

When the paper uses raw token counts and tool calls to explain task success rates, the R² (coefficient of determination) comes out at only 0.33 to 0.42.

In plain language for the printing floor: if you make your AI customer service chat logs as detailed as possible, increase the number of quote recalculations from one to three, and connect two more databases to it, these 'I've done a lot' actions can only explain about 30-40% of the results. The remaining 60% has nothing to do with how many resources you burn
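To make the R² figure concrete, here is a minimal sketch of how the number is computed for a single explanatory variable. The data points are made up for illustration and are not from the paper, and the paper's own fitting procedure may well differ; the function name `r_squared` is my own.

```python
from statistics import mean

def r_squared(xs, ys):
    """R² of a one-variable least-squares line fitted to (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Variation the fitted line fails to explain, versus total variation.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up numbers for illustration only; they are not from the paper.
tool_calls = [1, 2, 3, 4, 5, 6]
success_rate = [0.30, 0.20, 0.45, 0.25, 0.50, 0.35]

print(round(r_squared(tool_calls, success_rate), 2))
```

An R² near 1 means the input almost fully accounts for the outcome. A value around 0.3 to 0.4 means most of the variation in results is coming from somewhere else.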

I compare this to training an apprentice. A master might have an apprentice print 200 practice sheets a day, but if the master never points out faults or explains why the color registration is off, that apprentice will still be at the same level after printing 10,000 sheets. They haven't gotten better; they've just gotten more tired

What exactly is EFC? And what does it have to do with 'mentoring'?

The core concept of the paper is called Effective Feedback Compute, or EFC for short. It means that not all interactions count; only 'effective feedback' can truly make an AI improve

It defines effective feedback as needing to simultaneously satisfy four conditions. I'll map these one-by-one to a printing scenario:

・Informative: The feedback provides new information. If a client says a quote is too expensive but doesn't specify if it's the paper or the post-processing that's costly, that feedback is useless

・Valid: The feedback must be reliable, not noise or guesswork. If a salesperson casually notes 'this client doesn't care about price' when that’s completely wrong, feeding that incorrect feedback in is worse than not feeding in any at all

・Non-redundant: Don't repeat what you already know. If the system has already recorded 100 times that 'the client wants 100lb coated paper,' recording it again provides no new information

・Retained: This is the toughest one. Was the feedback actually used in the next decision? If the sales team discusses the right judgment in a group chat but no one incorporates it into the quoting logic, it's as if they never said anything at all
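The four conditions above can be sketched as a simple filter. This is only an illustration under my own assumptions: the article does not describe how the paper actually measures each condition, so the `Feedback` fields here are hand-labelled flags and redundancy is approximated by exact text match.

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    text: str          # what was said
    names_cause: bool  # Informative: points at a specific cause
    verified: bool     # Valid: someone confirmed it is true
    applied: bool      # Retained: it changed a prompt, rule, or price table

def count_effective(items):
    """Count feedback items that pass all four conditions."""
    seen = set()
    effective = 0
    for fb in items:
        key = fb.text.strip().lower()
        non_redundant = key not in seen  # Non-redundant: not recorded before
        seen.add(key)
        if fb.names_cause and fb.verified and non_redundant and fb.applied:
            effective += 1
    return effective

# Hypothetical feedback items based on the printing examples above.
items = [
    Feedback("too expensive", False, True, False),
    Feedback("UV coating cost missing from quote", True, True, True),
    Feedback("UV coating cost missing from quote", True, True, True),
    Feedback("this client doesn't care about price", True, False, True),
    Feedback("250lb cardstock priced as 200lb", True, True, False),
]

print(len(items), "raw,", count_effective(items), "effective")
```

Five pieces of feedback went in, but only one of them passes all four checks. That gap between "raw" and "effective" is the whole point of EFC.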

The most critical figure is this: the paper conducted a controlled experiment. With absolutely no change in the computing power budget, they only focused on improving the quality of feedback, and the task success rate jumped from 27% to 90%

Not a penny more was spent, yet by simply making the feedback 'effective,' the success rate more than tripled. After recalculating, the explanatory power R² rose sharply from 0.33 to between 0.94 and 0.99.

This theory is actually just the 'deliberate practice' that learning science has been talking about for decades: feedback must be specific, correct, and incorporated into the next practice session. If you practice without reviewing, or review without changing, it’s equivalent to not practicing at all. AI works the same way humans do

How can printing houses design a feedback loop for AI quoting, follow-ups, and customer service?

Knowing the principles, the question becomes: how do you actually implement this feedback loop in a printing workflow? Here are a few things you can start doing this week

First, build a 'standard answer' reference table. Identify the 20 or 30 items most frequently quoted over the past six months (saddle-stitched catalogs, perfect-bound books, stickers, cartons) and organize the correct material codes, paper types, post-processing, and reasonable price ranges into a 'ground truth' document. Only when you can compare the AI's quotes against this table do you have a 'signal' to correct it with; without it, you won't even know when it's quoting incorrectly.
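A minimal sketch of what checking a quote against such a table could look like. Everything here is hypothetical: the `GROUND_TRUTH` entries, the field names, and the price ranges are placeholders to be replaced with a shop's own figures.

```python
# Hypothetical reference table: item -> correct spec and acceptable price range.
GROUND_TRUTH = {
    "saddle-stitched catalog": {
        "paper": "100lb coated",
        "finish": "UV coating",
        "price_range": (18000, 24000),  # placeholder numbers
    },
    "sticker": {
        "paper": "adhesive vinyl",
        "finish": "none",
        "price_range": (1500, 3000),  # placeholder numbers
    },
}

def check_quote(item, paper, finish, price):
    """Return a list of specific mismatches; an empty list means the quote matches."""
    ref = GROUND_TRUTH.get(item)
    if ref is None:
        return ["no reference entry for this item"]
    problems = []
    if paper != ref["paper"]:
        problems.append(f"paper: expected {ref['paper']}, got {paper}")
    if finish != ref["finish"]:
        problems.append(f"finish: expected {ref['finish']}, got {finish}")
    low, high = ref["price_range"]
    if not low <= price <= high:
        problems.append(f"price {price} outside {low}-{high}")
    return problems

# An AI quote that forgot the UV coating and came in too cheap.
for problem in check_quote("saddle-stitched catalog", "100lb coated", "none", 15000):
    print(problem)
```

Each mismatch names the field that was wrong, so the output can be used directly as a correction signal rather than a bare "wrong quote."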

Second, keep a record every time the AI makes a mistake, and record the root cause. Don't just record 'quoted incorrectly'; record 'it calculated 250lb cardstock as 200lb' or 'forgot to include the cost of UV coating.' This corresponds to the 'Informative' requirement—it must be specific enough to be actionable
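As a rough sketch, an error log can enforce this by refusing entries that are too vague to act on. The `QuoteError` fields, the `VAGUE_NOTES` list, and the sample date are my own assumptions for illustration, not something from the paper.

```python
from dataclasses import dataclass
from datetime import date

# Notes that say a quote was wrong without saying why (hypothetical list).
VAGUE_NOTES = {"quoted incorrectly", "wrong quote", "wrong", "error"}

@dataclass
class QuoteError:
    day: date
    item: str
    root_cause: str  # e.g. "paper weight", "missing finishing cost"
    detail: str      # what the AI did versus what was correct

def log_error(log, entry):
    """Append entry to log, refusing notes too vague to act on."""
    if entry.detail.strip().lower() in VAGUE_NOTES or not entry.root_cause.strip():
        raise ValueError("record the specific root cause, not just that the quote was wrong")
    log.append(entry)

error_log = []
log_error(error_log, QuoteError(
    day=date(2025, 1, 6),  # hypothetical date
    item="carton",
    root_cause="paper weight",
    detail="calculated 250lb cardstock as 200lb",
))
print(len(error_log), "entry logged")
```

Keeping `root_cause` as a short, reusable label makes it possible to count recurring causes later instead of rereading every note.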

Third, periodically feed failed cases back into the system. Spend one hour a month taking the cases where the AI gave wrong quotes or wrong customer service answers and use them to refine its prompts or rules. This step is what 'Retained' means; whether the feedback loop 'closes' depends on it. Chat logs that merely scroll past don't count. Only organized information and improved rules count.
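One way the monthly session could be sketched: group the month's failures by root cause and turn the recurring ones into rule lines to add to the AI's prompt or rule set. The function name `build_rule_notes` and the `min_count` threshold are assumptions of mine; the threshold is there so a one-off slip doesn't become a permanent rule.

```python
from collections import Counter

def build_rule_notes(failures, min_count=2):
    """Turn (root_cause, correction) pairs into rule lines, most frequent first."""
    counts = Counter(failures)
    lines = []
    for (cause, correction), n in counts.most_common():
        if n >= min_count:
            lines.append(f"- {correction} (seen {n} times: {cause})")
    return "\n".join(lines)

# Hypothetical failures collected over one month.
PAPER = ("paper weight", "Price 250lb cardstock at the 250lb rate, not the 200lb rate")
UV = ("missing finishing cost", "Add the UV coating cost whenever UV coating is requested")
TYPO = ("one-off typo", "Check the quantity field")

month_failures = [PAPER, UV, PAPER, UV, PAPER, TYPO]
print(build_rule_notes(month_failures))
```

The output is plain text on purpose: it can be pasted into a prompt, a rule document, or a staff checklist, whichever is where the quoting logic actually lives.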

Fourth, before adding any new feature, run it through the fourth EFC condition. If you want to add another tool or an automated reply, ask yourself: will this really change the AI's judgment next time? If not, adding it is pure money-burning and purely increases maintenance burdens

The same applies to the design side. If you use AI to assist with image generation, revisions, or proposals, each piece of client feedback is your feedback signal. Specifically record *why* the client rejected this version and avoid that in the next proposal; your hit rate will improve. Just letting the rejected files sit there without analyzing the reasons means you'll be in the same spot even after 100 revisions

Before introducing AI memory features, you must first install a gateway

Some vendors push memory features, claiming 'AI will remember your company's habits,' which sounds great. But the paper offers a warning here that I strongly agree with

Memory architecture solves the fourth and hardest condition, 'Retained.' But it *only* solves the ability to remember; it doesn't check whether the first three conditions hold, that is, whether the information is informative, valid, and non-redundant.

In other words, if you blindly save incorrect, repetitive, or noisy feedback, these false memories will be repeatedly called upon, and the toxicity will be greater than having no memory at all. It essentially magnifies 'becoming less accurate' from a one-off error into a permanent state

Therefore, when introducing any memory function, you must equip it with a 'write gateway': Is this information useful enough, credible enough, and non-redundant? Only save it after it passes. For a printing house, this means don't let client preferences casually noted by a salesperson without verification automatically become the system's 'facts.'
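A write gateway can be as simple as a few checks in front of the save operation. The sketch below is a hypothetical in-memory store, not any vendor's memory API; the class name `GatedMemory` and the `verified_by` parameter are my own inventions for illustration.

```python
class GatedMemory:
    """Memory store that only accepts informative, verified, new facts."""

    def __init__(self):
        self._facts = {}

    def write(self, key, value, verified_by=None):
        """Try to save a fact; return a short status string."""
        if not value or not value.strip():
            return "rejected: empty, not informative"
        if verified_by is None:
            return "rejected: unverified"
        if self._facts.get(key) == value:
            return "rejected: redundant"
        self._facts[key] = value
        return "saved"

    def read(self, key):
        return self._facts.get(key)

memory = GatedMemory()
# A salesperson's casual note with nobody confirming it is kept out.
print(memory.write("client A / price sensitivity", "doesn't care about price"))
# A preference confirmed by the client's own order history gets in.
print(memory.write("client A / paper", "100lb coated", verified_by="order history"))
# Recording the same fact again adds nothing.
print(memory.write("client A / paper", "100lb coated", verified_by="order history"))
```

The important design choice is that verification is a required step, not an optional flag someone remembers to set later. Anything unverified simply never becomes a 'fact.'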

Also, to be honest, this paper isn't a magic bullet. That 0.94 to 0.99 limit uses ideal information where the answer was known after the fact (the paper calls this Oracle-EFC); real-world systems can't achieve that, so it is a theoretical ceiling, not a number you'll get tomorrow. And the criterion of 'whether feedback truly changed the decision' is inherently difficult to judge. But even with these caveats, I fully buy into the core direction.

Future competition among AI tools won't be about who has more features or longer chat boxes, but about who can make every piece of feedback truly count. Building a good AI assistant isn't about making it do more work; it's about treating it like a good apprentice and making sure it learns something with every step it takes.

Key Takeaways

・Giving AI more computing power and tools only explains 30-40% of the results (R²: 0.33 to 0.42); the remaining 60% depends on feedback quality

・With constant computing power, simply making feedback effective can jump the success rate from 27% to 90%; the difference lies in 'practicing correctly,' not 'practicing more.'

・Effective feedback must simultaneously be: Informative, Valid, Non-redundant, and Retained; missing the fourth one is equivalent to practicing in vain

・AI memory features only solve the ability to 'remember' and won't help you filter out errors; without a write gateway, false memories are more toxic than having no memory

・Feeding failed cases from AI quoting and revisions back into the system once a month is the key action to make it run more accurately over time

Further Thoughts

For printing houses and design studios, the real inspiration isn't 'should we adopt AI,' but 'is there a review mechanism designed after adoption?' Most people get stuck at the first step and stop, treating the tool's deployment as the finish line. I suggest starting with one small thing: choose a high-frequency scenario, such as catalog quoting or sticker proofing inquiries, first build a 30-item standard answer table, then schedule a one-hour monthly re-injection session specifically to fix rules using cases where the AI answered incorrectly. Once this feedback loop is running smoothly, consider adding memory features or expanding the scope. For vendors providing integrated services, this is also a great opening to lock in long-term engagement with clients: if you help the client design a good feedback loop, the system will become more and more tailored to their needs as they use it, rather than being discarded as inaccurate after six months

FAQ

Why does the AI quoting system become less accurate over time?
It's usually not a problem with the model's capability, but a lack of a feedback loop. If the AI doesn't get a clear signal of 'correct vs. incorrect' after each quote, and no one periodically uses failed cases to correct its rules, it will repeat, and potentially magnify, the same erroneous judgments.
What is Effective Feedback Compute (EFC)?
EFC is a concept for measuring AI feedback quality. It posits that feedback is only effective if it simultaneously meets four conditions: it must be Informative, Valid, Non-redundant, and actually Retained (used). In the paper's controlled experiment, with computing power unchanged, simply improving feedback quality raised the task success rate from 27% to 90%.
What is the first step for small to medium printing houses to make AI tools more accurate?
Start by building a 'standard answer' reference table that organizes the correct material codes, paper types, post-processing, and reasonable price ranges for the 20 to 30 most frequently quoted items. Once you have this 'ground truth,' you can detect when the AI's quotes are off and correct them, which is the starting point for establishing a feedback loop
Is AI 'memory' worth implementing?
Yes, but it must be paired with a 'write gateway.' Memory functions only solve the ability to 'remember' and won't help you filter out incorrect or repetitive information. If you save noise and erroneous judgments, these false memories will be repeatedly used, which is worse than having no memory at all
How can designers using AI for revisions make it understand client preferences better?
Specifically record and summarize the reasons for each client rejection, and directly avoid those in the next proposal; only then will your hit rate increase. Just letting rejected files sit there without analyzing the reasons means you'll be spinning your wheels no matter how many versions you revise—this is the difference between having a closed feedback loop and not