In This Article
Practices 1-3: Process Selection and Analysis
Practice 1: Automate the Right Processes (Not the Loudest Request)
The VP who shouts loudest gets the first bot. The process has 47 exceptions, requires judgment for 30% of cases, and depends on 3 legacy systems with no API. The bot takes 3 months to build instead of 3 weeks. When it deploys, it handles 52% of cases — the other 48% still require manual processing. The ROI is marginal. Meanwhile, the accounts payable process — 95% structured, high volume, rules-based — sits in the backlog because nobody championed it loudly enough.
Process selection criteria: volume (high transaction count — automation value scales with volume), rules-based (follows explicit if-then logic — no judgment, no interpretation), structured data (typed fields, standard formats — not handwritten notes or free-text emails), stable process (doesn't change monthly — automating a frequently changing process creates continuous rework), and stable systems (the source and target applications don't receive frequent UI updates). Score each candidate. Automate the highest-scoring processes first — regardless of who requested them.
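A minimal sketch of what candidate scoring can look like, in Python. The five criteria mirror the list above; the weights and the 1-5 rating scale are illustrative assumptions to tune, not a standard:

```python
# Illustrative candidate-scoring sketch. Weights and ratings are assumptions.
CRITERIA_WEIGHTS = {
    "volume": 0.30,
    "rules_based": 0.25,
    "structured_data": 0.20,
    "process_stability": 0.15,
    "system_stability": 0.10,
}

def score(ratings):
    """Weighted score from 1-5 ratings per criterion."""
    return sum(CRITERIA_WEIGHTS[c] * ratings[c] for c in CRITERIA_WEIGHTS)

candidates = {
    "accounts_payable": {"volume": 5, "rules_based": 5, "structured_data": 5,
                         "process_stability": 4, "system_stability": 4},
    "vp_request": {"volume": 2, "rules_based": 2, "structured_data": 2,
                   "process_stability": 3, "system_stability": 2},
}

for name in sorted(candidates, key=lambda n: score(candidates[n]), reverse=True):
    print(f"{name}: {score(candidates[name]):.2f}")
```

Run against the two processes in the story above, accounts payable scores 4.75 and the VP's request scores 2.15 — the ranking makes the build order a data conversation, not a volume-of-voice contest.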
Practice 2: Document Before Automating (The PDD Is Not Optional)
The Process Definition Document (PDD) specifies exactly what the bot does — every step, every decision, every exception, every system interaction. Teams that skip the PDD "to save time" spend 3x longer in development because requirements emerge mid-build. The PDD prevents: scope creep (the process owner adds 5 more steps during development), incorrect logic (the developer interprets a rule differently from the business), and untested exceptions (the 15% edge case nobody mentioned until production).
The PDD includes: step-by-step process flow with screenshots, decision rules at each branch, exception types and handling procedures, input/output specifications for each step, systems and credentials required, SLA requirements (processing time, accuracy), and the manual fallback process for when the bot fails.
Practice 3: Measure the Manual Process First
You can't prove automation ROI without a baseline. Before building the bot, measure: average manual processing time per item, volume per day/week/month, error rate in manual processing, cost per item (time × loaded rate), and bottleneck points (where work queues up). This baseline becomes the comparison for post-automation measurement. Without it, the ROI conversation becomes "we think it's faster" — which doesn't survive CFO scrutiny.
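A worked example of the baseline math, with hypothetical figures:

```python
# Hypothetical baseline for a manual process, using the metrics listed above.
items_per_month = 2000
minutes_per_item = 6.0        # average manual processing time
manual_error_rate = 0.03      # 3% of items need rework
loaded_rate_per_hour = 45.00  # fully loaded cost of the person doing the work

hours_per_month = items_per_month * minutes_per_item / 60
cost_per_item = (minutes_per_item / 60) * loaded_rate_per_hour
monthly_cost = items_per_month * cost_per_item
errors_per_month = items_per_month * manual_error_rate

print(f"Baseline: {hours_per_month:.0f} hours/month, "
      f"${cost_per_item:.2f}/item, ${monthly_cost:,.0f}/month, "
      f"{errors_per_month:.0f} errors/month")
```

Those four numbers — 200 hours, $4.50/item, $9,000/month, 60 errors — are what the post-automation report gets compared against.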
Practices 4-6: Bot Design and Development
Practice 4: Build for the Next Developer, Not for Yourself
The developer who builds the bot won't maintain it forever. Design for hand-off: naming conventions that explain what each activity does (Click_SubmitInvoice, not Activity_23), comments on business logic explaining why (not what — the code shows what), modular design (each functional block is a reusable component), and README documentation covering: what the bot does, systems accessed, credentials needed, configuration parameters, and known failure modes.
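What that looks like in code, sketched in Python — the helper names are hypothetical; the point is the style, not the API:

```python
# Names say what the step does; comments say why it does it that way.
def click_submit_invoice(page):
    # Why: Submit stays disabled until the totals field finishes
    # recalculating, so wait for the enabled state instead of sleeping.
    page.wait_until_enabled("btn_submit_invoice")  # hypothetical helper
    page.click("btn_submit_invoice")

# Compare the unmaintainable version: def activity_23(p): p.click("b1")
```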
Practice 5: Externalize Everything That Changes
Configuration data lives outside the bot code: file paths, URLs, XPath selectors, column mappings, thresholds, email recipients, and credentials. When the accounts payable folder moves from \\server\p5 to \\server\p6, the admin updates a config file — no bot code change, no testing cycle, no deployment. This practice alone prevents 40% of "bot broke" incidents that consume maintenance capacity.
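A minimal sketch, assuming the config lives in a JSON file next to the bot (the file name and keys are illustrative):

```python
import json

# All environment-specific values live outside the bot code.
with open("bot_config.json", encoding="utf-8") as f:
    config = json.load(f)

input_folder = config["ap_input_folder"]      # e.g. \\server\p6 after the move
page_timeout = config["page_timeout_seconds"]
recipients = config["exception_email_recipients"]
# Credentials are the exception: they belong in a vault, not a config file.
```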
Practice 6: Design for Failure (Because Bots Will Fail)
Every system interaction can fail: the website is slow, the login times out, the expected button doesn't appear, the file isn't where expected, the database returns an error. Design every interaction with: timeout handling (don't wait indefinitely), retry logic (transient failures resolve on retry), screenshot capture on failure (for debugging), structured error logging (which step, which data, what error), and escalation (notify humans when retries exhaust). A bot without error handling is a ticking bomb — it will fail, and nobody will know why or when.
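A sketch of that wrapper in Python; the screenshot and notification helpers are stubs to wire into your RPA tool:

```python
import logging
import time

log = logging.getLogger("bot")

def take_screenshot(path):
    pass  # stub: wire to your tool's screen-capture call

def notify_operations(step, exc):
    pass  # stub: email/Teams alert to the operations queue

def with_retries(step_name, action, retries=3, delay_s=5):
    """Run one bot step with bounded retries, logging, and escalation."""
    for attempt in range(1, retries + 1):
        try:
            return action()  # each action enforces its own timeout internally
        except Exception as exc:
            log.error("step=%s attempt=%d error=%s", step_name, attempt, exc)
            take_screenshot(f"{step_name}_attempt{attempt}.png")
            if attempt == retries:
                notify_operations(step_name, exc)  # retries exhausted: escalate
                raise
            time.sleep(delay_s)  # transient failures often clear on retry
```

Every system interaction in the bot goes through this one wrapper, so retry policy, evidence capture, and escalation are consistent instead of reinvented per step.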
Production bots should achieve a 95%+ success rate. Below 95%, the exception volume overwhelms human handlers and the automation creates more work than it eliminates. Design, testing, and error handling drive reliability. A bot rushed to production with an 80% success rate and a "we'll fix it later" plan never gets fixed — the maintenance team is too busy handling the 20% of runs that fail.
Practices 7-9: Testing and Quality Assurance
Practice 7: Test with Production-Volume Data (Not 5 Records)
A bot tested with 5 invoices works perfectly. In production, it processes 500 invoices and crashes at invoice #347 because the vendor name contains a special character the bot doesn't handle. Test with production-representative data — volume, variety, and edge cases. Include: the largest possible input, the smallest possible input, empty/null values, special characters, Unicode text, maximum-length strings, and the data patterns that cause manual processing errors (those same patterns will challenge the bot).
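A sketch of an edge-case input list drawn from that inventory; normalize_vendor_name stands in for whatever cleansing step your bot actually applies:

```python
EDGE_CASES = [
    "",                       # empty value
    None,                     # null
    "O'Brien & Söhne GmbH",   # apostrophe, ampersand, Unicode
    "ACME " * 100,            # near-maximum-length string
    "\tTabbed\r\nMultiline",  # control characters from copy-paste
]

def normalize_vendor_name(raw):
    # Stand-in for the bot's real input-cleansing step.
    return (raw or "").strip()

for case in EDGE_CASES:
    print(repr(case), "->", repr(normalize_vendor_name(case)))
```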
Practice 8: Test the Failure Paths (Not Just the Happy Path)
Happy-path testing (everything works as expected) validates 70% of the bot's code. The other 30% — error handling, retry logic, exception routing, fallback processes — only executes when things go wrong. Test every failure mode: network timeout, login failure, unexpected UI element, missing file, locked record, concurrent access conflict, and credential expiration. A bot that handles happy paths and crashes on exceptions isn't production-ready.
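A failure-path test sketch: simulate a network timeout and assert the item lands in the exception queue instead of crashing the run. Both functions are hypothetical stand-ins for your bot's real steps:

```python
from unittest import mock

def process_item(item_id, fetch, exception_queue):
    # Failure path under test: timeouts route to the queue, not a crash.
    try:
        return fetch(item_id)
    except TimeoutError as exc:
        exception_queue.append((item_id, str(exc)))
        return None

def test_timeout_routes_to_exception_queue():
    failing_fetch = mock.Mock(side_effect=TimeoutError("gateway timed out"))
    queue = []
    assert process_item("INV-347", failing_fetch, queue) is None
    assert queue == [("INV-347", "gateway timed out")]

test_timeout_routes_to_exception_queue()
print("failure path covered")
```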
Practice 9: Regression Test After Every Change
A "small fix" to handle a new invoice format breaks the processing logic for the original format. Regression testing — running the full test suite after every change — catches these cascading failures. Automate the test suite so regression testing doesn't depend on someone remembering to run it. In mature RPA programs, regression tests run automatically before every production deployment.
Practices 10-12: Production Operations
Practice 10: Monitor Bot Health, Not Just Execution
Execution monitoring: "the bot ran." Health monitoring: "the bot ran, processed 487 items in 2.3 hours, with 3 exceptions routed to the queue, and average item processing time of 17 seconds — consistent with the 30-day baseline." Health monitoring detects degradation before failure: processing time increasing (system performance issue), exception rate rising (data quality issue or system change), or volume declining (upstream process change affecting input).
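A health-check sketch comparing a run against its 30-day baseline; the thresholds are illustrative assumptions:

```python
def check_health(run, baseline):
    """Return alerts when a run degrades relative to its baseline."""
    alerts = []
    if run["avg_seconds_per_item"] > baseline["avg_seconds_per_item"] * 1.5:
        alerts.append("processing time up >50% vs baseline")
    if run["exception_rate"] > baseline["exception_rate"] * 2:
        alerts.append("exception rate doubled vs baseline")
    if run["items_processed"] < baseline["items_processed"] * 0.5:
        alerts.append("volume down >50% - check upstream input")
    return alerts

baseline = {"avg_seconds_per_item": 17, "exception_rate": 0.006,
            "items_processed": 500}
run = {"avg_seconds_per_item": 29, "exception_rate": 0.004,
       "items_processed": 487}

for alert in check_health(run, baseline):
    print("ALERT:", alert)
```

In this example the bot "ran successfully" — but processing time jumped from 17 to 29 seconds per item, and the alert fires days before the slowdown becomes a missed SLA.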
Practice 11: Maintain a Runbook for Every Bot
The runbook answers: what do I do when this bot fails at 2 AM on a Saturday? Runbook contents: what the bot does (one paragraph), systems and credentials accessed, schedule and expected duration, common failure modes and fixes (with screenshots), escalation contacts, manual fallback procedure, and the recovery process (how to reprocess failed items after the fix). Without runbooks, every bot failure requires the original developer — and they're not always available at 2 AM.
Practice 12: Schedule Maintenance Windows (Don't Wait for Breaks)
Proactive maintenance: monthly review of bot performance trends, quarterly review of system update schedules (does any target system have a UI update planned?), and annual review of process changes with business owners (has the manual process changed since the bot was built?). Preventive maintenance costs a fraction of reactive maintenance: fixing a broken bot during a production incident costs 5-10x more than updating it proactively before the break.
Practices 13-15: Scaling and Continuous Improvement
Practice 13: Build a Reusable Component Library
Every bot that logs into SAP, reads from a SharePoint list, sends a Teams notification, or writes to SQL uses the same patterns. Build these as reusable components — tested, documented, version-controlled. The SAP login component handles: credential retrieval from vault, connection pooling, session management, and timeout handling. Every bot that needs SAP access imports the component instead of rebuilding the logic. Component reuse reduces development time by 30-40% and improves reliability (the component has been tested across 20 bots, not just this one).
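A sketch of what a reusable login component's interface can look like. All names are illustrative — this is not a real SAP client, just the shape of a component that centralizes vault access, session lifecycle, and timeouts:

```python
class SapSession:
    """Reusable login component: one tested implementation, many bots."""

    def __init__(self, vault, system_id, timeout_s=60):
        self.creds = vault.get_credential(system_id)  # never hard-coded
        self.timeout_s = timeout_s
        self._session = None

    def __enter__(self):
        self._session = self._connect(self.creds, self.timeout_s)
        return self._session

    def __exit__(self, *exc):
        if self._session:
            self._session.close()

    def _connect(self, creds, timeout_s):
        ...  # tool-specific connection logic lives here, once

# Every bot imports the component instead of rebuilding the login:
#   with SapSession(vault, "SAP_PRD") as sap:
#       sap.run_transaction("FB60")
```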
Practice 14: Measure and Report ROI Quarterly
RPA programs that don't measure ROI get their budget questioned at every review. Measure per bot: items processed, time saved (versus manual baseline), errors prevented, and dollar value (time × rate). Aggregate: total automation hours, total FTE-equivalents, total dollar value, cost to operate (licensing + team + infrastructure). Present quarterly to leadership — not as a defense, but as a growing business case for expansion. Automation ROI that compounds quarter over quarter builds the organizational commitment that sustains the program.
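A roll-up sketch of the quarterly math; all figures, including the 520-hour FTE-quarter (40 hours x 13 weeks), are assumptions:

```python
# Per-bot metrics versus the manual baseline; numbers are hypothetical.
bots = [
    {"name": "ap_invoices", "items": 6000, "min_saved_per_item": 6, "rate": 45.0},
    {"name": "hr_onboarding", "items": 300, "min_saved_per_item": 25, "rate": 40.0},
]
operating_cost = 28000.0  # licensing + team + infrastructure for the quarter

hours = sum(b["items"] * b["min_saved_per_item"] / 60 for b in bots)
value = sum(b["items"] * b["min_saved_per_item"] / 60 * b["rate"] for b in bots)

print(f"{hours:,.0f} hours saved (~{hours / 520:.1f} FTE-equivalents), "
      f"${value:,.0f} gross value, ${value - operating_cost:,.0f} net of cost")
```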
Practice 15: Retire Bots That No Longer Add Value
Bots accumulate. The process changes and the bot handles 20% of cases instead of 90%. The system gets an API that eliminates the need for UI automation. The business unit restructures and the process no longer exists. Review each bot quarterly: is it still needed? Is it still effective? Is the ROI still positive after maintenance cost? Retiring a bot that costs $4,000/year to maintain and saves $3,000/year is a $1,000/year improvement — and it frees maintenance capacity for bots that still deliver value.
Five Anti-Patterns That Kill RPA Programs
The "Automate Everything" Mandate
Leadership mandates 100 automations by year-end. Teams automate trivial processes (rename files, move emails) to hit the number. The 100 bots produce 10% of the ROI that 20 well-selected bots would produce. Measure value, not count.
The One-Person Program
One developer builds, deploys, maintains, and supports all bots. They become a single point of failure. When they take vacation, nobody can fix broken bots. When they leave, institutional knowledge walks out the door. Build a team — even if it starts with 2 people.
Automating a Broken Process
The manual process has 12 unnecessary steps, 3 redundant approvals, and 2 re-keying operations. Automating it produces a bot that executes 12 unnecessary steps, 3 redundant approvals, and 2 re-keying operations — faster. Fix the process first. Automate the optimized process. Automating waste produces faster waste.
No Testing Beyond "It Ran Once"
The bot ran successfully on 5 test items. It's deployed to production. On day 3, it encounters a vendor name with an apostrophe and crashes. Thorough testing with production-representative data prevents this — and it takes 2-3 days, not 2-3 weeks. Skipping testing to "save time" costs 10x more in production incidents.
Set and Forget
The bot is deployed and nobody monitors it. Three months later, somebody notices invoices haven't been processed since the ERP update 6 weeks ago. The bot failed silently — no alerts, no monitoring, no one checked. Every bot needs monitoring from day one. "Set and forget" is "deploy and pray."
The Xylity Approach
We implement RPA with these 15 practices built into every engagement — process analysis before automation, design patterns that scale, testing that covers failure paths, operations with monitoring and runbooks, and the governance framework that prevents bot sprawl. Our automation specialists build bots alongside your team, transferring the practices and discipline that sustain an RPA program for years, not months.
Go Deeper
Continue building your understanding with these related resources from our consulting practice.
RPA That Runs for Years, Not Months
15 practices — process selection, bot design, testing, operations, scaling. The discipline that makes RPA programs sustainable at enterprise scale.
Start Your RPA Implementation →