2025 · 02 · 14 · 9 min read · career · essay

From RPA to data: notes from the in-between.

What two years of automating banking and e-commerce processes taught me before I jumped to a data role: which of those lessons turned out to be portable, which were a trap, and what I'd tell a junior developer eyeing the same move today.

The first time I told someone in a data interview that I had spent two years writing RPA bots, they did the thing — the small, polite pause where you can see them re-classifying you in their head. Oh. So you're not really an engineer. They never said it. They didn't have to.

That moment, repeated across a dozen interviews in 2024, is the reason for this post. Because the honest answer is more interesting than either party usually has time for. RPA gave me half of what a junior data engineer needs and almost none of the prestige. Some of what it taught me has aged beautifully. Some of it I have had to actively un-learn. This is a long, opinionated map of which is which.

What RPA actually was, for me

I want to say what I mean by "RPA" first, because the term has been so abused by vendor marketing that it has lost a lot of its shape. In practice my RPA years (across Papara, Latro, QNB Finansbank, DenizBank) looked like this:

Reading internal SOPs written by ops people, sometimes in Word, sometimes only in someone's head.
Translating those SOPs into UiPath / Blue Prism flows that opened SAP, Excel, web portals, mainframe green screens and PDFs in some specific order.
Negotiating with three different teams about who owned the credentials, the VM, the queue, and the on-call when it broke at 03:00.
Writing a lot of defensive try / catch / retry / log code around UI calls that were one Chrome update away from snapping.

That last bullet is the secret. Once you strip the demos, RPA is mostly integration without an API. Everything else — the orchestration, the monitoring, the SLAs, the "this thing has to run at 06:00 or finance can't close" — is just normal production engineering done in a slightly cursed runtime.
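To make the try / catch / retry / log pattern concrete, here is a minimal sketch in Python rather than in a UiPath flow. It is an illustration, not code from any of the projects above: `call_with_retry`, the `step` label, and the fake `flaky_click` action are all hypothetical names.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("bot")


def call_with_retry(
    action: Callable[[], T],
    *,
    step: str,
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run a flaky step, log every failure with context, retry with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # sketch only: real code should catch expected errors
            log.warning("step=%s attempt=%d/%d failed: %r", step, attempt, attempts, exc)
            if attempt == attempts:
                raise  # out of retries: surface the original error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")


# Hypothetical UI action that fails twice before it succeeds.
calls = {"n": 0}


def flaky_click() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("selector not found")
    return "clicked"


# The sleep is stubbed out so the example runs instantly.
result = call_with_retry(flaky_click, step="open_invoice", sleep=lambda s: None)
```

The detail that matters is the log line: it carries the step name and the attempt count, so whoever reads it at 03:00 knows what was being attempted and how many tries are left.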

What was portable

1. Production empathy

When your bot calls a SAP transaction that times out at 03:14, and finance is escalating at 09:00 because cash positions haven't loaded, you learn very quickly that "it works on my machine" is not an engineering position. It is an apology.

I came into data engineering already believing the things that everyone tells junior data engineers and that junior data engineers mostly nod at without internalising: that idempotency is non-negotiable, that logs without context are decorations, that "I'll just re-run it" should be a feature of the system, not a heroic act by a human. Those were not abstractions to me. They were scar tissue. I have written a whole post about one slice of that.
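One common way to make "I'll just re-run it" a property of the system is to have each run replace its own slice of the target table inside a single transaction. The sketch below shows that idea with Python's built-in `sqlite3`. The `cash_positions` table, its columns, and the `load_day` helper are assumptions made up for the example, and a real warehouse would have its own way of doing the swap.

```python
import sqlite3


def load_day(conn: sqlite3.Connection, run_date: str, rows: list[tuple[str, float]]) -> None:
    """Replace one day's rows atomically, so a re-run cannot double-count."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM cash_positions WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO cash_positions (run_date, account, balance) VALUES (?, ?, ?)",
            [(run_date, account, balance) for account, balance in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cash_positions (run_date TEXT, account TEXT, balance REAL)")

rows = [("TR-001", 1200.0), ("TR-002", 75.5)]
load_day(conn, "2025-01-31", rows)
load_day(conn, "2025-01-31", rows)  # the "I'll just re-run it" case

count = conn.execute("SELECT COUNT(*) FROM cash_positions").fetchone()[0]
total = conn.execute("SELECT SUM(balance) FROM cash_positions").fetchone()[0]
```

After two identical runs the table still holds two rows, not four. A plain append would have doubled the cash position and nobody would have noticed until finance did.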

2. Reading other people's processes for a living

The single most under-rated RPA skill is sitting with a domain expert — usually someone in finance, ops, or supply chain — and patiently extracting what they actually do, as opposed to what the SOP says they do. The SOP says "verify the invoice against the PO". The actual job is "if the supplier is X, ignore the third line because they always overcharge by 2 TL and we settle quarterly".

That skill — translating tacit, messy, exception-riddled business logic into something a machine can execute — turns out to be exactly what a data engineer does when they sit down with a finance lead and try to figure out what "active customer" means in this particular company. It is the same job. The output format just changed from a UiPath flowchart to a CTE.

3. Versioning the un-versionable

RPA platforms are not, charitably, kind to source control. Half your "code" lives in proprietary XML, half in some studio's binary blob, and the diff view is "good luck". You very quickly invent your own rituals: naming conventions for processes, a parallel Git repo for the human-readable parts, a release checklist, an environments matrix.

It was annoying. It also turned out to be a great pre-school for the more grown-up versions of the same problem in data: managing SQL across environments, keeping dbt models tidy, separating dev / prod credentials, making changes reviewable. The tooling got better. The discipline was already there.

What was a trap

1. Pattern matching on UI, not data

RPA trains a very specific reflex: when something breaks, look at the screen. Is the button in a different place? Did the popup change? Did the page load in time?

When something breaks in a data pipeline, the screen is almost never the right place to look. The right place is several layers up: the source contract, the freshness SLA, the schema drift in the upstream Kafka topic, the silent column reorder in a CSV from a partner. I had to consciously rewire myself away from "stare at the failure" and toward "ask what changed in the inputs". It took longer than I'd like to admit.
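"Ask what changed in the inputs" can be partly automated. As a rough sketch, the check below compares a partner CSV's header against an expected column list before any rows are loaded. The column names and the `check_header` helper are hypothetical, and a real contract check would usually cover types and nullability too.

```python
import csv
import io

EXPECTED_COLUMNS = ["invoice_id", "supplier", "amount"]  # hypothetical contract


def check_header(csv_text: str, expected: list[str]) -> list[str]:
    """Return human-readable problems with the CSV header; an empty list means OK."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    problems = []
    missing = [c for c in expected if c not in header]
    extra = [c for c in header if c not in expected]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if not missing and not extra and header != expected:
        problems.append(f"columns reordered or duplicated: got {header}")
    return problems


good = "invoice_id,supplier,amount\n1,ACME,10.0\n"
reordered = "invoice_id,amount,supplier\n1,10.0,ACME\n"
drifted = "invoice_id,supplier,amount_try\n1,ACME,10.0\n"

good_problems = check_header(good, EXPECTED_COLUMNS)
reordered_problems = check_header(reordered, EXPECTED_COLUMNS)
drifted_problems = check_header(drifted, EXPECTED_COLUMNS)
```

The reordered file is the dangerous one: a loader that reads columns by position would accept it without error and put amounts in the supplier field. Failing loudly on the header turns a silent data bug into an ordinary, visible failure.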

2. Optimising for "it runs once a day, that's enough"

Many RPA jobs run nightly. Many of them treat performance as decoration — if it finishes by morning, ship it. Data engineering, even at modest scale, punishes this hard. A query that scans 800 GB and runs in 11 minutes is not "fine because it ran" — it is a cost someone is paying, every day, that you are responsible for.

I had to learn — not just intellectually, but in my hands — to read EXPLAIN plans, to think about distribution keys and sort keys in Redshift, to ask "how much data does this actually need to touch?" before writing the query, not after the bill came in. RPA never made me ask that. It rewarded me for finishing.
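The habit of reading a plan before trusting a query can be practised on a laptop. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in. Redshift's `EXPLAIN` output looks different and its tuning levers are distribution and sort keys rather than indexes, so treat this only as an illustration of the question "how much data does this touch?". The `orders` table and the index name are made up for the example.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's query plan as one string (the last column is the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT SUM(total) FROM orders WHERE customer_id = 42"

before = plan(conn, query)  # no index on customer_id: the whole table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query)  # the planner can now search the index instead

print("before:", before)
print("after: ", after)
```

The query returns the same number both times. Only the plan shows that the first version read every row to get it, which is the part a "it finished by morning" mindset never looks at.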

3. Believing the demo

RPA vendors run on demos. Beautiful, frictionless demos where the bot reads the PDF, opens the portal, fills the form, and a Slack message says "Done in 14 seconds!"

The reality of any one of those steps is a 200-line state machine with retries, credential refreshes, fallback selectors, dead-letter queues and a human on standby. I think every RPA developer leaves with a healthy allergy to demoware — which is wonderful when a vendor walks into your data team in 2025 with a slide that says "AI agent automates your entire ELT in 14 seconds!"

The pattern: the parts of RPA that traveled well were about operating software in production — humility, monitoring, exception handling, talking to humans. The parts that didn't travel were about building software at the scale and economics that analytics actually demands.

What I'd tell a junior RPA developer today

If you're two years into UiPath or Blue Prism or Power Automate and looking longingly at data engineering job descriptions, here is the short version of what I would have liked someone to tell me in 2023.

Stop calling yourself an "RPA developer" in your CV. Call yourself an automation engineer who happens to work in UiPath. The job descriptions are the same in 80% of cases. Recruiters parse the title, not the platform.
Learn SQL deeply, not broadly. Window functions, CTEs, query plans, set theory. Skip the part of every tutorial that teaches you SELECT * FROM users for the tenth time. Go straight to "why is this query slow and what would I change in the schema?".
Pick one cloud warehouse and one orchestrator and build something end-to-end on your laptop. BigQuery + dbt + a free GitHub Actions schedule is plenty. Snowflake + Airflow on a free trial is fine. The specific tools matter less than the fact that you've shipped one pipeline from raw to dashboard that survives a re-run.
Translate every RPA story on your CV into a data story. "Automated invoice reconciliation in UiPath" becomes "designed a daily reconciliation job that processed ~14k invoices, with idempotent retries, structured logging and a per-supplier exception queue." Same project. Different audience.
Be patient with the prestige gap. It is real. It is also smaller every year, because every data team eventually discovers that the boring, ops-shaped half of the job is the half that breaks at 03:00 — and that is exactly the half you already know.

The thing nobody tells you

The thing nobody tells you about moving from RPA to data is that the hardest part is not technical. It is identity. For two years you were the person who made the broken thing work, the last line of defence against an Excel that wouldn't behave. You were useful in a very immediate, very visible way. Finance loved you, because you saved them four hours a day.

Data engineering is, frankly, less immediate. You build a pipeline. Six months later, a number on a dashboard is correct. Nobody throws a party for the number being correct. The work is more leveraged and quieter. That trade — visibility for leverage — is the actual career move. Everything else is just learning some new SQL.

RPA taught me to make things work in production before they were elegant. Data taught me that elegance, in the small, is what keeps them working in production.
— a thing I keep telling junior friends

If you are mid-jump and any of this resonates — or doesn't — I'm always happy to read messages from people in the same spot. The in-between is shorter than it feels.

career · rpa · uipath · data engineering · essay
