Dirty data in a CRM means "ivanov ivan ivanovich" crammed into a single name field, phone numbers written ten different ways, typos in emails and comments. A database like that breaks document generation, email personalization and duplicate detection. This guide covers four robots that clean data right inside a business process: splitting a full name into parts and detecting gender, validating email before a mailout, finding the carrier and region by phone number, and fixing typos in text.
Where does dirty data in a CRM come from, and why does it hurt?
There are three sources. Forms and chats: the customer types their name in lowercase and in any order. Imports: an old database brings its own formats along. Manual entry: a sales rep is in a hurry — "ivanov i.i.", an 8 instead of "+7", typos. From there the mess spreads through the system: a template-generated contract ends up addressing "dear ivanov ivan", a mailout goes to addresses that don't exist and damages the sender domain's reputation, and duplicate detection can't tell that "+7 912…" and "8 912…" are the same person (more on this in the article on duplicates in Bitrix24). The best time to clean data is the moment it appears — with robots in the lead- or contact-creation process, before the mess spreads any further.
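The "+7 912…" vs "8 912…" miss happens because raw strings are being compared. Here's a minimal Python sketch of the normalization that has to happen before any comparison can work. It assumes Russian numbers only (10 digits, or 11 starting with 7 or 8). `normalize_ru_phone` and `same_contact` are hypothetical helpers for illustration, not part of Bitrix24 or the robots.

```python
import re
from typing import Optional

def normalize_ru_phone(raw: str) -> Optional[str]:
    """Bring a Russian number to the +7XXXXXXXXXX form, or return None."""
    digits = re.sub(r"\D", "", raw)  # drop spaces, brackets, dashes and the plus sign
    if len(digits) == 11 and digits[0] in "78":
        digits = "7" + digits[1:]    # domestic "8" prefix becomes country code 7
    elif len(digits) == 10:
        digits = "7" + digits        # number written without any prefix
    else:
        return None                  # doesn't look like a Russian 10/11-digit number
    return "+" + digits

def same_contact(a: str, b: str) -> bool:
    """True if two raw phone strings normalize to the same number."""
    na = normalize_ru_phone(a)
    return na is not None and na == normalize_ru_phone(b)
```

The design choice is to normalize once, on the way in, so that every later comparison is a plain string equality.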
How do I split a full name into last, first and middle name?
The field holds "petrov pyotr petrovich" as a single string, with the words in any order and random casing. Documents and salutations need the parts separately, but there's no built-in action for splitting the field. The Parse full name robot takes a string containing a full name and breaks it into its parts using the DaData standardization service. It returns the last, first and middle name as separate values, the gender (M, F, or N/A if it can't be determined), the full name normalized with corrected casing, and a "Recognized" flag (Y/N). The parts go into separate contact fields, the gender drives the correct salutation in email and document templates, and "Recognized = N" triggers a task to check the card by hand, since the string didn't look like a name.
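To see why the robot leans on DaData's dictionaries, here's a deliberately naive Python sketch of the same idea. It is not how the robot or DaData works: `FIRST_NAMES` is a tiny hypothetical dictionary, the patronymic endings cover only common transliterated forms, and anything ambiguous comes back with `recognized="N"` for a manual check.

```python
from dataclasses import dataclass

# Toy, hypothetical first-name dictionary; a real service uses full dictionaries.
FIRST_NAMES = {"ivan", "pyotr", "sergey", "anna", "maria", "olga"}
# Common transliterated patronymic endings (an assumption of this sketch).
MALE_ENDINGS = ("ovich", "evich", "ich")
FEMALE_ENDINGS = ("ovna", "evna", "ichna")

@dataclass
class ParsedName:
    last: str = ""
    first: str = ""
    middle: str = ""
    gender: str = "N/A"
    recognized: str = "N"

def _patronymic_gender(word: str) -> str:
    """'M' or 'F' if the word looks like a patronymic, else ''."""
    if word.endswith(FEMALE_ENDINGS):
        return "F"
    if word.endswith(MALE_ENDINGS):
        return "M"
    return ""

def parse_full_name(raw: str) -> ParsedName:
    words = raw.lower().split()
    if len(words) != 3:
        return ParsedName()  # not "last first middle": leave it for a human
    middles = [w for w in words if _patronymic_gender(w)]
    firsts = [w for w in words if w in FIRST_NAMES]
    if len(middles) != 1 or len(firsts) != 1 or middles[0] == firsts[0]:
        return ParsedName()  # ambiguous: Recognized stays "N"
    middle, first = middles[0], firsts[0]
    rest = [w for w in words if w not in (middle, first)]
    if len(rest) != 1:
        return ParsedName()
    # Whatever is left is taken as the surname; casing is normalized.
    return ParsedName(
        last=rest[0].capitalize(),
        first=first.capitalize(),
        middle=middle.capitalize(),
        gender=_patronymic_gender(middle),
        recognized="Y",
    )
```

Word order doesn't matter here, but an input like "abramovich ivan petrovich" (a surname that looks like a patronymic) comes back with "N" instead of a guess. Resolving cases like that is what a dictionary-backed service is for.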
How do I validate email before a mailout?
A typo in the address means an undelivered message; disposable and role-based addresses mean spam complaints and skewed mailout stats. Business processes have no built-in email validation. The Validate email robot normalizes the address through a standardization service and returns the corrected email, the address type (personal, corporate, role-based or disposable), and "Valid" and "Recognized" flags (Y/N). The recipe: a contact-creation process runs the address through the robot and writes the normalized address back into the field. When "Valid = N", the card is flagged and kept out of the mailout segment. A disposable address is a reason to doubt the lead, and a role-based one (info@, sales@) lands in a shared inbox, so there's no point sending personalized emails to it.
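The robot relies on a standardization service with maintained lists. As a rough local illustration of the same decision, here's a Python sketch: normalize, check syntax, classify the type, and decide whether the address belongs in a mailout segment. The domain and mailbox lists are short hypothetical samples, the regex is a simplified syntax check rather than full RFC 5322, and nothing here corrects typos the way the service does.

```python
import re
from typing import Tuple

# Hypothetical, deliberately short lists; a real service keeps these up to date.
ROLE_LOCAL_PARTS = {"info", "sales", "support", "admin", "office", "noreply"}
DISPOSABLE_DOMAINS = {"mailinator.com", "10minutemail.com", "guerrillamail.com"}
FREE_MAIL_DOMAINS = {"gmail.com", "mail.ru", "yandex.ru", "outlook.com"}

# Simplified syntax check: local part, one "@", dotted domain with a 2+ letter TLD.
EMAIL_RE = re.compile(r"^[a-z0-9._%+-]+@[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,}$")

def check_email(raw: str) -> Tuple[str, str, str]:
    """Return (normalized email, type, "Valid" flag Y/N)."""
    email = raw.strip().lower()
    if not EMAIL_RE.match(email):
        return email, "", "N"
    local, domain = email.split("@", 1)
    if domain in DISPOSABLE_DOMAINS:
        kind = "disposable"
    elif local in ROLE_LOCAL_PARTS:
        kind = "role"
    elif domain in FREE_MAIL_DOMAINS:
        kind = "personal"
    else:
        kind = "corporate"
    return email, kind, "Y"

def include_in_mailout(raw: str) -> bool:
    """Keep only valid personal or corporate addresses in the segment."""
    _, kind, valid = check_email(raw)
    return valid == "Y" and kind in ("personal", "corporate")
```

The checks run from most to least disqualifying: a disposable domain wins over a role-based mailbox name, and a role-based name wins over the personal/corporate split.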
How do I find the carrier and region by phone?
The number is in the card but the region and time zone aren't — so a rep calls Vladivostok at three in the morning local time. The Phone: carrier and region robot takes a number in any notation and returns: the normalized phone, line type, carrier, region, time zone, and "Valid" and "Recognized" flags (Y/N). The lookup, like the name parsing, runs through the DaData standardization service. The results are written into the card's fields and work inside process conditions: routing a lead to a regional manager, picking a call window by time zone, weeding out invalid numbers before handing a list to the dialer.
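Once the time zone is in the card, the call-window condition is simple arithmetic. Here's a Python sketch, assuming the time zone field holds a whole-hour string like "UTC+10" (the exact format the robot writes may differ, so check a real card first). `can_call_now` and its 9:00 to 20:00 default window are hypothetical choices for illustration.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

def parse_utc_offset(tz: str) -> Optional[timezone]:
    """Parse a whole-hour offset like 'UTC+10' or 'UTC-3'; None for anything else."""
    m = re.fullmatch(r"UTC([+-])(\d{1,2})", tz.strip())
    if not m:
        return None
    hours = int(m.group(2))
    if hours > 14:                 # real offsets run from UTC-12 to UTC+14
        return None
    sign = 1 if m.group(1) == "+" else -1
    return timezone(timedelta(hours=sign * hours))

def can_call_now(tz: str, now_utc: datetime, start_hour: int = 9, end_hour: int = 20) -> bool:
    """True if the contact's local time is inside [start_hour, end_hour).

    now_utc must be timezone-aware, e.g. datetime.now(timezone.utc).
    """
    offset = parse_utc_offset(tz)
    if offset is None:
        return False               # unknown zone: don't auto-dial, leave it to a human
    local = now_utc.astimezone(offset)
    return start_hour <= local.hour < end_hour
```

At 16:00 UTC, a Vladivostok contact ("UTC+10") is at 2 a.m. and gets skipped, while a Moscow contact ("UTC+3") is at 19:00 and can be called. Unparseable zones return False so the dialer never guesses.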
How do I fix typos in text?
"Good morning" easily turns into "good mornign", and a company name turns into gibberish typed in the wrong keyboard layout. In comments that's tolerable; in fields that feed documents and emails, it isn't. The Fix typos robot runs text through a spell-checking service: it corrects common typos in Russian and English text and recognizes words typed in the wrong keyboard layout. The output: the corrected text and a flag for whether a correction was made. The recipe: before generating a document, the field is run through the robot; if a correction was made, the updated text is written back, and the flag lets you mark the card for a spot check — machine corrections occasionally need a human eye.
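Wrong-layout detection works roughly like this: re-map each Latin key to the Cyrillic letter on the same key of the standard Russian ЙЦУКЕН layout, then keep the result only if it turns out to be a real word. Here's a minimal Python sketch. `KNOWN_WORDS` is a hypothetical three-word stand-in for a real dictionary, shifted punctuation keys are left out, and ordinary typos like "mornign" need a spell-checker, which this doesn't attempt.

```python
from typing import Dict, Tuple

# Same physical keys on the US QWERTY and Russian ЙЦУКЕН layouts.
LAT = "qwertyuiop[]asdfghjkl;'zxcvbnm,.`"
RUS = "йцукенгшщзхъфывапролджэячсмитьбюё"

def _build_table() -> Dict[int, str]:
    pairs = dict(zip(LAT, RUS))
    # Shifted letters map to capital Cyrillic; shifted punctuation is left out.
    pairs.update({l.upper(): r.upper() for l, r in zip(LAT, RUS) if l.isalpha()})
    return str.maketrans(pairs)

LAT_TO_RUS = _build_table()

# Hypothetical mini-dictionary; a real spell-checker uses full dictionaries.
KNOWN_WORDS = {"ромашка", "привет", "договор"}

def fix_layout(text: str) -> Tuple[str, bool]:
    """Return (fixed text, changed flag) for words typed in the Latin layout."""
    out, changed = [], False
    for word in text.split(" "):
        candidate = word.translate(LAT_TO_RUS)
        if candidate != word and candidate.lower() in KNOWN_WORDS:
            out.append(candidate)  # re-mapped form is a real word: take it
            changed = True
        else:
            out.append(word)       # leave everything else untouched
    return " ".join(out), changed
```

"Hjvfirf" comes back as "Ромашка" with the flag set, while "good morning" is left alone because its re-mapped form isn't a known word.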
Checklist
Clean data on the way in: in the lead-creation process, parse the full name and validate the email and phone; before a mailout, weed out invalid, disposable and role-based addresses; before generating documents, fix typos and casing. All four robots are in the Roboteka catalog; they install for free from Bitrix24.Market and work in the workflow designer alongside the built-in actions. Missing a check you need? Describe the task, and we'll build the robot for free and add it to the shared library.