Many tools can retrieve names from anonymous emails. But to get something clean, avoid errors and keep a custom default value for generic emails, better to do it yourself.
First, description of the problem.
Tag name
Here we concatenate salutation, first and last names from a table called users
, in a tag called name
. This in order to personalize newsletters, example:
Salutation | First name | Last name | HTML | View in mailbox and browser |
Mr | Edward | Osborne Wilson |
<span>Dear [name]</span> |
Dear Mr Edward Osborne Wilson |
OK good. But sometimes we have not all salutation, first and last names. And may we get a strange value even with all 3 fields concatenated:
Salutation | First name | Last name | HTML | View in mailbox and browser |
Mr |
<span>Dear [name]</span> |
Dear Mr | ||
<span>Dear [name]</span> |
Dear Friend | |||
Wilson |
<span>Dear [name]</span> |
Dear Wilson | ||
Mr | Ed |
<span>Dear [name]</span> |
Dear Mr Ed | |
M | E | O W |
<span>Dear [name]</span> |
Dear M E O W |
The saddest is sometimes, email reveals the perfect identity of the contact!
Even better than the data entered ... example:
Salutation | First name | Last name | HTML | View in mailbox and browser | |
Mr | Test | Xyz | Cette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser. |
<span>Dear [name]</span> |
Dear Mr Test Xyz |
Indeed recovering his name from email, we could extract its complete identity, to finally get a beautiful:
Dear Prof Edward Osborne Wilson ...
But still it is necessary to detect the data entry errors and choose between the input data, the data entered standardized, a custom default value or an extraction from the email.
Examples to avoid with automatic identity recovering from email:
View in mailbox and browser after recovering identity from email | |
Cette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser. | Dear Profedwardosbornewilson |
Cette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser. | Dear Info |
Cette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser. | Dear Contact Abstract |
Cette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser. | Dear Biodiversity |
How to manage this avoiding usual erros?
SQL Script
In the standard SQL example below, we will fix wrong identities from a name
field, in a user
table. This in a temporary field new_name
. We will only transfer the results after check.
We will do it step-by-step, to fully understand what happens and to be able to adapt the script to changing needs. The following pieces of code can then be executed in one go.
1) Temporary field to store and check results
ALTER TABLE users ADD new_name VARCHAR(255) NOT NULL ;
2) Update query
The next update
query will be divided to comment script.
We start to fill our new_name
field (line 2) with email prefixs (before @).
UPDATE users SET new_name = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( SUBSTRING(email, 1, (SELECT LOCATE('@', email))) ,'@',''),'.',' '),'_',' '),'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9','')
The substring
(line 3) recovers only what is to the left side of the @ from email
field (using its position, itself recovered by the locate
line 3).
The replaces
(starting line 2) delete numbers and @, and update dots and underscores chars with a space (line 4).
3) Target only empty and insufficiently filled names
We will not update all records of course, to not touch those right filled. Here we update new_name
only when original name
field is not satisfying.
A where
condition limits the query to wrong filled names (lines 2, 3 and 4 below). Do not hesitate to customize this code part.
WHERE ( name LIKE 'Mr' OR name LIKE 'Mrs' OR name LIKE 'Ms' OR name LIKE 'Mr.' OR name LIKE 'Mrs.' OR name LIKE 'Ms.' OR name LIKE 'Doctor' OR name LIKE 'Dr' OR name LIKE 'Dr.' OR name LIKE 'Dra' OR name LIKE 'Mister' OR name LIKE 'Miss' OR name LIKE 'Monsieur' OR name LIKE 'Madame' OR name LIKE 'Mademoiselle' OR name LIKE 'Contact' OR name LIKE '' OR name LIKE NULL OR name IS NULL )
The line 5 is here the most important, it catchs all empty names.
But the where
condition doesn't finish here.
4) Target only compound email prefixs
After getting out of the first parenthesis, we add this second to select only compound email prefixs (before the @), with assumed names distinguished by a dot, a dash or an underscore.
AND (email LIKE '%.%@%' OR email LIKE '%-%@%' OR email LIKE '%_%@%')
This is very important to avoid one-word prefix emails, which are often generic emails. Indeed we will prefer to assign them default values (Friend, [interest]
enthusiast ...).
5) Exclude classic generic emails
The regex below can be a bit long, but efficient. Some terms are useless depending on your data, but I let this complete regex for security.
AND email NOT REGEXP 'contact@|kontakt@|info@|infos@|hello@|bonjour@|help@|helpdesk@|aide@|user@| firstname@|lastname@|firstlast@|firstinitiallastname@|register@|registry@|informacion@|general@|service@|client@|clients@|customer@|marketing@|recrutement@|rekrutierung@|recruitment@| webmaster@|webteam@|postmaster@|news@|poststelle@|secretaria@|sekretaria@|direction@|direccion@|president@|praesident@|prezident@|reception@|recepcion@|admin@| yourmail@|youremail@|emailaddress@|noemail@|email@|yourdomain@|scanner@|processing@|press@|government@|gouvernement@|tourisme@|turism@|online@| opportunity@|opportunities@|oportunidad@|occasion@|promotion@|promocion@|question@|bureau@|office@|support@|accueil@|dataprotection@|protection@|donation@|redak@|redac@|portal@|portal@|portail@|emcportail@|informace@|project@|project@|apartmen@|intendente@|hotel.com@| hotels@|infohub@|courses@|website@|doctolib@|doctissimo@|allodocteur@|exchange@|enquiries@|enquiries@|inquiry@|erasmus@|global@| liberation.fr@|password@|adobe@|requests@|request@|someone@|someuser@|access@|governance@|innovation@|regulation@| .med@|.dr@|.ms@|.pro@|.cosmetic@|.executive@|plastica@|plastic@|plastique@|.dermato@|.derma@| recognition@|volunteering@|socialmedia@|translator@|market@|directiva@|revistaude@|translator@|campaign@|economia@| biblio@|repositorio@|firefox@|uBlock@|blocker@|archive@|comunicacion@|educacion@|vacancies@|vacance@|holiday@|webview@|congress@|Investor@| registration@|policy@|accounting@|company@|advert@|entreprise@|abuse@|subscribconsultdre@|reservation@|reservierung@|reply@|erreur@|error@|disable@| enable@|enable@|enablejs@|delete@|delete@|feedback@|account@|stylesheet@|exemple@|example@|test@|meeting@|membership@|member@|volunteer@|swisscor@| notification@|notificacion@|managment@|managing@|licence@|license@|licensing@|legal@|communication@|communicatie@|social@|media@|notice@|privacy@|copyright@|trademark@|datenschutz@|database@|databaze@|dataweb@| bibliotheque@|library@|biblioteca@|academi@|akademi@|redaktion@|redaction@|edition@|editor@|editeur@|publisher@|travelagency@|travel@|travel@|booking@|global@|community@|advocacy@|airbnb@|20minute@|youtube@|vimeo@|paypal@|motion@|play@|google@| mailchimp@|facebook@|snapchat@|twitter@|linkedin@|instagram@|johndoe@|wordpress@|joomla@|drupal@|politico@|student@|etudiant@'
Do not hesitate to customize this regex code part, even by shortening it. Indeed in some cases a too rigorous regex will be unproductive.
6) Avoid special generic email
Finish the update query excluding some particular generic emails, according to your working fields and usal data.
Example (to customize):
AND email NOT LIKE 'info.%@%' AND email NOT LIKE 'infos.%@%' AND email NOT LIKE 'sale.%@%' AND email NOT LIKE 'sales.%@%' AND email NOT LIKE '%commercial%@%' AND email NOT LIKE '%comercial%@%' AND email NOT LIKE '%swisscorr%@%' AND email NOT LIKE '%dermato%@%' AND email NOT LIKE '%clinic%@%' AND email NOT LIKE '%clinique%@%' AND email NOT LIKE '%medical%@%' AND email NOT LIKE '%antiaging%@%' AND email NOT LIKE '%anti-aging%@%' AND email NOT LIKE '%antiageing%@%' AND email NOT LIKE '%anti-ageing%@%' AND email NOT LIKE '%aptos%@%' AND email NOT LIKE '%Dental%@%' AND email NOT LIKE '%Rmd%@%' AND email NOT LIKE '%aptos%@%' AND email NOT LIKE '%France%@%' AND email NOT LIKE '%Italy%@%' AND email NOT LIKE '%Spain%@%' AND email NOT LIKE '%Germany%@%' AND email NOT LIKE '.usa%@%' AND email NOT LIKE '%usa.@%' AND email NOT LIKE '%Europe%@%' AND email NOT LIKE '.uk%@%' AND email NOT LIKE '%uk.@%' AND email NOT LIKE '%contact.%@%' AND email NOT LIKE '%kontakt.%@%' AND email NOT LIKE '%consultdr.%@%' AND email NOT LIKE '%consult.%@%' AND email NOT LIKE '%consultoriodr.%@%' AND email NOT LIKE '%consultorio.%@%' ;
7) Get name case
Good! At this point you get your new_name
in this shape, when filled:
prof edward osborne wilson
The queries below goes to update first letter of each word in uppercase, each others in lowercase, managing the dash for compound names.
UPDATE users SET new_name = LOWER(new_name); UPDATE users SET new_name = CONCAT(" ", new_name); UPDATE users SET new_name = REPLACE(new_name, " a", " A"); UPDATE users SET new_name = REPLACE(new_name, " z", " Z"); UPDATE users SET new_name = REPLACE(new_name, " e", " E"); UPDATE users SET new_name = REPLACE(new_name, " r", " R"); UPDATE users SET new_name = REPLACE(new_name, " t", " T"); UPDATE users SET new_name = REPLACE(new_name, " y", " Y"); UPDATE users SET new_name = REPLACE(new_name, " u", " U"); UPDATE users SET new_name = REPLACE(new_name, " i", " I"); UPDATE users SET new_name = REPLACE(new_name, " o", " O"); UPDATE users SET new_name = REPLACE(new_name, " p", " P"); UPDATE users SET new_name = REPLACE(new_name, " q", " Q"); UPDATE users SET new_name = REPLACE(new_name, " s", " S"); UPDATE users SET new_name = REPLACE(new_name, " d", " D"); UPDATE users SET new_name = REPLACE(new_name, " f", " F"); UPDATE users SET new_name = REPLACE(new_name, " g", " G"); UPDATE users SET new_name = REPLACE(new_name, " h", " H"); UPDATE users SET new_name = REPLACE(new_name, " j", " J"); UPDATE users SET new_name = REPLACE(new_name, " k", " K"); UPDATE users SET new_name = REPLACE(new_name, " l", " L"); UPDATE users SET new_name = REPLACE(new_name, " m", " M"); UPDATE users SET new_name = REPLACE(new_name, " w", " W"); UPDATE users SET new_name = REPLACE(new_name, " x", " X"); UPDATE users SET new_name = REPLACE(new_name, " c", " C"); UPDATE users SET new_name = REPLACE(new_name, " v", " V"); UPDATE users SET new_name = REPLACE(new_name, " b", " B"); UPDATE users SET new_name = REPLACE(new_name, " n", " N"); UPDATE users SET new_name = REPLACE(new_name, "-a", "-A"); UPDATE users SET new_name = REPLACE(new_name, "-z", "-Z"); UPDATE users SET new_name = REPLACE(new_name, "-e", "-E"); UPDATE users SET new_name = REPLACE(new_name, "-r", "-R"); UPDATE users SET new_name = REPLACE(new_name, "-t", "-T"); UPDATE users SET new_name = REPLACE(new_name, "-y", "-Y"); UPDATE users SET new_name = REPLACE(new_name, "-u", "-U"); UPDATE users SET new_name = REPLACE(new_name, "-i", "-I"); UPDATE users SET new_name = REPLACE(new_name, "-o", "-O"); UPDATE users SET new_name = REPLACE(new_name, "-p", "-P"); UPDATE users SET new_name = REPLACE(new_name, "-q", "-Q"); UPDATE users SET new_name = REPLACE(new_name, "-s", "-S"); UPDATE users SET new_name = REPLACE(new_name, "-d", "-D"); UPDATE users SET new_name = REPLACE(new_name, "-f", "-F"); UPDATE users SET new_name = REPLACE(new_name, "-g", "-G"); UPDATE users SET new_name = REPLACE(new_name, "-h", "-H"); UPDATE users SET new_name = REPLACE(new_name, "-j", "-J"); UPDATE users SET new_name = REPLACE(new_name, "-k", "-K"); UPDATE users SET new_name = REPLACE(new_name, "-l", "-L"); UPDATE users SET new_name = REPLACE(new_name, "-m", "-M"); UPDATE users SET new_name = REPLACE(new_name, "-w", "-W"); UPDATE users SET new_name = REPLACE(new_name, "-x", "-X"); UPDATE users SET new_name = REPLACE(new_name, "-c", "-C"); UPDATE users SET new_name = REPLACE(new_name, "-v", "-V"); UPDATE users SET new_name = REPLACE(new_name, "-b", "-B"); UPDATE users SET new_name = REPLACE(new_name, "-n", "-N"); UPDATE users SET new_name = REPLACE(new_name, "- ", "-"); UPDATE users SET new_name = REPLACE(new_name, " ", " "); UPDATE users SET new_name = SUBSTR(new_name, 2 );
8) Check results
Done. Now you can check and explore your results like this:
SELECT name, new_name, email FROM users WHERE new_name NOT LIKE '' ;
You worked in a special new field at this point. So you can come back, deleting new_name
field, and update previous scripts at any time.
9) Transfert results
Finally, when you are satisfied:
UPDATE users SET name = new_name WHERE new_name NOT LIKE '' ; ALTER TABLE users DROP COLUMN new_name ;
Don't forget the where
condition! This to not touch right filled records.
#DeliverabilityImprovement