Many tools can retrieve names from anonymous emails. But to get something clean, avoid errors and keep a custom default value for generic emails, better to do it yourself.

First, description of the problem.

Tag name

Here we concatenate salutation, first and last names from a table called usersin a tag called name. This in order to personalize newsletters, example:

Salutation First name Last name HTML View in mailbox and browser
Mr Edward Osborne Wilson
<span>Dear [name]</span>
Dear Mr Edward Osborne Wilson

OK good. But sometimes we have not all salutation, first and last names. And may we get a strange value even with all 3 fields concatenated:

Salutation First name Last name HTML View in mailbox and browser
Mr     
<span>Dear [name]</span>
Dear Mr
     
<span>Dear [name]</span>
Dear Friend
   Wilson  
<span>Dear [name]</span>
Dear Wilson
Mr  Ed  
<span>Dear [name]</span>
Dear Mr Ed
 O W
<span>Dear [name]</span>
Dear M E O W

The saddest is sometimes, email reveals the perfect identity of the contact!

Even better than the data entered ... example:

Salutation First name Last name Email HTML View in mailbox and browser
Mr Test Xyz Cette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser.
<span>Dear [name]</span>
Dear Mr Test Xyz

Indeed recovering his name from email, we could extract its complete identity, to finally get a beautiful:

Dear Prof Edward Osborne Wilson ...

But still it is necessary to detect the data entry errors and choose between the input data, the data entered standardized, a custom default value or an extraction from the email.

Examples to avoid with automatic identity recovering from email:

Email View in mailbox and browser after recovering identity from email
Cette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser. Dear Profedwardosbornewilson
Cette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser. Dear Info
Cette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser. Dear Contact Abstract
Cette adresse e-mail est protégée contre les robots spammeurs. Vous devez activer le JavaScript pour la visualiser. Dear Biodiversity

How to manage this avoiding usual erros?

SQL Script

In the standard SQL example below, we will fix wrong identities from a name field, in a user table. This in a temporary field new_name. We will only transfer the results after check.

We will do it step-by-step, to fully understand what happens and to be able to adapt the script to changing needs. The following pieces of code can then be executed in one go.

1) Temporary field to store and check results

ALTER TABLE users ADD new_name VARCHAR(255) NOT NULL ;

2) Update query

The next update query will be divided to comment script.

We start to fill our new_name field (line 2)  with email prefixs (before @).

  1. UPDATE users
  2. SET new_name = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
  3. SUBSTRING(email, 1, (SELECT LOCATE('@', email)))
  4. ,'@',''),'.',' '),'_',' '),'0',''),'1',''),'2',''),'3',''),'4',''),'5',''),'6',''),'7',''),'8',''),'9','')

The substring (line 3) recovers only what is to the left side of the @ from email field (using its position, itself recovered by the locate line 3).

The replaces (starting line 2) delete numbers and @, and update dots and underscores chars with a space (line 4).

3) Target only empty and insufficiently filled names

We will not update all records of course, to not touch those right filled. Here we update new_name only when original name field is not satisfying.

A where condition limits the query to wrong filled names (lines 2, 3 and 4 below). Do not hesitate to customize this code part.

  1. WHERE (
  2. name LIKE 'Mr' OR name LIKE 'Mrs' OR name LIKE 'Ms' OR name LIKE 'Mr.' OR name LIKE 'Mrs.' OR name LIKE 'Ms.'
  3. OR name LIKE 'Doctor' OR name LIKE 'Dr' OR name LIKE 'Dr.' OR name LIKE 'Dra'
  4. OR name LIKE 'Mister' OR name LIKE 'Miss' OR name LIKE 'Monsieur' OR name LIKE 'Madame' OR name LIKE 'Mademoiselle' OR name LIKE 'Contact'
  5. OR name LIKE '' OR name LIKE NULL OR name IS NULL
  6. )

The line 5 is here the most important, it catchs all empty names.

But the where condition doesn't finish here.

4) Target only compound email prefixs

After getting out of the first parenthesis, we add this second to select only compound email prefixs (before the @), with assumed names distinguished by a dot, a dash or an underscore.

AND (email LIKE '%.%@%' OR email LIKE '%-%@%' OR email LIKE '%_%@%')

This is very important to avoid one-word prefix emails, which are often generic emails. Indeed we will prefer to assign them default values (Friend[interest] enthusiast ...).

5) Exclude classic generic emails

The regex below can be a bit long, but efficient. Some terms are useless depending on your data, but I let this complete regex for security.

AND email NOT REGEXP 'contact@|kontakt@|info@|infos@|hello@|bonjour@|help@|helpdesk@|aide@|user@|
firstname@|lastname@|firstlast@|firstinitiallastname@|register@|registry@|informacion@|general@|service@|client@|clients@|customer@|marketing@|recrutement@|rekrutierung@|recruitment@|
webmaster@|webteam@|postmaster@|news@|poststelle@|secretaria@|sekretaria@|direction@|direccion@|president@|praesident@|prezident@|reception@|recepcion@|admin@|
yourmail@|youremail@|emailaddress@|noemail@|email@|yourdomain@|scanner@|processing@|press@|government@|gouvernement@|tourisme@|turism@|online@|
opportunity@|opportunities@|oportunidad@|occasion@|promotion@|promocion@|question@|bureau@|office@|support@|accueil@|dataprotection@|protection@|donation@|redak@|redac@|portal@|portal@|portail@|emcportail@|informace@|project@|project@|apartmen@|intendente@|hotel.com@|
hotels@|infohub@|courses@|website@|doctolib@|doctissimo@|allodocteur@|exchange@|enquiries@|enquiries@|inquiry@|erasmus@|global@|
liberation.fr@|password@|adobe@|requests@|request@|someone@|someuser@|access@|governance@|innovation@|regulation@|
.med@|.dr@|.ms@|.pro@|.cosmetic@|.executive@|plastica@|plastic@|plastique@|.dermato@|.derma@|
recognition@|volunteering@|socialmedia@|translator@|market@|directiva@|revistaude@|translator@|campaign@|economia@|
biblio@|repositorio@|firefox@|uBlock@|blocker@|archive@|comunicacion@|educacion@|vacancies@|vacance@|holiday@|webview@|congress@|Investor@|
registration@|policy@|accounting@|company@|advert@|entreprise@|abuse@|subscribconsultdre@|reservation@|reservierung@|reply@|erreur@|error@|disable@|
enable@|enable@|enablejs@|delete@|delete@|feedback@|account@|stylesheet@|exemple@|example@|test@|meeting@|membership@|member@|volunteer@|swisscor@|
notification@|notificacion@|managment@|managing@|licence@|license@|licensing@|legal@|communication@|communicatie@|social@|media@|notice@|privacy@|copyright@|trademark@|datenschutz@|database@|databaze@|dataweb@|
bibliotheque@|library@|biblioteca@|academi@|akademi@|redaktion@|redaction@|edition@|editor@|editeur@|publisher@|travelagency@|travel@|travel@|booking@|global@|community@|advocacy@|airbnb@|20minute@|youtube@|vimeo@|paypal@|motion@|play@|google@|
mailchimp@|facebook@|snapchat@|twitter@|linkedin@|instagram@|johndoe@|wordpress@|joomla@|drupal@|politico@|student@|etudiant@'

Do not hesitate to customize this regex code part, even by shortening it. Indeed in some cases a too rigorous regex will be unproductive.

6) Avoid special generic email

Finish the update query excluding some particular generic emails, according to your working fields and usal data.

Example (to customize):

AND email NOT LIKE 'info.%@%' AND email NOT LIKE 'infos.%@%'
AND email NOT LIKE 'sale.%@%' AND email NOT LIKE 'sales.%@%'
AND email NOT LIKE '%commercial%@%' AND email NOT LIKE '%comercial%@%'
AND email NOT LIKE '%swisscorr%@%' AND email NOT LIKE '%dermato%@%'
AND email NOT LIKE '%clinic%@%' AND email NOT LIKE '%clinique%@%'
AND email NOT LIKE '%medical%@%' AND email NOT LIKE '%antiaging%@%'
AND email NOT LIKE '%anti-aging%@%' AND email NOT LIKE '%antiageing%@%'
AND email NOT LIKE '%anti-ageing%@%'
AND email NOT LIKE '%aptos%@%' AND email NOT LIKE '%Dental%@%'
AND email NOT LIKE '%Rmd%@%' AND email NOT LIKE '%aptos%@%'
AND email NOT LIKE '%France%@%' AND email NOT LIKE '%Italy%@%'
AND email NOT LIKE '%Spain%@%' AND email NOT LIKE '%Germany%@%'
AND email NOT LIKE '.usa%@%' AND email NOT LIKE '%usa.@%'
AND email NOT LIKE '%Europe%@%' AND email NOT LIKE '.uk%@%' AND email NOT LIKE '%uk.@%'
AND email NOT LIKE '%contact.%@%' AND email NOT LIKE '%kontakt.%@%'
AND email NOT LIKE '%consultdr.%@%' AND email NOT LIKE '%consult.%@%'
AND email NOT LIKE '%consultoriodr.%@%' AND email NOT LIKE '%consultorio.%@%' ;

7) Get name case

Good! At this point you get your new_name in this shape, when filled:

prof edward osborne wilson

The queries below goes to update first letter of each word in uppercase, each others in lowercase, managing the dash for compound names.

UPDATE users SET new_name = LOWER(new_name);
UPDATE users SET new_name = CONCAT(" ", new_name);
 
UPDATE users SET new_name = REPLACE(new_name, " a", " A");
UPDATE users SET new_name = REPLACE(new_name, " z", " Z");
UPDATE users SET new_name = REPLACE(new_name, " e", " E");
UPDATE users SET new_name = REPLACE(new_name, " r", " R");
UPDATE users SET new_name = REPLACE(new_name, " t", " T");
UPDATE users SET new_name = REPLACE(new_name, " y", " Y");
UPDATE users SET new_name = REPLACE(new_name, " u", " U");
UPDATE users SET new_name = REPLACE(new_name, " i", " I");
UPDATE users SET new_name = REPLACE(new_name, " o", " O");
UPDATE users SET new_name = REPLACE(new_name, " p", " P");
UPDATE users SET new_name = REPLACE(new_name, " q", " Q");
UPDATE users SET new_name = REPLACE(new_name, " s", " S");
UPDATE users SET new_name = REPLACE(new_name, " d", " D");
UPDATE users SET new_name = REPLACE(new_name, " f", " F");
UPDATE users SET new_name = REPLACE(new_name, " g", " G");
UPDATE users SET new_name = REPLACE(new_name, " h", " H");
UPDATE users SET new_name = REPLACE(new_name, " j", " J");
UPDATE users SET new_name = REPLACE(new_name, " k", " K");
UPDATE users SET new_name = REPLACE(new_name, " l", " L");
UPDATE users SET new_name = REPLACE(new_name, " m", " M");
UPDATE users SET new_name = REPLACE(new_name, " w", " W");
UPDATE users SET new_name = REPLACE(new_name, " x", " X");
UPDATE users SET new_name = REPLACE(new_name, " c", " C");
UPDATE users SET new_name = REPLACE(new_name, " v", " V");
UPDATE users SET new_name = REPLACE(new_name, " b", " B");
UPDATE users SET new_name = REPLACE(new_name, " n", " N");
 
UPDATE users SET new_name = REPLACE(new_name, "-a", "-A");
UPDATE users SET new_name = REPLACE(new_name, "-z", "-Z");
UPDATE users SET new_name = REPLACE(new_name, "-e", "-E");
UPDATE users SET new_name = REPLACE(new_name, "-r", "-R");
UPDATE users SET new_name = REPLACE(new_name, "-t", "-T");
UPDATE users SET new_name = REPLACE(new_name, "-y", "-Y");
UPDATE users SET new_name = REPLACE(new_name, "-u", "-U");
UPDATE users SET new_name = REPLACE(new_name, "-i", "-I");
UPDATE users SET new_name = REPLACE(new_name, "-o", "-O");
UPDATE users SET new_name = REPLACE(new_name, "-p", "-P");
UPDATE users SET new_name = REPLACE(new_name, "-q", "-Q");
UPDATE users SET new_name = REPLACE(new_name, "-s", "-S");
UPDATE users SET new_name = REPLACE(new_name, "-d", "-D");
UPDATE users SET new_name = REPLACE(new_name, "-f", "-F");
UPDATE users SET new_name = REPLACE(new_name, "-g", "-G");
UPDATE users SET new_name = REPLACE(new_name, "-h", "-H");
UPDATE users SET new_name = REPLACE(new_name, "-j", "-J");
UPDATE users SET new_name = REPLACE(new_name, "-k", "-K");
UPDATE users SET new_name = REPLACE(new_name, "-l", "-L");
UPDATE users SET new_name = REPLACE(new_name, "-m", "-M");
UPDATE users SET new_name = REPLACE(new_name, "-w", "-W");
UPDATE users SET new_name = REPLACE(new_name, "-x", "-X");
UPDATE users SET new_name = REPLACE(new_name, "-c", "-C");
UPDATE users SET new_name = REPLACE(new_name, "-v", "-V");
UPDATE users SET new_name = REPLACE(new_name, "-b", "-B");
UPDATE users SET new_name = REPLACE(new_name, "-n", "-N");
 
UPDATE users SET new_name = REPLACE(new_name, "- ", "-");
UPDATE users SET new_name = REPLACE(new_name, "  ", " ");
 
UPDATE users SET new_name = SUBSTR(new_name, 2 );

8) Check results

Done. Now you can check and explore your results like this:

SELECT name, new_name, email
FROM users WHERE new_name NOT LIKE '' ;

You worked in a special new field at this point. So you can come back, deleting new_name field, and update previous scripts at any time.

9) Transfert results

Finally, when you are satisfied:

UPDATE users SET name = new_name WHERE new_name NOT LIKE '' ;
ALTER TABLE users DROP COLUMN new_name ;

Don't forget the where condition! This to not touch right filled records.

#DeliverabilityImprovement