How to send good Unicode email with Python


# coding: utf-8

# Python's email API is simple and easy to use!!!!!!!!!!!!!!

# Requirements:
# * UTF-8 headers
# * UTF-8 body
# * prefer quoted-printable to base64 transfer-encoding.
# * Don't escape "From" at the beginning of a line in the message - it's not
# the 1800s any more

from cStringIO import StringIO
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.header import Header
from email import Charset
from email.generator import Generator


subject = u'Hello あ'
recipient = u'Bあb '
from_address = u'Bかb '

html = u'<html><body>Hey böb!\nFrom Jack, I got enhanced pills!</body></html>'
text = u'Hey böb!\nFrom Jack, I got enhanced pills!'

# Override python's weird assumption that utf-8 text should be encoded with
# base64, and instead use quoted-printable (for both subject and body). I
# can't figure out a way to specify QP (quoted-printable) instead of base64 in
# a way that doesn't modify global state. :-(
Charset.add_charset('utf-8', Charset.QP, Charset.QP, 'utf-8')


# This example is of an email with text and html alternatives.
multipart = MIMEMultipart('alternative')

# We need to use Header objects here instead of just assigning the strings in
# order to get our headers properly encoded (with QP).
# You may want to avoid this if your headers are already ASCII, just so people
# can read the raw message without getting a headache.
multipart['Subject'] = Header(subject.encode('utf-8'), 'UTF-8').encode()
multipart['To'] = Header(recipient.encode('utf-8'), 'UTF-8').encode()
multipart['From'] = Header(from_address.encode('utf-8'), 'UTF-8').encode()

# Attach the parts with the given encodings. The last part of a
# multipart/alternative message is the preferred one, so the HTML part goes
# last.
textpart = MIMEText(text.encode('utf-8'), 'plain', 'UTF-8')
multipart.attach(textpart)
htmlpart = MIMEText(html.encode('utf-8'), 'html', 'UTF-8')
multipart.attach(htmlpart)

# And here we have to instantiate a Generator object to convert the multipart
# object to a string (can't use multipart.as_string, because that escapes
# "From" lines).

io = StringIO()
g = Generator(io, False) # second argument means "should I mangle From?"
g.flatten(multipart)

# Pass the result of this to your SMTP library of choice.
print io.getvalue()
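
On the global-state complaint in the script: on Python 3 it seems possible to avoid `add_charset` entirely by configuring a private `Charset` instance and passing it wherever a charset name is accepted. This is a minimal sketch assuming Python 3, not a drop-in change to the Python 2 script; the name `qp_utf8` is made up for illustration.

```python
from email import charset
from email.header import Header, decode_header, make_header
from email.mime.text import MIMEText

subject = 'Hello あ'
text = 'Hey böb!\nFrom Jack, I got enhanced pills!'

# A private Charset object: only messages that are handed this instance
# use quoted-printable; the module-level registry is left alone.
qp_utf8 = charset.Charset('utf-8')
qp_utf8.header_encoding = charset.QP
qp_utf8.body_encoding = charset.QP

# MIMEText and Header both accept a Charset instance instead of a name.
part = MIMEText(text, 'plain', qp_utf8)
encoded_subject = Header(subject, qp_utf8).encode()

# Decode the body and the header again to confirm nothing was lost.
decoded_body = part.get_payload(decode=True).decode('utf-8')
decoded_subject = str(make_header(decode_header(encoded_subject)))

print(part['Content-Transfer-Encoding'])
print(encoded_subject)
```

A fresh `charset.Charset('utf-8')` created afterwards still reports base64 as its body encoding, so other code in the same process is unaffected.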


edited: The last part in a multipart message is the preferred one, so I moved the HTML part to the bottom.

edited AGAIN: I found out that in order to avoid ridiculous "From" quoting, I needed to use a Generator object instead of multipart.as_string().
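
For readers on a recent Python 3, the same four requirements can probably be met with the newer `EmailMessage` API. A rough sketch under a few assumptions: Python 3.6 or later, the default policy (which does not mangle "From" lines), and placeholder addresses at example.com and example.org that are hypothetical. Header encoding is picked by the library here, so only the body is forced to quoted-printable.

```python
from email import message_from_bytes, policy
from email.headerregistry import Address
from email.message import EmailMessage

subject = 'Hello あ'
text = 'Hey böb!\nFrom Jack, I got enhanced pills!'
html = '<html><body>Hey böb!\nFrom Jack, I got enhanced pills!</body></html>'

msg = EmailMessage()  # uses email.policy.default
msg['Subject'] = subject
# Hypothetical addresses, for illustration only.
msg['To'] = Address('Bあb', 'bob', 'example.com')
msg['From'] = Address('Bかb', 'bob', 'example.org')

# Plain text first, HTML last: the last alternative is the preferred one.
msg.set_content(text, cte='quoted-printable')
msg.add_alternative(html, subtype='html', cte='quoted-printable')

# Serialize without a hand-built Generator.
raw = msg.as_bytes()

# Parse the bytes back to check that headers and bodies round-trip.
parsed = message_from_bytes(raw, policy=policy.default)
parts = list(msg.iter_parts())

print(raw.decode('ascii'))
```

The `raw` bytes can be handed to `smtplib.SMTP.sendmail`, or the message object itself to `smtplib.SMTP.send_message`.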

10 comments:

glyph said...

Does this work in python 3?

Christopher Armstrong said...

probably not

Mary said...

Thanks for the magic to avoid base64 encoding, I had wondered.

Matt said...

Thanks a lot!

John said...

For me this solved a different problem -- how to get utf8 headers encoded. So thanks.

To test it I converted to Python 3, but then QP did not work -- there seems to be a bug in quoprimime.

John said...

Not a bug in quoprimime. My code was sending it a byte-string. The following code works for me in python 3.1.2.

from io import StringIO
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.header import Header
from email import charset
from email.generator import Generator

subject = 'Hello あ'
recipient = 'Bあb '
from_address = 'Bかb '

s_html = [[ html not allowed in comment ]]
s_text = 'Hey böb!\nFrom Jack, I got enhanced pills!'

# specify quoted-printable instead of base64
charset.CHARSETS['utf-8'] = ( charset.QP, charset.QP, 'utf-8' )

multipart = MIMEMultipart('alternative')

multipart['Subject'] = Header(subject, 'UTF-8').encode()
multipart['To'] = Header(recipient, 'UTF-8').encode()
multipart['From'] = Header(from_address, 'UTF-8').encode()

# text first, html last: the last part is the preferred one
textpart = MIMEText( s_text, 'plain', 'UTF-8')
multipart.attach(textpart)
htmlpart = MIMEText( s_html, 'html', 'UTF-8' )
multipart.attach(htmlpart)

sio = StringIO()
g = Generator(sio, mangle_from_ = False)
g.flatten(multipart)

print( sio.getvalue() )

honzas said...

Hi,
first I would like to thank you for this very inspiring outline.
Unfortunately your code, as well as John's, doesn't work with Czech accented letters such as čšřěů. It does work with letters like íáýú, which is interesting...
If I use John's code and put Czech accented letters in the payload, I get the following traceback (last item in the stack):
/quoprimime.py", line 80, in body_check
return chr(octet) != _QUOPRI_BODY_MAP[octet]
KeyError: 352
The value in the KeyError depends on the letter passed in.
After several tries I have code that accepts any letter. It works with Python 3.1:

from email.header import Header
from email.message import Message
from quopri import encodestring
from io import StringIO
from email.generator import Generator
from smtplib import SMTP
import email.charset

email.charset.add_charset("utf-8", email.charset.QP, email.charset.QP, "utf-8")

smtp = "smtp.server"
subject = "nová pošta" # eng: a new email
sender_name = "Jan Šimák"
sender_email = "sender email"
recipient = "recipient email"
text = "Czech letters with accent: ěščřžýá íéúůďňť"

msg = Message()
msg["Subject"] = Header(subject, "utf-8").encode()
f = Header(sender_name, "utf-8").encode()
msg["From"] = "{} <{}>".format(f, sender_email)
msg["To"] = recipient

msg.set_type("text/plain")
msg.set_param("charset", "utf-8")
msg["Content-Transfer-Encoding"] = "quoted-printable"
msg.set_payload(encodestring(text.encode(), True).decode())

sio = StringIO()
g = Generator(sio, mangle_from_ = False)
g.flatten(msg)
print(sio.getvalue())
s = SMTP(smtp)
s.sendmail(msg["From"], msg["To"], sio.getvalue())
s.quit()
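
A likely explanation for the KeyError, offered as a guess rather than a diagnosis: 352 is the code point of Š, and a quoted-printable lookup table indexed by byte value only has entries for 0 to 255. Letters like íáýú sit below 256, which would explain why they got through. Encoding to UTF-8 bytes first means the encoder only ever sees values in that range. A small Python 3 sketch of the round trip:

```python
import quopri

text = "Czech letters with accent: ěščřžýá íéúůďňť"

# Code points of a few of the letters: some fit in one byte, some do not.
too_big = [ch for ch in "Ššě" if ord(ch) > 255]
fits = [ch for ch in "íáýú" if ord(ch) < 256]

# Encode to UTF-8 bytes before quoting, as in the code above.
encoded = quopri.encodestring(text.encode("utf-8"), quotetabs=True)

# Decoding reverses both steps and gives the original text back.
decoded = quopri.decodestring(encoded).decode("utf-8")

print(encoded.decode("ascii"))
```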

doc75 said...

Thanks to honzas's code I was finally able to send a correct mail with non-ASCII characters with Python 3.
Thanks for the helpful comment.
