Introduction

This package contains a variety of Python modules for Myanmar text processing, such as syllabification, romanization, encoding conversion, and NRC validation. Only Python 3 is currently supported.

Installation

Stable release

To install Python Myanmar, run this command in your terminal:

$ pip install python-myanmar

This is the preferred method to install Python Myanmar, as it will always install the most recent stable release.

If you don’t have pip installed, this Python installation guide can walk you through the process.

From sources

The sources for Python Myanmar can be downloaded from the GitHub repo.

You can either clone the public repository:

$ git clone https://github.com/trhura/python-myanmar.git

Or download the tarball:

$ curl -OL https://github.com/trhura/python-myanmar/tarball/master

Once you have a copy of the source, you can install it with:

$ python setup.py install
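After either install route, a quick check from Python can confirm that the package is importable. The sketch below assumes the import name is myanmar, as in the module paths used throughout this documentation. The helper name is_installed is illustrative and is not part of the package.

```python
import importlib.util
import sys


def is_installed(module_name="myanmar"):
    """Return True if a top-level module can be located (without importing it)."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Only Python 3 is supported, so report the interpreter version as well.
    print("Python major version:", sys.version_info.major)
    print("myanmar importable:", is_installed())
```

If the second line prints False, the package was probably installed into a different interpreter or virtual environment than the one running the check.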

Syllabification

Morphological and phonemic syllable breaking for Burmese text. Syllable breaking on Zawgyi-encoded text is not reliable, so convert such text to Unicode before processing.

myanmar.language.MorphoSyllableBreak(text, encoding)

Return an iterable of morphological / visual syllables in text.

>>> from myanmar.encodings import UnicodeEncoding
>>> from myanmar.language import MorphoSyllableBreak
>>> slb = list(MorphoSyllableBreak("အကြွေးပေး", UnicodeEncoding()))
>>> list(s['syllable'] for s in slb)
['အ', 'ကြွေး', 'ပေး']
>>> slb[2]
{'syllable': 'ပေး', 'consonant': 'ပ', 'eVowel': 'ေ', 'visarga': 'း'}

myanmar.language.PhonemicSyllableBreak(text, encoding)

Return an iterable of phonemic syllables in text.

>>> from myanmar.encodings import UnicodeEncoding
>>> from myanmar.language import PhonemicSyllableBreak
>>> slb = list(PhonemicSyllableBreak("သီးပင်အိုင်", UnicodeEncoding()))
>>> list(s['syllable'] for s in slb)
['သီး', 'ပင်', 'အိုင်']
>>> slb[0]
{'syllable': 'သီး', 'consonant': 'သ', 'iVowel': 'ီ', 'visarga': 'း'}
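To give a feel for what syllable breaking on Unicode text involves, here is a minimal sketch of a common rule of thumb: start a new syllable at every consonant (U+1000 to U+1021) that is not followed by asat (U+103A) or virama (U+1039) and not preceded by virama. This is not the algorithm used by MorphoSyllableBreak or PhonemicSyllableBreak; the function name naive_syllables is hypothetical, and it returns plain strings rather than the per-syllable dictionaries shown above.

```python
import re

# Code points are written as escapes so the rule is unambiguous.
CONSONANTS = "\u1000-\u1021"  # KA .. A
ASAT = "\u103a"               # vowel killer, marks a syllable-final consonant
VIRAMA = "\u1039"             # stacking mark, joins a consonant to the next one

# A consonant starts a syllable unless it is stacked under a virama
# or is itself a final consonant (followed by asat or virama).
_SYLLABLE_START = re.compile(
    rf"(?<!{VIRAMA})[{CONSONANTS}](?![{ASAT}{VIRAMA}])"
)


def naive_syllables(text):
    """Split Unicode-encoded Burmese text at approximate syllable starts."""
    marked = _SYLLABLE_START.sub(lambda m: "\x00" + m.group(0), text)
    return [part for part in marked.split("\x00") if part]


if __name__ == "__main__":
    # Same input as the PhonemicSyllableBreak example, written as escapes.
    sample = "\u101e\u102e\u1038\u1015\u1004\u103a\u1021\u102d\u102f\u1004\u103a"
    print(naive_syllables(sample))
```

The rule relies on Unicode storage order, which is one reason Zawgyi-encoded input needs converting first. It also ignores independent vowels, digits, and punctuation, so it should be treated as an illustration only.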

Encodings

Convert text between various Myanmar encodings. It currently supports wininnwa, zawgyi, and unicode. Performance-wise, it does not perform as well as other regex-based converters.

myanmar.converter.convert(text, fromenc, toenc)

Convert text in fromenc encoding to toenc encoding.

>>> from myanmar.converter import convert
>>> convert('အကျိုးတရား', 'unicode', 'zawgyi')
'အက်ိဳးတရား'
>>> convert('ဉာဏ္ႀကီးရွင္', 'zawgyi', 'unicode')
'ဉာဏ်ကြီးရှင်'
>>> convert('&[ef;', 'wininnwa', 'unicode')
'ရဟန်း'

myanmar.converter.get_supported_encodings()

Get a list of encodings supported by converter module.

>>> from myanmar.converter import get_supported_encodings
>>> get_supported_encodings()
['unicode', 'zawgyi', 'wininnwa']
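Because encodings are passed as plain strings, a thin wrapper can catch misspelled names and skip no-op conversions before delegating. The sketch below is one possible pattern, not part of the package: convert_checked is a hypothetical helper, the encoding names are copied from the get_supported_encodings() output above, and the converter is passed in as an argument so the wrapper can be exercised without the package installed.

```python
# Encoding names as listed by get_supported_encodings() above.
SUPPORTED_ENCODINGS = ("unicode", "zawgyi", "wininnwa")


def convert_checked(text, fromenc, toenc, convert):
    """Validate encoding names, skip no-op conversions, then delegate.

    `convert` is any callable with the signature (text, fromenc, toenc),
    for example myanmar.converter.convert.
    """
    for name in (fromenc, toenc):
        if name not in SUPPORTED_ENCODINGS:
            raise ValueError(f"unsupported encoding: {name!r}")
    if fromenc == toenc:
        return text  # nothing to do
    return convert(text, fromenc, toenc)


if __name__ == "__main__":
    # Stand-in converter used only to demonstrate the control flow.
    def fake_convert(text, fromenc, toenc):
        return f"[{fromenc}->{toenc}] {text}"

    print(convert_checked("abc", "zawgyi", "unicode", fake_convert))
```

With the package installed, the real converter would be supplied instead: convert_checked(text, 'zawgyi', 'unicode', myanmar.converter.convert).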

Transliteration

Transliterate Burmese text into Latin characters. Romanization based on the BGN_PCGN, MLCTS, and IPA systems is currently available.

myanmar.romanizer.romanize(string, system)

Transliterate Burmese text with Latin letters.

>>> from myanmar.romanizer import romanize, IPA, MLC, BGN_PCGN
>>> romanize("ကွန်ပျူတာ", IPA)
'kʊ̀ɴpjùtà'
>>> romanize("ပဒေသရာဇာ", MLC)
'padezarājā'
>>> romanize("ဘင်္ဂလားအော်", BGN_PCGN)
'bin-gala-aw'

Phonenumbers

Validation and normalization for Myanmar phone numbers. Based on the mm_phonenumber module from Melomap.

class myanmar.phonenumber.Operator

An enumeration.
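The reprs in the get_phone_operator examples in this section, such as <Operator.Mpt: 'MPT'>, follow the standard Python Enum format of member name plus value. The stand-in below shows how such a result can be inspected. OperatorSketch is a hypothetical mirror containing only the three members visible in those examples; the real Operator enumeration may define more.

```python
from enum import Enum


class OperatorSketch(Enum):
    # Hypothetical stand-in: only members that appear in the documented examples.
    Mpt = "MPT"
    Ooredoo = "Ooredoo"
    Unknown = "Unknown"


def describe(operator):
    """Format an enum member as 'Name (value)' for display or logging."""
    return f"{operator.name} ({operator.value})"


if __name__ == "__main__":
    op = OperatorSketch.Mpt
    print(repr(op))      # same shape as the reprs in the examples
    print(describe(op))
    # Members are singletons, so identity comparison is the idiomatic check.
    print(op is OperatorSketch.Mpt)
```

Comparing against enum members rather than raw strings avoids mistakes from capitalization differences such as Mpt versus 'MPT'.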

myanmar.phonenumber.get_landline_operator(phonenumber)

Get operator type for a given landline number.

>>> from myanmar.phonenumber import get_landline_operator
>>> get_landline_operator('+95674601234')
'MyanmarAPN'
>>> get_landline_operator('9524261234')
'MyanmarSpeedNet'
>>> get_landline_operator('14681234')
'VoIPMyanmarGroup'

myanmar.phonenumber.get_phone_operator(phonenumber)

Get operator type for a given phone number.

>>> from myanmar.phonenumber import get_phone_operator
>>> get_phone_operator('+959262624625')
<Operator.Mpt: 'MPT'>
>>> get_phone_operator('09970000234')
<Operator.Ooredoo: 'Ooredoo'>
>>> get_phone_operator('123456789')
<Operator.Unknown: 'Unknown'>

myanmar.phonenumber.is_valid_phonenumber(phonenumber)

Check whether a given phone number is a valid Myanmar number.

>>> from myanmar.phonenumber import is_valid_phonenumber
>>> is_valid_phonenumber('09420028187')
True
>>> is_valid_phonenumber('+959420028187')
True
>>> is_valid_phonenumber(9420028187)
False
>>> is_valid_phonenumber(94200281870)
False

myanmar.phonenumber.normalize_phonenumber(phonenumber)

Normalize a given phone number into the 959xxx number format.

>>> from myanmar.phonenumber import normalize_phonenumber
>>> normalize_phonenumber('09420028187')
959420028187
>>> normalize_phonenumber('+959420028187')
959420028187
>>> normalize_phonenumber('420028187')
959420028187
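The three examples above can be reproduced with a small prefix-stripping rule: drop a leading +959, 959, or 09, then prepend 959. The following is a rough sketch under that assumption and is not the package's implementation. The name normalize_sketch is hypothetical, the separator handling is an added convenience, and input without a recognized prefix is treated as the bare subscriber part, as in the third example.

```python
import re

_SEPARATORS = re.compile(r"[\s\-.]")      # spaces, dashes, dots between digit groups
_PREFIX = re.compile(r"^(?:\+?959|09)")   # international or national mobile prefix


def normalize_sketch(phonenumber):
    """Return the number as an int in 959xxx form, mirroring the examples above."""
    if not isinstance(phonenumber, str):
        raise TypeError("phone number must be a string")
    digits = _SEPARATORS.sub("", phonenumber)
    subscriber = _PREFIX.sub("", digits)
    if not re.fullmatch(r"[0-9]+", subscriber):
        raise ValueError(f"not a phone number: {phonenumber!r}")
    return int("959" + subscriber)


if __name__ == "__main__":
    for raw in ("09420028187", "+959420028187", "420028187"):
        print(raw, "->", normalize_sketch(raw))
```

The sketch performs no length or operator checks, so a result from it says nothing about whether the number is valid.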

Myanmar NRC

Validation and normalization for Myanmar NRC numbers.

myanmar.nrc.is_valid_nrc(nrc)

Check whether the given string is a valid Myanmar national registration ID.

>>> from myanmar.nrc import is_valid_nrc
>>> is_valid_nrc('12/LMN (N) 144144')
True
>>> is_valid_nrc('5/PMN (N) 123456')
False

myanmar.nrc.normalize_nrc(nrc)

Check whether the given string is a valid Myanmar NRC and, if it is, normalize it to its simplest form.

>>> from myanmar.nrc import normalize_nrc
>>> normalize_nrc('9/pmn(n)123456')
'9 pamana n 123456'
>>> normalize_nrc('1/bkn(n)123456')
'1 bakana n 123456'
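The normalized strings above suggest a four-part structure: a state number, a township code, a one-letter type, and a six-digit serial. The sketch below parses that shape with a regular expression and rebuilds the output format seen in the examples. It is an illustration under several assumptions, not the package's logic: the name normalize_nrc_sketch is hypothetical, township codes are assumed to be three Latin letters, the expansion of pmn to pamana is inferred from the two examples only, and returning None for unparseable input is this sketch's own choice. It does not check township codes against states, so it would accept '5/PMN (N) 123456', which is_valid_nrc rejects.

```python
import re

# state / township (type) serial, with flexible spacing and letter case.
_NRC_PATTERN = re.compile(
    r"^\s*([0-9]{1,2})\s*/\s*([a-z]{3})\s*\(\s*([a-z])\s*\)\s*([0-9]{6})\s*$",
    re.IGNORECASE,
)


def normalize_nrc_sketch(nrc):
    """Return 'state township type serial' in lowercase, or None if unparseable."""
    match = _NRC_PATTERN.match(nrc)
    if match is None:
        return None
    state, township, kind, serial = match.groups()
    # Inferred from the examples: each consonant letter gains an 'a' (pmn -> pamana).
    expanded = "".join(letter + "a" for letter in township.lower())
    return f"{int(state)} {expanded} {kind.lower()} {serial}"


if __name__ == "__main__":
    print(normalize_nrc_sketch("9/pmn(n)123456"))
    print(normalize_nrc_sketch("12/LMN (N) 144144"))
```

Structural parsing of this kind is only a first step, since real validation also needs the list of township codes that exist in each state.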

Credits

Development Lead

  • Thura Hlaing <trhura at gmail.com>

Contributors
