ECharts in Python

Charting in Python is possible with ECharts through the pyecharts package. Apache ECharts is a web-based charting library. It was started in 2013 and is built from about 77.5K lines of TypeScript. It is well documented and has over 200 examples of its API’s usage.

Let us look at an example:

import json

import pyecharts.options as opts
from pyecharts.charts import Sankey


# Load the Sankey nodes and links; the with block closes the file for us
with open("energy.json", "r", encoding="utf-8") as f:
    data = json.load(f)

Sankey(init_opts=opts.InitOpts(width="800px", height="600px")).add(
    series_name="",
    nodes=data["nodes"],
    links=data["links"],
    itemstyle_opts=opts.ItemStyleOpts(border_width=1,
                                      border_color="#aaa"),
    linestyle_opt=opts.LineStyleOpts(color="source",
                                     curve=0.5,
                                     opacity=0.5),
    tooltip_opts=opts.TooltipOpts(trigger_on="mousemove"),
)\
.set_global_opts(title_opts=opts.TitleOpts(title="Sankey Diagram"))\
.render("sankey_diagram.html")
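The example above reads its nodes and links from energy.json. As a rough sketch of the shape that file is expected to have (the node names and values below are invented placeholders, not the real energy data set), each node is a dict with a name, and each link names a source node, a target node, and a value:

```python
import json

# Placeholder data: the names and numbers are made up for illustration only
energy = {
    "nodes": [
        {"name": "Coal"},
        {"name": "Solar"},
        {"name": "Electricity grid"},
    ],
    "links": [
        {"source": "Coal", "target": "Electricity grid", "value": 120.0},
        {"source": "Solar", "target": "Electricity grid", "value": 35.5},
    ],
}

# Every link should point at nodes that actually exist
node_names = {node["name"] for node in energy["nodes"]}
dangling = [
    link
    for link in energy["links"]
    if link["source"] not in node_names or link["target"] not in node_names
]

# This string is what you would save as energy.json
serialized = json.dumps(energy, indent=2)
```

Checking for dangling links before rendering is a cheap way to catch typos in node names.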

Please check out: https://echarts.apache.org/en/index.html

Warnings in python

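Having trouble seeing warnings in Python? Some of them, such as deprecation warnings, are hidden by default. You can turn them on with the -W flag; for example, -Wd (short for -W default) shows them: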

$ python -Wd manage.py runserver
/path/to/.pyenv/versions/phcommt310/lib/python3.10/site-packages/django/conf/__init__.py:240: RemovedInDjango50Warning: The USE_L10N setting is deprecated. Starting with Django 5.0, localized formatting of data will always be enabled. For example Django will display numbers and dates using the format of the current locale.
  warnings.warn(USE_L10N_DEPRECATED_MSG, RemovedInDjango50Warning)
System check identified no issues (0 silenced).

Since Python 3.7 you also have development mode, enabled with -X dev or the PYTHONDEVMODE environment variable: https://docs.python.org/3/using/cmdline.html#envvar-PYTHONDEVMODE
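The same filter can also be set from inside a program with the standard warnings module. The sketch below is only an illustration: old_api is a hypothetical function made up for this example, and catch_warnings is used so the effect can be inspected instead of printed.

```python
import warnings


def old_api() -> int:
    # Hypothetical deprecated function, used only for this illustration
    warnings.warn("old_api() is deprecated", DeprecationWarning, stacklevel=2)
    return 42


with warnings.catch_warnings(record=True) as caught:
    # "default" is the action that -Wd selects on the command line
    warnings.simplefilter("default")
    result = old_api()

messages = [str(w.message) for w in caught]
```

Outside of catch_warnings, a plain warnings.simplefilter("default") near program start has a similar effect to passing -Wd.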

Progress bar in Python

If you need a tool for tracking the progress of long-running loops, one that shows how far along your code is in its execution, TQDM is the answer in Python.

TQDM is a console-based progress bar library that allows you to track the progress of your code execution. It has a simple and easy-to-use interface that can be integrated into your Python code with minimal effort. TQDM provides progress bars and counters to loops that can be customized to meet your needs.


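from time import sleep

from tqdm import tqdm

for i in tqdm(range(1000)):
    sleep(0.01)  # simulate a slow step so the bar is visible
    tqdm.write(f"integer {i}")  # tqdm.write prints without breaking the bar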

TQDM shows the progress in a terminal or console. Please check out: https://github.com/tqdm/tqdm

Bevy : Dependency Injection

Dependency injection in Python: a dream come true. Bevy is a framework that brings to Python the kind of features you would expect from a DI framework in other languages.

from dataclasses import dataclass
from bevy import dependency
from databases import Database
from cars import Car


@dataclass
class User:
    id: int
    name: str

    database: Database = dependency()

    @property
    def cars(self) -> list[Car]:
        return self.database.get_users_cars(self.id)

    @classmethod
    def get_user(cls, id: int):
        return cls.database.get_user(id)

As you all know, DI is a design pattern where the objects that your code depends on are instantiated by the caller. Those dependencies are then injected into your code when it is run. This promotes loosely coupled code where your code doesn’t require direct knowledge of what objects it depends on or how to create them. Instead, your code declares what interface it expects and an outside framework handles the work of creating objects with the correct interface.
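To make the pattern concrete without any framework, here is a minimal sketch of constructor injection in plain Python. It does not use Bevy; the names Database, InMemoryDatabase and UserService are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Protocol


class Database(Protocol):
    """The interface the service expects; any object with this method will do."""

    def get_user_name(self, user_id: int) -> str: ...


class InMemoryDatabase:
    """A stand-in implementation, handy for tests."""

    def __init__(self, names: dict[int, str]) -> None:
        self._names = names

    def get_user_name(self, user_id: int) -> str:
        return self._names[user_id]


@dataclass
class UserService:
    # The dependency is passed in by the caller, never created here
    database: Database

    def greet(self, user_id: int) -> str:
        return f"Hello, {self.database.get_user_name(user_id)}!"


# The caller decides which implementation to inject
service = UserService(database=InMemoryDatabase({1: "Ada"}))
greeting = service.greet(1)
```

A framework such as Bevy automates the last step, so the caller no longer has to wire each dependency by hand.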

Please check out: https://github.com/ZechCodes/Bevy

PEP 668 : externally managed environments

Many programmers struggle with pip install issues and virtual env mismatches. PEP 668 is going to help us out of these issues with externally managed environments.

From PEP 668:

A long-standing practical problem for Python users has been conflicts between OS package managers and Python-specific package management tools like pip. These conflicts include both Python-level API incompatibilities and conflicts over file ownership.

Historically, Python-specific package management tools have defaulted to installing packages into an implicit global context. With the standardization and popularity of virtual environments, a better solution for most (but not all) use cases is to use Python-specific package management tools only within a virtual environment.

This PEP proposes a mechanism for a Python installation to communicate to tools like pip that its global package installation context is managed by some means external to Python, such as an OS package manager. It specifies that Python-specific package management tools should neither install nor remove packages into the interpreter’s global context, by default, and should instead guide the end user towards using a virtual environment.

It also standardizes an interpretation of the sysconfig schemes so that, if a Python-specific package manager is about to install a package in an interpreter-wide context, it can do so in a manner that will avoid conflicting with the external package manager and reduces the risk of breaking software shipped by the external package manager.

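With this PEP, a distribution can mark its system Python as externally managed, and tools like pip then refuse to install into it by default and point the user to a virtual environment instead. Please check out: https://peps.python.org/pep-0668/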
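PEP 668 describes the mechanism as a marker file named EXTERNALLY-MANAGED placed in the interpreter's standard library directory, and the check is skipped inside a virtual environment. A rough sketch of that check, under the assumption that sysconfig.get_path("stdlib") points at the right directory for your installation (the helper names are made up for this example):

```python
import sys
import sysconfig
from pathlib import Path


def externally_managed_marker() -> Path:
    # Assumption: the default scheme's stdlib path is where the marker would live
    return Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"


def is_externally_managed() -> bool:
    # Inside a virtual environment the prefix differs from the base prefix
    in_virtualenv = sys.prefix != sys.base_prefix
    return not in_virtualenv and externally_managed_marker().is_file()


managed = is_externally_managed()
```

This is only a way to see what pip is reacting to; installers such as pip perform their own version of the check.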

PEP 20 : Zen of Python

PEP 20 is a Python Enhancement Proposal titled The Zen of Python. It promises 20 aphorisms, but only 19 of them were ever written down:

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it’s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let’s do more of those!

This was posted to the comp.lang.python group: https://groups.google.com/g/comp.lang.python/c/B_VxeTBClM0/m/L8W9KlsiriUJ

These were written by Tim Peters. The 20th principle was never written down; it is said to have been left “for Guido to fill in”, a reference to Guido van Rossum, the creator of Python.
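The Zen also ships with the interpreter as an easter egg: importing the this module prints it. As a small sketch, the text is stored ROT13-encoded in this.s, so it can be decoded and counted as well:

```python
import codecs

import this  # printing the Zen is a side effect of the first import

# this.s holds the text in ROT13; decode it back to plain English
zen_text = codecs.decode(this.s, "rot13")

# Skip the title line and blank lines, keeping only the aphorisms
aphorisms = [line for line in zen_text.splitlines()[1:] if line.strip()]
aphorism_count = len(aphorisms)
```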

Pandas Group By Performance

Are you having problems with pandas group by performance? There are ways to improve it. One of them is to move the per-group work into NumPy.

First, let us look at the pandas group by:

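import numpy as np
import pandas as pd


def pandas_groupby(df: pd.DataFrame) -> pd.DataFrame:
    return (
        df.groupby(["category", "year"])
        # Interpolate y at x = 0.3 within each (category, year) group
        .apply(lambda group: np.interp(0.3, group["x"], group["y"]))
        .rename("y")
        .reset_index()
    )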

Let us now look at how NumPy helps to improve the performance:

def _interpolate_wrapper(fp: np.ndarray, xp: np.ndarray, x: float) -> float:
    return float(np.interp(x=x, xp=xp, fp=fp))

def numpy_groupby(df: pd.DataFrame) -> pd.DataFrame:
      ....
      ....
      y_values = y_values.reshape([-1, num_x_unique_values])
      interpolated_y_values = np.apply_along_axis(
          _interpolate_wrapper,
          axis=1,
          arr=y_values,
          x=_INTERPOLATE_AT,
          xp=x_unique_values,
      )
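The listing above leaves out its setup steps. One possible way to fill them in is sketched below; it is not the original code. It assumes that every (category, year) group contains exactly the same set of x values, which is what makes the reshape into a 2-D array valid, and the value of _INTERPOLATE_AT and the function name numpy_groupby_sketch are choices made for this example.

```python
import numpy as np
import pandas as pd

_INTERPOLATE_AT = 0.3  # assumed to match the x position used in the pandas version


def _interpolate_wrapper(fp: np.ndarray, xp: np.ndarray, x: float) -> float:
    return float(np.interp(x=x, xp=xp, fp=fp))


def numpy_groupby_sketch(df: pd.DataFrame) -> pd.DataFrame:
    # Sort so that each group's rows are contiguous and ordered by x
    sorted_df = df.sort_values(["category", "year", "x"])

    # Assumption: all groups share these x values
    x_unique_values = np.sort(sorted_df["x"].unique())
    num_x_unique_values = len(x_unique_values)

    # One row per group, one column per x value
    y_values = sorted_df["y"].to_numpy().reshape([-1, num_x_unique_values])
    interpolated_y_values = np.apply_along_axis(
        _interpolate_wrapper,
        axis=1,
        arr=y_values,
        x=_INTERPOLATE_AT,
        xp=x_unique_values,
    )

    # Group keys come out in the same order as the rows of y_values
    keys = sorted_df[["category", "year"]].drop_duplicates().reset_index(drop=True)
    return keys.assign(y=interpolated_y_values)
```

If groups can have different x values, the reshape either fails or silently mixes groups, so this shortcut only applies to regular, grid-like data.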

Improving Django Query Performance

If query performance in your Django code is bad, how do you improve it?

Let us look at an example of django queries:

from django.db import models


class Author(models.Model):
    name = models.CharField(max_length=100)


class Book(models.Model):
    title = models.CharField(max_length=100)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)


# One SQL query with a JOIN: each book arrives with its author attached
books = Book.objects.select_related("author")

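Let us look at the same query using prefetch_related. It fetches the books first and then loads their authors in a second query, joining the two in Python. For a ForeignKey like this one, select_related is usually the better fit; prefetch_related is meant for many-to-many and reverse foreign key relations.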

books = Book.objects.prefetch_related('author')
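What these calls save you from is the classic N+1 pattern: one query for the books plus one extra query per book for its author. The sketch below shows the difference with the standard sqlite3 module instead of Django, so the table names, rows and helper functions are made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
    """
)


def n_plus_one():
    """One query for the books, then one more query per book."""
    queries = 1
    rows = []
    books = conn.execute("SELECT title, author_id FROM book ORDER BY id").fetchall()
    for title, author_id in books:
        name = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()[0]
        queries += 1
        rows.append((title, name))
    return rows, queries


def joined():
    """A single JOIN query, which is roughly what select_related generates."""
    rows = conn.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()
    return rows, 1
```

Both functions return the same rows; only the number of round trips to the database differs, and that gap grows with the number of books.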

Django : Case statement

Many of you might be using CASE statements in SQL. How do we do this in Django?

Let us first look at an example of a SQL CASE statement:

SELECT
    blog_product.id,
    blog_product.name,
    blog_product.price,
    blog_product.category
FROM blog_product
WHERE blog_product.id IN (4, 2, 1, 3, 5)
ORDER BY
    CASE
        WHEN blog_product.id = 4 THEN 1
        WHEN blog_product.id = 2 THEN 2
        WHEN blog_product.id = 1 THEN 3
        WHEN blog_product.id = 3 THEN 4
        WHEN blog_product.id = 5 THEN 5
        ELSE NULL
    END ASC;

To handle this in Django, below is the solution:

from django.db.models import Case, When
from .models import Product, Order

# Notice how we want to sort the products by the ids of the orders
order_ids = [4, 2, 1, 3, 5]
products = Product.objects.all()

preferred = Case(
    *(
        When(order__id=order_id, then=pos)
        for pos, order_id in enumerate(order_ids, start=1)
    )
)
products_sorted = products.filter(order__id__in=order_ids).order_by(preferred)
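The idea behind the Case expression is simply "sort by position in a given list of ids". As a plain-Python analogy (not Django code; the Product class here is a hypothetical stand-in for the model), the same ordering can be produced in memory like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    # Stand-in for the Django model, for illustration only
    id: int
    name: str


order_ids = [4, 2, 1, 3, 5]

# Map each id to its desired position, mirroring the WHEN ... THEN pairs
position = {pk: pos for pos, pk in enumerate(order_ids, start=1)}

products = [Product(i, f"product-{i}") for i in range(1, 7)]

# Keep only the wanted ids (the filter) and sort by position (the CASE)
products_sorted = sorted(
    (p for p in products if p.id in position),
    key=lambda p: position[p.id],
)
sorted_ids = [p.id for p in products_sorted]
```

Sorting in Python is fine for a handful of rows; the Case and When version lets the database do the ordering, which matters once you paginate or slice the queryset.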

web3 server – WSGI – Python

Below are the concepts you need in order to understand a server built on Web3, the WSGI-based interface proposed in PEP 444:

Iterative server
Server socket creation sequence (socket, bind, listen, accept)
Client connection creation sequence (socket, connect)
Socket pair
Socket
Ephemeral port and well-known port
Process
Process ID (PID), parent process ID (PPID), and the parent-child relationship.
File descriptors
The meaning of the BACKLOG argument of the listen socket method
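Several of the items above (the server socket sequence, the client sequence, ephemeral ports and the backlog) can be seen together in a few lines of standard-library code. This is a minimal sketch of an iterative server, not a production one; the function names and the fixed "ok" response are choices made for this example.

```python
import socket
import threading

BACKLOG = 5  # how many pending connections the kernel may queue for accept()


def make_listen_socket(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    # Server sequence: socket, bind, listen (accept happens in the serve loop)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))  # port 0 asks the OS for an ephemeral port
    sock.listen(BACKLOG)
    return sock


def serve_requests(listen_sock: socket.socket, count: int) -> None:
    # Iterative server: handle one connection completely, then the next
    for _ in range(count):
        conn, _addr = listen_sock.accept()
        with conn:
            conn.recv(1024)  # read (and ignore) the request
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")


def fetch(port: int) -> bytes:
    # Client sequence: socket, connect (create_connection does both)
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
        chunks = []
        while True:
            data = client.recv(1024)
            if not data:
                break
            chunks.append(data)
        return b"".join(chunks)


def demo() -> bytes:
    listen_sock = make_listen_socket()
    port = listen_sock.getsockname()[1]
    server = threading.Thread(target=serve_requests, args=(listen_sock, 1))
    server.start()
    try:
        return fetch(port)
    finally:
        server.join(timeout=5)
        listen_sock.close()
```

Calling demo() starts the server on a free port, sends one request and returns the raw response bytes. The server runs in a thread here only so that client and server fit in one script.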

Web3 differs from WSGI 1.0 in the following ways:

All protocol-specific environment names are prefixed with web3. rather than wsgi., e.g. web3.input rather than wsgi.input.
All values present as environment dictionary values are explicitly bytes instances instead of native strings. (Environment keys however are native strings, always str regardless of platform).
All values returned by an application must be bytes instances, including status code, header names and values, and the body.
Wherever WSGI 1.0 referred to an app_iter, this specification refers to a body.
No start_response() callback (and therefore no write() callable nor exc_info data).
The readline() function of web3.input must support a size hint parameter.
The read() function of web3.input must be length delimited. A call without a size argument must not read more than the content length header specifies. In case a content length header is absent the stream must not return anything on read. It must never request more data than specified from the client.
No requirement for middleware to yield an empty string if it needs more information from an application to produce output (e.g. no “Middleware Handling of Block Boundaries”).
Filelike objects passed to a “file_wrapper” must have an __iter__ which returns bytes (never text).
wsgi.file_wrapper is not supported.
QUERY_STRING, SCRIPT_NAME, PATH_INFO values required to be placed in environ by server (each as the empty bytes instance if no associated value is received in the HTTP request).
web3.path_info and web3.script_name should be put into the Web3 environment, if possible, by the origin Web3 server. When available, each is the original, plain 7-bit ASCII, URL-encoded variant of its CGI equivalent derived directly from the request URI (with %2F segment markers and other meta-characters intact). If the server cannot provide one (or both) of these values, it must omit the value(s) it cannot provide from the environment.
This requirement was removed: “middleware components must not block iteration waiting for multiple values from an application iterable. If the middleware needs to accumulate more data from the application before it can produce any output, it must yield an empty string.”
SERVER_PORT must be a bytes instance (not an integer).
The server must not inject an additional Content-Length header by guessing the length from the response iterable. This must be set by the application itself in all situations.
If the origin server advertises that it has the web3.async capability, a Web3 application callable used by the server is permitted to return a callable that accepts no arguments. When it does so, this callable is to be called periodically by the origin server until it returns a non-None response, which must be a normal Web3 response tuple.
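To tie these rules together, here is a minimal sketch of what a Web3-style application could look like. The (body, status, headers) tuple order follows my reading of PEP 444 and should be treated as an assumption, and the environ dictionary is built by hand here rather than by a real server.

```python
def simple_app(environ):
    """A tiny Web3-style application: no start_response, bytes everywhere."""
    path = environ.get("PATH_INFO", b"") or b"/"
    body = [b"Hello from " + path + b"\n"]
    status = b"200 OK"
    headers = [
        (b"Content-Type", b"text/plain"),
        # The application sets Content-Length itself; the server must not guess it
        (b"Content-Length", str(sum(len(chunk) for chunk in body)).encode("ascii")),
    ]
    return body, status, headers


# Keys are native strings, values are bytes, including SERVER_PORT
environ = {
    "REQUEST_METHOD": b"GET",
    "SCRIPT_NAME": b"",
    "PATH_INFO": b"/books",
    "QUERY_STRING": b"",
    "SERVER_PORT": b"8000",
}

body, status, headers = simple_app(environ)
```

Compared with a WSGI 1.0 application, the response is returned as a single tuple instead of being started through a callback, and nothing in it is a native string.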