dashi package

Subpackages

Submodules

dashi.constants module

dashi.utils module

Utility functions

format_data(input_dataframe, *, date_column_name=None, source_column_name=None, date_format='%y/%m/%d', verbose=False, numerical_column_names=None, categorical_column_names=None)

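Formats the input dataframe: converts the date column into Python date format, casts the declared columns to their types, and drops rows with missing date or source values.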

Parameters:
  • input_dataframe (pd.DataFrame) – pandas DataFrame with at least one column of dates.

  • date_column_name (Optional[str]) – The name of the column containing the dates. If you are not performing a temporal analysis, set this parameter to None.

  • source_column_name (Optional[str]) – The name of the column containing the source information. If you are not performing a multi-source analysis, set this parameter to None.

  • date_format (Optional[str]) – Format string, using strftime/strptime directives, that describes how the dates are written. Defaults to ‘%y/%m/%d’ (two-digit year, month, day).

  • verbose (bool) – Whether to display additional information during the process. Defaults to False.

  • numerical_column_names (Optional[List[str]]) – A list containing all the numerical column names in the dataset. If this parameter is None, variable types must be managed by the user.

  • categorical_column_names (Optional[List[str]]) – A list containing all the categorical column names in the dataset. If this parameter is None, variable types must be managed by the user.
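The date_format string uses the standard strftime/strptime directives. As a rough illustration of what the parameter controls, assuming the dates end up being parsed with something like pandas.to_datetime (an assumption, not taken from dashi's source), the snippet below parses the same two hypothetical dates written in the default layout and in a day-first, four-digit-year layout:

```python
import pandas as pd

# Hypothetical date strings in the default '%y/%m/%d' layout (two-digit year).
default_style = pd.Series(["21/03/15", "22/11/02"])
parsed_default = pd.to_datetime(default_style, format="%y/%m/%d")

# The same dates written day-first with a four-digit year would need
# date_format='%d/%m/%Y' instead of the default.
day_first_style = pd.Series(["15/03/2021", "02/11/2022"])
parsed_day_first = pd.to_datetime(day_first_style, format="%d/%m/%Y")

# Both parses yield identical timestamps once the format matches the data.
print(parsed_default.equals(parsed_day_first))  # True
```

A date_format that does not match the data typically makes the parse fail rather than silently producing wrong dates, so it is worth checking the raw strings before choosing a value.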

Returns:

A pandas.DataFrame with each column cast to its correct dtype and any rows containing missing values in the date or source fields dropped.

Return type:

pd.DataFrame
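The documented return value can be approximated in plain pandas. The sketch below is a hypothetical re-implementation written only from the description above, not dashi's actual code: format_data_sketch is an illustrative name, the verbose parameter is omitted, and the choice of pandas.to_numeric and the "category" dtype for the casts is an assumption.

```python
from typing import List, Optional

import pandas as pd


def format_data_sketch(
    input_dataframe: pd.DataFrame,
    *,
    date_column_name: Optional[str] = None,
    source_column_name: Optional[str] = None,
    date_format: str = "%y/%m/%d",
    numerical_column_names: Optional[List[str]] = None,
    categorical_column_names: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical approximation of format_data; not the library's implementation."""
    # Drop rows with a missing date or source, but only for the columns provided.
    key_columns = [c for c in (date_column_name, source_column_name) if c is not None]
    df = input_dataframe.dropna(subset=key_columns) if key_columns else input_dataframe
    df = df.copy()  # work on an independent copy so the caller's frame is untouched

    # Parse the date column using the user-supplied format string.
    if date_column_name is not None:
        df[date_column_name] = pd.to_datetime(df[date_column_name], format=date_format)

    # Cast declared numerical and categorical columns; other columns are left as-is.
    for col in numerical_column_names or []:
        df[col] = pd.to_numeric(df[col])
    for col in categorical_column_names or []:
        df[col] = df[col].astype("category")

    return df


# Hypothetical input: one row lacks a date, another lacks a source.
raw = pd.DataFrame(
    {
        "date": ["21/03/15", None, "22/11/02"],
        "source": ["source_a", "source_b", None],
        "age": ["34", "51", "27"],
        "sex": ["F", "M", "F"],
    }
)
clean = format_data_sketch(
    raw,
    date_column_name="date",
    source_column_name="source",
    numerical_column_names=["age"],
    categorical_column_names=["sex"],
)
print(clean.dtypes)
```

Only the first row survives here, because the other two are missing a date or a source value. Dropping those rows before parsing avoids passing missing values to the date parser.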

Module contents