2
  • I can print two columns of a pandas DataFrame with the code below
  • How do I format a row-by-row print?
  • Here is my "ugly" solution, followed by what I had expected to work:
import pandas

def date_normalization(data: pandas.DataFrame) -> None:
    # EDIT: add completed code
    # convert to desired date format
    data[normalized] = pandas.to_datetime(
        data[original],
        errors="coerce",
    ).dt.strftime('%d/%m/%Y')

original = "start"
normalized = "normalized"

data = pandas.DataFrame({
    original:
    {
        0: "AUG 26 2016",
        1: "JAN-FEB 2021",
        2: "2017-06-01 00:00:00"
    }})

date_normalization(data)

# remove rows with invalid date
data = data[data[normalized].notnull()]

# arrggghh ... this is working, but ugly  ...
for i, before in enumerate(data[original]):
    for j, after in enumerate(data[normalized]):
        if i == j:
            print(f"row {i}: {before} -> {after}")

print("\n")
# surprisingly (?) this doesn't work
for row in data:
    print(f"{row[original]} -> {row[normalized]}")

Here is the error I get for the second try:

row 0: AUG 26 2016 -> 26/08/2016
row 1: 2017-06-01 00:00:00 -> 01/06/2017


Traceback (most recent call last):
  File "/home/oren/Downloads/GGG/main.py", line 36, in <module>
    print(f"{row[original]} -> {row[normalized]}")
TypeError: string indices must be integers
OrenIshShalom

2 Answers

1

Because a new column, normalized, has been created, you can iterate over both columns in parallel with zip:

import pandas as pd

def date_normalization(data: pd.DataFrame) -> pd.DataFrame:
    # convert to the desired date format; unparseable dates become NaN
    data[normalized] = pd.to_datetime(
        data[original],
        errors="coerce",
    ).dt.strftime('%d/%m/%Y')
    # drop the rows whose date could not be parsed
    return data.dropna(subset=[normalized])

original = "start"
normalized = "normalized"
    
data = pd.DataFrame({
    original:
    {
        0: "AUG 26 2016",
        1: "JAN-FEB 2021",
        2: "2017-06-01 00:00:00"
    }})
    
data = date_normalization(data)
print(data)
                 start  normalized
0          AUG 26 2016  26/08/2016
2  2017-06-01 00:00:00  01/06/2017

for o, n in zip(data[original], data[normalized]):
    print(f"{o} -> {n}")

AUG 26 2016 -> 26/08/2016
2017-06-01 00:00:00 -> 01/06/2017
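
As background on the TypeError in the question: iterating over a DataFrame directly yields its column labels rather than its rows, so each `row` in that loop is a plain string. The sketch below illustrates this with a small hand-built frame; the literal values are copied from the output above, and the variable names `labels` and `raised` are only illustrative.

```python
import pandas as pd

# Small frame mirroring the cleaned data from the question.
data = pd.DataFrame({
    "start": ["AUG 26 2016", "2017-06-01 00:00:00"],
    "normalized": ["26/08/2016", "01/06/2017"],
})

# Iterating a DataFrame directly yields column labels, not rows.
labels = [item for item in data]
print(labels)  # ['start', 'normalized']

# Each label is a str, so indexing it with another str raises TypeError,
# which is the error reported in the question.
raised = False
try:
    labels[0]["start"]
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")
```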
jezrael
1

After you drop the NaN rows, you can call data.reset_index(drop=True, inplace=True) to renumber the index from 0. If you do not reset the index, the remaining rows keep their original index labels even though some rows were dropped.
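
A minimal sketch of that behaviour, assuming a toy frame with one missing value (the names `cleaned` and `renumbered` are illustrative, and the non-inplace form of `reset_index` is used here only so both results can be compared side by side):

```python
import pandas as pd

# Toy frame with a missing value in the middle row.
data = pd.DataFrame({"start": ["AUG 26 2016", None, "2017-06-01 00:00:00"]})

# dropna removes the row but keeps the original index labels.
cleaned = data.dropna(subset=["start"])
print(cleaned.index.tolist())  # [0, 2]

# reset_index(drop=True) renumbers from 0 and discards the old labels.
renumbered = cleaned.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```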

To print row by row, you can use DataFrame.iterrows, which yields (index, row) pairs:

for index, row in data.iterrows():
    print(f"{row[original]} -> {row[normalized]}")
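
As a possible alternative to iterrows (not part of the original answer), `DataFrame.itertuples` yields one named tuple per row, with the index label available as `row.Index`. The sketch below assumes the column names are valid Python identifiers, as they are here, and uses a hand-built frame with the index labels 0 and 2 left over from the NaN cleanup.

```python
import pandas as pd

# Frame mirroring the cleaned data, with the original index labels kept.
data = pd.DataFrame(
    {
        "start": ["AUG 26 2016", "2017-06-01 00:00:00"],
        "normalized": ["26/08/2016", "01/06/2017"],
    },
    index=[0, 2],
)

# itertuples yields a named tuple per row; columns become attributes.
lines = []
for row in data.itertuples():
    lines.append(f"row {row.Index}: {row.start} -> {row.normalized}")

print("\n".join(lines))
```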
tekiz
  • It works great! How come I get the index `2` for the second row? Does it remember its original index from before the `NaN` cleanup? – OrenIshShalom Sep 09 '22 at 06:32
  • Even if you drop the `NaN` rows, the remaining rows keep their original index. You should use `data.reset_index(drop=True, inplace=True)` to reset the index. – tekiz Sep 09 '22 at 06:42
  • It works! Maybe add `data.reset_index(drop=True, inplace=True)` to your answer? – OrenIshShalom Sep 09 '22 at 06:46