From WhatsApp to Excel. Can't arrange it

Question

I'm trying to get data from WhatsApp (it's data from surgeries from 2020 to 2023 from my hospital so it's sensitive and it's a lot) to Excel.

Basically what I did (with help of yt of course) is to connect wsp to pushbullet and get a .txt from it and then with that .txt

With that .txt I read it, then "Clean it" (I think this clean it might be part of the problem) with this code:

#Read file
with open(file_path, mode='r', encoding="utf8") as f:
    data = f.readlines()

#print(data)

#print(data[0])

dataset = data[4:]
cleaned_data = []
for line in dataset:
    date=line.split(",")[0]
    line2= line[len(date):]
    time = line2.split("-")[0][2:]
    line3 = line2[len(time):]
    name = line3.split(":")[0][4:]
    line4 = line3[len(name):]
    message = line4[6:-1]


    print(message)
    cleaned_data.append([date,time,name,message])

df= pd.DataFrame(cleaned_data, columns= ['Date','Time','name','Message'])
df.to_excel('estadistica_automatizada.xlsx', index=False)

And then with that .xlsx im trying to rearrange it with this:

import openpyxl
import re

workbook = openpyxl.load_workbook(r'C:\Users\olivi\PycharmProjects\Estadistica\estadistica_automatizada.xlsx')

worksheet = workbook['Sheet1']  # Replace 'Sheet1' with your actual sheet name

search_patterns = {
    '*Programación': r'.*Programación.*',
    '*Fecha:': r'.*Fecha:.*',
    '*Nombre:': r'.*Nombre:.*',
    '*Patología:': r'.*Patología:.*',
    '*Cirug[ií]a:': r'.*Cirug[ií]a:.*',
    '*Cirujano:': r'.*Cirujano:.*',
    '*Primer Ayudante:': r'.*Primer Ayudante:.*',
    '**Segundo Ayudante:': r'.*Segundo Ayudante:.*'
}

destination_columns = {
    '*Programación': 'A',
    '*Fecha:': 'C',
    '*Nombre:': 'D',
    '*Patología:': 'F',
    '*Cirug[ií]a:': 'E',
    '*Cirujano:': 'G',
    '*Primer Ayudante:': 'H',
    '**Segundo Ayudante:': 'I'
}

patterns = {key: re.compile(pattern, re.IGNORECASE) for key, pattern in search_patterns.items()}

for column in destination_columns.values():
    if column not in worksheet.columns:
        worksheet.insert_cols(worksheet.max_column + 1)

for row in worksheet.iter_rows(min_row=1):
    cell_value = row[0].value  # Assuming the column to search is the first column (column A)
    if cell_value:
        for key, pattern in patterns.items():
            if pattern.match(str(cell_value)):
                destination_column = destination_columns[key]
                destination_cell = worksheet.cell(row=row[0].row, column=worksheet[destination_column + '1'].column)
                destination_cell.value = cell_value
                row[0].value = None
                break

workbook.save(r'C:\Users\olivi\PycharmProjects\Estadistica\estadistica_automatizada2.xlsx')

What I'm trying to do is: 1_Get the text from wsp (we got a group called statistics and basically its messages are arranged like: *Fecha: XXX (date of the surgery) *Programación: Guardia (Programation of the surgery if it was programmed or urgent) *Nombre: XXX (Name of the patient) *Patología: Complicacion de DPC (Pathology) *Cirugía: Lavado de herida + recambio de VAC (What surgery are we doing) *Cirujano: XXX (The name of the surgeon) *Primer Ayudante: XXX (Name of first helper) *Segundo Ayudante: XXX (Name of the second helper, this line is not always applied i.e minor surgeries)

2_Grab that .txt and transfer it to excel

3_ Rearrange the excel in columns PROGRAMATION MONTH DATE NAME SURGERY PATHOLOGY SURGEON FIRST HELPER 2ND HELPER

The thing is that:

-I'm not being able to arrange it properly because the data from the same patient falls in different rows (I think its because I'm getting the .txt rearranged in lines instead of messages) and I want everything from 1 patient to be in the same row but different columns. I tried searching for '*Programación': r'.Programación.' to the other '*Programación': r'.Programación.' but the output isn't quite right

Are there any easy solutions? I'm not a programmer, and I'm pretty lost.

I tried searching from '*Programación': r'.Programación.' and then grouping everything that follows that until the next '*Programación': r'.Programación.'

*Programacion: data1 data2 data3 data4

*Programacion: data5 data6 data7

to get something like this: Programacion: (Column A) data 1 (column B), data 2 (C), data 3.... programacion (Column A) data 5 (B), 6 (C), 7 (D)...

Please add the language tags to ensure you get help from those subject matter experts. — Scott Craner, Jun 21 '23 at 15:10
I suggest you use Power Query directly in Excel and clean the data there — Ricardo Diaz, Jun 21 '23 at 15:29

From WhatsApp to Excel. Can't arrange it

0 Answers0