0

I'm trying to read an Excel spreadsheet and create a graph based on the data from certain columns. I'm using Python 3.6.2 since some libraries used in my script, do not work well in the newer versions.

This is my code:

import os
import plotly
import plotly.graph_objects as go
import pandas as pd
import plotly.express as px

def create_graph(directoryPath):
    for root, subFolders, files in os.walk(directoryPath):
        for f in files:
            if "screenlog" in f.lower():
                print("Creating graphs from the Screen logs")
                absolutePathOfFile = os.path.join(root, f)
                pd.options.mode.chained_assignment = None  # default="warn"
                #PROCESSING THE EXCEL FILE
                df_init = pd.read_excel(absolutePathOfFile,engine="openpyxl")
                df_init["Screen&Action"]= df_init["SCREEN"].str.cat(df_init["Action_Name"], sep=" - ")
                ##### PLOTING Scatter chart######
                print("Plotting the scatter graphs")
                #using plotly to create a scatter graph
                fig_scatter= px.scatter(df_init, x="INSTANT", y="DURATION",color="Screen&Action",height=500,title="Traditional Screen Analysis")
                fig_scatter.update_layout(showlegend=False,title={       
                        "y":0.9,
                        "x":0.5,
                        "xanchor": "center",
                        "yanchor": "top"})
        fig_scatter.show()

The error message:

> Creating graphs from the Screen logs
> ...\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\fromnumeric.py:87:
> VisibleDeprecationWarning:
> 
> Creating an ndarray from ragged nested sequences (which is a
> list-or-tuple of lists-or-tuples-or ndarrays with different lengths or
> shapes) is deprecated. If you meant to do this, you must specify
> 'dtype=object' when creating the ndarray
> 
> Plotting the scatter graphs Traceback (most recent call last):   File
> "...\Documents\log_app\log_parser.py", line 3812, in <module>
>     scripts.screen.create_graph(directoryPath)   File "...\Documents\log_app\scripts\screen.py", line 20, in create_graph
>     fig_scatter= px.scatter(df_init, x="INSTANT", y="DURATION",color="Screen&Action",height=500,title="Traditional
> Screen Analysis")   File
> "...\AppData\Local\Programs\Python\Python36\lib\site-packages\plotly\express\_chart_types.py",
> line 66, in scatter
>     return make_figure(args=locals(), constructor=go.Scatter)   File "...\AppData\Local\Programs\Python\Python36\lib\site-packages\plotly\express\_core.py",
> line 1988, in make_figure
>     group = grouped.get_group(group_name if len(group_name) > 1 else group_name[0])   File
> "...\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\groupby\groupby.py",
> line 680, in get_group
>     raise KeyError(name) KeyError: (nan, '', '', '', '')

I spent some time searching for a workaround, and I found several sites that mentioned adding "dtype=object" as an extra parameter should fix the problem.

However, how can I add "dtype=object" in this line?:

fig_scatter= px.scatter(df_init, x="INSTANT", y="DURATION",color="Screen&Action",height=500,title="Traditional Screen Analysis")

I always get a syntax error.

Dataframe:

print(df_init)

       Tenant_Id  ...                                    Screen&Action
0              1  ...      Screen_Logs - TelemetryClickEvent.SendEvent
1              1  ...                      Screen_Logs - FilterOrReset
2              1  ...  Cyclic_Job_Logs - TelemetryClickEvent.SendEvent
3              1  ...                        Screen_Logs - Preparation
4             20  ...                 EventoDetalhe_New - Load_Atletas
...          ...  ...                                              ...
31893         20  ...                              Login - Preparation
31894          1  ...                              Login - Preparation
31895         20  ...                              Login - Preparation
31896          1  ...                              Login - Preparation
31897         20  ...                              Login - Preparation

[31898 rows x 20 columns]

print(df_init.head(10).to_dict("records"))

[{'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:44:15.345000'), 
'DURATION': 14, 'SCREEN': 'Screen_Logs', 'Session_Id': 'N30TTfz+v0G6OuDKYznMeA==', 
'User_Id': 1263, 'Espace_Id': 1, 'MSISDN': nan, 'Screen_Type': 'WEB', 
'Executed_By': 'E3Q3J-PR4U18', 'Session_Bytes': 8207, 'Viewstate_Bytes': 4204, 
'Session_Requests': 1, 'Access_Mode': 'Ajax', 'Request_Key': 'ac581f0c-ae8a-4f1d-8eea-39c0f89171b7', 
'Action_Name': 'TelemetryClickEvent.SendEvent', 'Espace_Name': 'ServiceCenter', 
'Application_Name': 'Service Center', 'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 
'Screen&Action': 'Screen_Logs - TelemetryClickEvent.SendEvent'}, 
{'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:44:15.202000'), 
'DURATION': 31, 'SCREEN': 'Screen_Logs', 'Session_Id': 'N30TTfz+v0G6OuDKYznMeA==', 
'User_Id': 1263, 'Espace_Id': 1, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U18', 
'Session_Bytes': 8207, 'Viewstate_Bytes': 4184, 'Session_Requests': 1, 'Access_Mode': 'Ajax', 
'Request_Key': 'd4f00718-3fbe-416d-8540-aeed02f371b3', 'Action_Name': 'FilterOrReset', 
'Espace_Name': 'ServiceCenter', 'Application_Name': 'Service Center', 
'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 'Screen&Action': 
'Screen_Logs - FilterOrReset'}, {'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:44:11.733000'), 
'DURATION': 15, 'SCREEN': 'Cyclic_Job_Logs', 'Session_Id': 'N30TTfz+v0G6OuDKYznMeA==', 
'User_Id': 1263, 'Espace_Id': 1, 'MSISDN': nan, 'Screen_Type': 'WEB', 
'Executed_By': 'E3Q3J-PR4U18', 'Session_Bytes': 8093, 'Viewstate_Bytes': 4160, 
'Session_Requests': 1, 'Access_Mode': 'Ajax', 'Request_Key': 'a35aa5c7-81e9-487a-8df3-474b12df2a1a', 
'Action_Name': 'TelemetryClickEvent.SendEvent', 'Espace_Name': 'ServiceCenter', 
'Application_Name': 'Service Center', 'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 
'Screen&Action': 'Cyclic_Job_Logs - TelemetryClickEvent.SendEvent'}, 
{'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:44:11.733000'), 'DURATION': 797, 
'SCREEN': 'Screen_Logs', 'Session_Id': 'N30TTfz+v0G6OuDKYznMeA==', 'User_Id': 1263, 
'Espace_Id': 1, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U18', 
'Session_Bytes': 8093, 'Viewstate_Bytes': 0, 'Session_Requests': 1, 
'Access_Mode': 'Screen', 'Request_Key': '4ac69574-86e2-4b85-86eb-e8cda703193c', 
'Action_Name': 'Preparation', 'Espace_Name': 'ServiceCenter', 'Application_Name': 'Service Center', 
'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 'Screen&Action': 'Screen_Logs - Preparation'}, 
{'Tenant_Id': 20, 'INSTANT': Timestamp('2021-11-25 21:44:05.108000'), 'DURATION': 1359, 
'SCREEN': 'EventoDetalhe_New', 'Session_Id': 'jXC8xqSk1Eu0q4C+VQNZww==', 'User_Id': 151, 
'Espace_Id': 80, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U18', 
'Session_Bytes': 17444, 'Viewstate_Bytes': 45784, 'Session_Requests': 4, 'Access_Mode': 'Ajax', 
'Request_Key': '9f62097f-fe6a-468a-a4d4-5b2c39e295ce', 'Action_Name': 'Load_Atletas', 
'Espace_Name': 'TalentWeb', 'Application_Name': 'TalentWeb', 
'Application_Key': '448b995f-e07e-4a21-a442-12204cfffea6', 
'Screen&Action': 'EventoDetalhe_New - Load_Atletas'}, 
{'Tenant_Id': 20, 'INSTANT': Timestamp('2021-11-25 21:44:04.573000'), 'DURATION': 33, 
'SCREEN': 'EventoDetalhe_New', 'Session_Id': 'jXC8xqSk1Eu0q4C+VQNZww==', 'User_Id': 151, 
'Espace_Id': 80, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U1S', 
'Session_Bytes': 12431, 'Viewstate_Bytes': 45824, 'Session_Requests': 1, 'Access_Mode': 'Ajax', 
'Request_Key': '61d1d2a1-2c56-4e1d-a752-8d1719aa77a8', 'Action_Name': 'Tabs.OnChange', 
'Espace_Name': 'TalentWeb', 'Application_Name': 'TalentWeb', 
'Application_Key': '448b995f-e07e-4a21-a442-12204cfffea6', 'Screen&Action': 'EventoDetalhe_New - Tabs.OnChange'},
 {'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:44:02.658000'), 'DURATION': 31, 
'SCREEN': 'Login', 'Session_Id': 'am9Hoz4suEifYkaRsh8usQ==', 'User_Id': 0, 'Espace_Id': 1, 
'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U2H', 
'Session_Bytes': 0, 'Viewstate_Bytes': 0, 'Session_Requests': 1, 'Access_Mode': 'Screen', 
'Request_Key': '1b4f4776-f622-46da-ac0f-88a897c5b8a7', 'Action_Name': 'Preparation', 
'Espace_Name': 'ServiceCenter', 'Application_Name': 'Service Center', 
'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 'Screen&Action': 'Login - Preparation'}, 
{'Tenant_Id': 20, 'INSTANT': Timestamp('2021-11-25 21:43:59.729000'), 'DURATION': 314, 
'SCREEN': 'EventoDetalhe_New', 'Session_Id': 'jXC8xqSk1Eu0q4C+VQNZww==', 'User_Id': 151, 
'Espace_Id': 80, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U1S', 
'Session_Bytes': 14891, 'Viewstate_Bytes': 0, 'Session_Requests': 3, 
'Access_Mode': 'Screen', 'Request_Key': '866f27bb-f86a-4add-b084-f433e1cadc73', 
'Action_Name': 'Preparation', 'Espace_Name': 'TalentWeb', 'Application_Name': 'TalentWeb', 
'Application_Key': '448b995f-e07e-4a21-a442-12204cfffea6', 'Screen&Action': 'EventoDetalhe_New - Preparation'}, 
{'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:43:53.594000'), 'DURATION': 1904, 
'SCREEN': 'Cyclic_Job_Logs', 'Session_Id': 'N30TTfz+v0G6OuDKYznMeA==', 'User_Id': 1263, 
'Espace_Id': 1, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U18', 
'Session_Bytes': 8093, 'Viewstate_Bytes': 4160, 'Session_Requests': 1, 
'Access_Mode': 'Screen', 'Request_Key': '26f2d266-b2fc-4a2f-8551-046ba9604477', 
'Action_Name': 'ExportToExcel', 'Espace_Name': 'ServiceCenter', 'Application_Name': 'Service Center', 
'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 
'Screen&Action': 'Cyclic_Job_Logs - ExportToExcel'}, 
{'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:43:53.594000'), 'DURATION': 2060, 
'SCREEN': 'Cyclic_Job_Logs', 'Session_Id': 'N30TTfz+v0G6OuDKYznMeA==', 'User_Id': 1263, 
'Espace_Id': 1, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U18', 
'Session_Bytes': 8093, 'Viewstate_Bytes': 4160, 'Session_Requests': 1, 
'Access_Mode': 'Ajax', 'Request_Key': '66d521dd-905d-4f66-ac99-a982bcccb99b', 
'Action_Name': 'TelemetryClickEvent.SendEvent', 'Espace_Name': 'ServiceCenter', 
'Application_Name': 'Service Center', 'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 
'Screen&Action': 'Cyclic_Job_Logs - TelemetryClickEvent.SendEvent'}]

print(df_init[df_init['Screen&Action'].isnull()])

Tenant_Id                 INSTANT  DURATION  ... Application_Name                       Application_Key  Screen&Action
25708         20 2021-11-24 13:45:47.022        93  ...   Statvue Emails  f256c73c-722e-4adb-b2c1-c80cc20c2745            NaN
25763         20 2021-11-24 13:39:20.798         6  ...   Statvue Emails  f256c73c-722e-4adb-b2c1-c80cc20c2745            NaN
25782         20 2021-11-24 13:38:06.664       171  ...   Statvue Emails  f256c73c-722e-4adb-b2c1-c80cc20c2745            NaN
25805         20 2021-11-24 13:36:15.512      2295  ...   Statvue Emails  f256c73c-722e-4adb-b2c1-c80cc20c2745            NaN

The Excel file I'm trying to read.

Any help is appreciated.

Luis
  • 7
  • 1
  • 3
  • When those links (and the warning itself) talk about adding the `dtype=object`, they mean to do so in the expression that tries to create the array, e.g. `np.array(some_list, dtype=object)`. Unfortunately in your case that action is buried deep in the `scatter` call. I don't know what `plotly` is doing; the traceback indicates that it is doing a grouping on the dataframe. – hpaulj Jan 26 '22 at 17:12
  • The ragged warning is just a warning. The keyerror, that causes the code to quit, is something else, possibly unrelated. – hpaulj Jan 26 '22 at 17:14
  • agreed - ragged edges is just a warning. most probable issue is with **"Screen&Action"** column content. is it a categorical (strings)? provide a sample of your dataframe in the question (`df_init.head(10)`). Also I understand your point wrt python 3.6 but this is saving up pain, many libraries already don't support python 3.6 – Rob Raymond Jan 26 '22 at 18:45
  • @hpaulj Hello. Thank you for the clarification. I'll invest my time trying to see why the KeyError: (nan, '', '', '', '') is triggered. – Luis Jan 26 '22 at 21:53
  • @RobRaymond Thank you for your time. Yes, I also understand there is a possibility I will have to upgrade my Python version; however, out of 4K lines of code, this is the only line causing this issue. I'm able to create graphs with pandas, but not in the way I'm showing in this example. By the way, I updated this case with the requested information. Thank you. – Luis Jan 26 '22 at 21:56
  • your sample data is clipped so doesn't have data for columns you use in call to plotly. try `df_init.head(10).to_dict("records")`. my previous comment re failure, just think about how Plotly Express works.... the **color** argument if categorical will be used to do a pandas `groupby()` on the dataframe. That's what really relates to your error. I've managed and owned very large systems, letting them become end-of-life lands up taking more effort to keep them going than keeping versions within acceptable minimum versions... always tempting though ;-) never the priority of budget owners – Rob Raymond Jan 26 '22 at 22:13
  • @RobRaymond Thank you for your reply. I updated this ticket with the correct information. I formatted the output in a way that I hope is easier to read. I agree with your statement, the time invested to try to keep things flowing would be almost similar to rewriting the logic with supported libraries. – Luis Jan 27 '22 at 00:04
  • The graph was created correctly for the 10 data points that were hit. When I tried it with the provided Excel data, an error occurred. I think this is caused by the fact that the 'Action_Name' is NULL, which results in NA even if you join the strings. Please try `df_init[df_init['Screen&Action'].isnull()]` and confirm. – r-beginners Jan 27 '22 at 03:32
  • @r-beginners Hello. My apologies for the late reply. I updated this case with the results of that statement, and yes, I can see there are null values in the spreadsheet, which is something that wasn't supposed to happen. Thank you for taking the time to point this out. – Luis Jan 27 '22 at 16:32

1 Answers1

1
  • using sample 10 rows you provided. Replicated the error by setting SCREEN to nan for two rows
  • this is a data quality issue (I'm using python 3.9 and plotly 5.5.0)
  • two columns have this potential issue as you concatenate them together
  • have worked around by replacing nan with fillna() in target concatenated column
# df_init = pd.read_excel(absolutePathOfFile, engine="openpyxl")

# simulate a couple of missing values
df_init.loc[df_init.sample(2).index,"SCREEN"] = np.nan
df_init.loc[df_init.sample(2).index,"Action_Name"] = ""

df_init["Screen&Action"] = df_init["SCREEN"].str.cat(df_init["Action_Name"], sep=" - ")
##### PLOTING Scatter chart######
print("Plotting the scatter graphs")
# using plotly to create a scatter graph
try:
    fig_scatter = px.scatter(
        df_init,
        x="INSTANT",
        y="DURATION",
        color="Screen&Action",
        height=500,
        title="Traditional Screen Analysis",
    )
except KeyError:
    print("failed, defaulting")
    df_init["Screen&Action"] = df_init["Screen&Action"].fillna("unknown")
    fig_scatter = px.scatter(
        df_init,
        x="INSTANT",
        y="DURATION",
        color="Screen&Action",
        height=500,
        title="Traditional Screen Analysis",
    )

    
fig_scatter.update_layout(
    showlegend=False, title={"y": 0.9, "x": 0.5, "xanchor": "center", "yanchor": "top"}
)
fig_scatter.show()
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30
  • 1
    Hello. I confirm your code works. Thank you for investing some of your time to figure out the root cause for this case. I really appreciate it. – Luis Jan 27 '22 at 16:39