np.column_stack dimension mismatch sometimes

Question

if df is None:

    # Using np.column_stack for performance reasons
    df = pd.DataFrame(np.column_stack([paths, file_names, hashes, filesizes]),
                      columns=['path', 'file name', 'sha256', 'file size (MB)'])

    # Save the df just in case we want to continue later
    print(f"Saving progress to {pickle_path}")
    try:
        df.to_pickle(pickle_path)

I get this error on the line with np.column_stack.

Exception has occurred: ValueError
all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 156738 and the array at index 3 has size 156735

This line works most of the time, but every once in a while it raises this error. The script can run for almost half an hour, so hitting it is pretty frustrating. Is there some way I could fill in NA values or otherwise correct the dimensions?

One way or another you need to test the dimensions, especially the one that is occasionally short. Then either skip that case or pad the array. `column_stack` can't take any corrective action for you. – hpaulj, Sep 12 '22 at 06:01

score 0 · Answer 1 · answered Sep 12 '22 at 04:13

Array at index 0 refers to paths.
Array at index 3 refers to filesizes.

Assert that the shapes of paths and filesizes are equal before using np.column_stack. There is a dimension mismatch, as you've noticed.

assert paths.shape == filesizes.shape
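
A slightly more informative variant of that check, as a minimal sketch. It assumes the four inputs may be plain Python lists (which have no `.shape`), so it compares `len()` instead and reports every length when they disagree. The helper name `check_equal_lengths` is hypothetical, and the toy data only stands in for the real lists.

```python
def check_equal_lengths(**columns):
    """Return the length of each column; raise ValueError if they differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # Name every column so the short one is obvious in the traceback
        raise ValueError(f"column lengths differ: {lengths}")
    return lengths


# Toy stand-ins for the real inputs (assumption: the originals are sequences)
paths = ["a/x.txt", "a/y.txt", "a/z.txt"]
filesizes = [0.5, 1.2]

try:
    check_equal_lengths(paths=paths, filesizes=filesizes)
except ValueError as exc:
    print(exc)
```

Running a check like this right before `np.column_stack` would at least show which input came up short before any progress is lost.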

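Fill the shorter input array (paths or filesizes) with NA before feeding the arrays into the DataFrame, based on an acceptable criterion: the NA has to land at the position of the entry that is actually missing. Randomly filling the arrays with NAs is not recommended, because it would pair sizes with the wrong paths.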
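
One way to keep the NA at the right position is to avoid padding altogether and build one record per file, so a failure on a single file leaves a gap only in that file's row. The sketch below is an assumption about how the data is collected (the question does not show the loop): the function name `describe_file`, the use of `None` as the missing value, and the bytes-to-MB conversion by `1e6` are all illustrative choices, not the asker's code.

```python
import hashlib
import os


def describe_file(path):
    """Return one record for a file; fields that cannot be read stay None."""
    record = {
        "path": os.path.dirname(path),
        "file name": os.path.basename(path),
        "sha256": None,
        "file size (MB)": None,
    }
    try:
        # Assumption: MB means 10**6 bytes here
        record["file size (MB)"] = os.path.getsize(path) / 1e6
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            # Hash in 1 MiB chunks so large files are not read in one go
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        record["sha256"] = digest.hexdigest()
    except OSError:
        # PermissionError and FileNotFoundError are subclasses of OSError;
        # whatever was not filled in before the failure remains None
        pass
    return record


# Hypothetical path that does not exist, to show the failure case
records = [describe_file(p) for p in ["/no/such/file.bin"]]
print(records)
```

A list of dicts like `records` can be passed directly to `pd.DataFrame(records)`, which makes every column the same length by construction and shows the `None` entries as missing values.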

answered Sep 12 '22 at 04:13

Grace Mathew

Thanks, Grace. I'm thinking the mismatch in dimensions is due to not having permission for some files, and therefore there is nothing that I can think of to put in the column besides NA. I'll investigate possible solutions to this tomorrow. – jangles Sep 12 '22 at 04:18
@jangles Maybe this resource would be of interest to you [Check permissions for files in linux](https://stackoverflow.com/questions/1861836/checking-file-permissions-in-linux-with-python) – Grace Mathew Sep 12 '22 at 04:27
