if df is None:

    # Using np.column_stack for performance reasons
    df = pd.DataFrame(np.column_stack([paths, file_names, hashes, filesizes]),
                      columns=['path', 'file name', 'sha256', 'file size (MB)'])

    # Save the df just in case we want to continue later
    print(f"Saving progress to {pickle_path}")
    try:
        df.to_pickle(pickle_path)

I get this error on the line with np.column_stack.

Exception has occurred: ValueError
all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 156738 and the array at index 3 has size 156735

This line works most of the time, but every once in a while it fails in a script whose runtime can be almost half an hour, so the error is pretty frustrating. Is there some way I could fill in NA, or something similar, to make the dimensions match?

jangles
  • One way or another, you need to test the dimensions, especially of the array that is occasionally short. Then either skip that case or pad the array. `column_stack` can't take any corrective action for you. – hpaulj Sep 12 '22 at 06:01

1 Answer


Array at index 0 refers to paths.
Array at index 3 refers to filesizes.

Assert that paths and filesizes have the same length before calling np.column_stack. As the error message shows, they differ (156738 vs. 156735).

assert len(paths) == len(filesizes)  # len() works whether these are lists or arrays
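
A slightly more informative version of this check, as a minimal sketch: it reports the length of every column at once, so you can see which one came up short. The helper name `check_equal_lengths` is hypothetical and not part of NumPy or pandas.

```python
def check_equal_lengths(**columns):
    """Return {name: length}; raise ValueError if the lengths are not all equal."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        # Include every length so the short column is obvious in the message.
        raise ValueError(f"Column lengths differ: {lengths}")
    return lengths
```

Calling it as `check_equal_lengths(paths=paths, file_names=file_names, hashes=hashes, filesizes=filesizes)` just before np.column_stack would fail with a message that names each list and its length.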

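Pad the shorter input (paths or filesizes) with NA before building the DataFrame, using a criterion that keeps every value aligned with the file it belongs to. Filling the arrays with NA at arbitrary positions is not recommended, because it would pair file sizes with the wrong paths.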
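
One way to keep the values aligned is to build a complete record per file, so a file that cannot be read still produces a row with empty fields instead of silently shortening one list. The following is only a sketch under assumptions: the question does not show how the four lists are filled, so `describe_file` and `describe_files` are hypothetical helpers, 'path' is assumed to be the containing directory, and a megabyte is assumed to be 10^6 bytes.

```python
import hashlib
import os


def describe_file(path):
    """Return one complete record for `path`; fields that cannot be read stay None."""
    record = {
        "path": os.path.dirname(path),        # assumption: 'path' means the directory
        "file name": os.path.basename(path),
        "sha256": None,
        "file size (MB)": None,
    }
    try:
        record["file size (MB)"] = os.path.getsize(path) / 1e6  # assumption: 1 MB = 10**6 bytes
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            # Hash in 1 MiB chunks so large files are not loaded into memory at once.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        record["sha256"] = digest.hexdigest()
    except OSError:
        # Covers PermissionError, FileNotFoundError, etc.: keep the row, leave the gaps.
        pass
    return record


def describe_files(paths):
    """Return one record per input path, in the same order."""
    return [describe_file(p) for p in paths]
```

Passing the resulting list of dictionaries to `pd.DataFrame(...)` should give one row per file, with missing values wherever a read failed, so no length mismatch can arise.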

  • Thanks, Grace. I'm thinking the mismatch in dimensions is due to not having permission for some files, and therefore there is nothing that I can think of to put in the column besides NA. I'll investigate possible solutions to this tomorrow. – jangles Sep 12 '22 at 04:18
  • @jangles Maybe this resource would be of interest to you [Check permissions for files in linux](https://stackoverflow.com/questions/1861836/checking-file-permissions-in-linux-with-python) – Grace Mathew Sep 12 '22 at 04:27