ML101 – pt. 09: Reducing our Data – the PCA Node

by Christopher Kopic

28.11.2024


Part of: Machine Learning 101

Machine Learning 101, Premium Course

ai, analysis, coordinate systems, Coordinates, data, Dataset, Houdini, KI, machine learning, Muscle deformer, node, PCA, Premium Content, Principal Component Analysis, projection, Reduction, VEX, wrinkle

4 Comments

yancy
- 15.12.2024
I have a quick question that I'm hoping you can answer.

Why does the pca node in analyze mode produce the following number of components when Number Of Components is set to 0 (fully represented):

Number Of Components = pow(Points Per Sample, 2)*(length of the Attributes)
- Christopher Kopic
  - 17.12.2024
  Hey Yancy, let’s take a look at the box example from the PCA SOP Exploration hip file, with 8 points per sample and an attrib length of 3. With your formula that gives pow(8,2)*3 = 192. Let’s rephrase that a bit: we want to build a new coordinate system with 8*3 = 24 axes. Since every axis is expressed as a high-dimensional vector, each of our 24 axes also needs 24 dimensions, so the total number of floats is pow(24,2) = 576. Every point already has a P attrib that can store 3 floats, so we distribute the floats over the P attribs of 576/3 = 192 points. And there’s your answer again.
  
  Let’s try that with 8 points per sample and both position and color: now we have 8*(3+3) = 48 axes and 48 dimensions per axis, so pow(48,2) = 2304 floats in total. But now each point has both a P and a Cd attrib that we can use to store those floats, so we need 2304/6 = 384 = pow(8,2)*6 points.
  
  So in the end, the formula for the number of component axes is just “Points Per Sample * Length of Attribs”. The formula for the number of points needed is “pow(number of component axes, 2) / Length of Attribs”, which is your formula, just less condensed.
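
As a rough sketch, the arithmetic from the two examples above can be written out in a few lines of Python. The function name `pca_component_layout` and its arguments are made up for illustration; this is not the PCA SOP's API, it only reproduces the counting.

```python
def pca_component_layout(points_per_sample, attrib_lengths):
    """Count axes, floats and storage points for a full PCA basis.

    attrib_lengths lists the float count of each attribute used,
    e.g. [3] for P only, or [3, 3] for P and Cd (illustrative input).
    """
    floats_per_point = sum(attrib_lengths)
    # One axis per float in a sample.
    num_axes = points_per_sample * floats_per_point
    # Every axis is itself a vector with num_axes dimensions.
    total_floats = num_axes ** 2
    # Each storage point holds floats_per_point of those floats.
    num_points = total_floats // floats_per_point
    return num_axes, total_floats, num_points


print(pca_component_layout(8, [3]))     # (24, 576, 192)
print(pca_component_layout(8, [3, 3]))  # (48, 2304, 384)
```

The last value in each tuple matches pow(Points Per Sample, 2) * (Length of Attribs).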
大城
- 30.01.2025
Hi Chris. Thank you for making this very useful video.
I have a question. What is the midpoint you refer to at 18:41 in this video?
I thought the `Number of Components` was 128 because there are 128 poses from frame 1 to frame 128.
- Christopher Kopic
  - 14.02.2025
  Hey! At this point in the video, we’re dealing with very high-dimensional data. If you take a look at 04:00 in the video, you’ll get a better idea of this midpoint. We want to take the entries of our dataset (points in the first example, poses in the second) and place them in a new coordinate system that PCA builds for us. Our first coordinate system has a maximum of 3 axes (1 point * 3 coordinates per point) plus one midpoint, the origin of the coordinate system. Our second coordinate system could have tens of thousands of axes (about 30K points * 3 coordinates per point, so roughly 90,000), plus again that origin point. But since we can lower the number of axes quite a bit without losing too much info, I chose 127 axes + one origin point, so in the end 128 values for our neural net.
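
To make the "midpoint plus axes" idea a bit more concrete, here is a minimal sketch in plain Python. It is not how the PCA SOP is implemented: it uses power iteration to find only the first principal axis, and the tiny dataset and all function names are assumptions chosen for illustration.

```python
def mean_point(points):
    """Midpoint of the dataset: the origin of the new coordinate system."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dims)]


def first_axis(points, iterations=200):
    """Return (midpoint, dominant principal axis) via power iteration."""
    mid = mean_point(points)
    dims = len(mid)
    centered = [[p[d] - mid[d] for d in range(dims)] for p in points]
    # Covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / len(points)
            for j in range(dims)] for i in range(dims)]
    axis = [1.0] * dims  # arbitrary start vector
    for _ in range(iterations):
        axis = [sum(cov[i][j] * axis[j] for j in range(dims))
                for i in range(dims)]
        length = sum(a * a for a in axis) ** 0.5
        axis = [a / length for a in axis]
    return mid, axis


def project(point, mid, axis):
    """Coordinate of a point along the axis, measured from the midpoint."""
    return sum((point[d] - mid[d]) * axis[d] for d in range(len(mid)))


def reconstruct(weight, mid, axis):
    """Rebuild a point from its single coordinate in the new system."""
    return [mid[d] + weight * axis[d] for d in range(len(mid))]


# Illustrative data: four 3D points that all lie on one line.
points = [[0, 0, 0], [1, 2, 2], [2, 4, 4], [3, 6, 6]]
mid, axis = first_axis(points)
weight = project(points[3], mid, axis)
print(mid, axis, reconstruct(weight, mid, axis))
```

Because these sample points lie on a single line, one axis plus the midpoint is enough to rebuild them. The pose dataset works the same way in principle, only with many more dimensions and more axes kept.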

Entagma

Advanced CG Resources
